Chapter 5 discusses decision making using system modeling. The author briefly mentions an open source software tool, EMA Workbench, that can perform EMA and ESDMA modeling. Find EMA Workbench online a

Skill Builders

Estimating Population Parameters

Learning Objectives

  • Interpret confidence intervals.

Often, researchers want to estimate the value of a population parameter using information they have obtained from their sample. For example, in the political mass media, you will often see estimates of the proportion of voters who plan to vote for a certain candidate based on data collected from samples.

Point Estimates

One way to estimate a population parameter is to use a point estimate. In a point estimate, we estimate the unknown parameter of interest using a single value (hence the name "point estimation"). The point estimate is typically the value of our sample statistic and represents a best guess about the population parameter. As the following example illustrates, this form of inference is quite intuitive.

Example

Suppose that we are interested in studying the IQ levels of students at Smart University (SU). In particular (since IQ level is a quantitative variable), we are interested in estimating μ: the mean IQ level of all the students at SU.

A random sample of 100 SU students was chosen, and their (sample) mean IQ level was found to be x̄ = 115.

If we wanted to estimate μ - the population mean IQ level - by a single number based on the sample, it would make intuitive sense to use the corresponding quantity in the sample: the sample mean x̄ = 115.

We say that 115 is the point estimate for μ, and in general, we'll always use x̄ as the point estimator for μ. Note that when we talk about the specific value (115), we use the term estimate, and when we talk in general about the statistic x̄, we use the term estimator. The following figure summarizes this example:

[Figure: summary of the point estimation example, in which the sample mean x̄ = 115 is used as the point estimate for the population mean μ]

Point estimation is simple and intuitive, but also a bit problematic. If you are using truly random samples, the sample statistic is the point estimate and the best guess of the population parameter’s value. Because of the luck of the draw, however, different random samples will be composed of different elements of the population, and so the various samples will yield different values for the sample statistic. The bottom line is that with random samples you can be fairly certain the estimate is close to the population parameter, but the value of the sample statistic by itself gives no indication of how close. While point estimates are useful as best guesses, they do not tell us how far off they may be.

In other words, when we estimate, for instance, μ by the sample mean x̄, we are almost guaranteed to make some kind of error. Even though we know that the values of x̄ fall around μ, it is very unlikely that the value of x̄ will fall exactly at μ. The difference between x̄ and μ is called sampling error.
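To make the idea of sampling error concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions rather than part of the original example: the population of 10,000 IQ scores is hypothetical, generated from a normal distribution with mean 115 and standard deviation 15, and the helper name draw_sample_means is ours. Because we build the population ourselves, we can compute μ and see how far each sample mean lands from it.

```python
import random
import statistics


def draw_sample_means(population, sample_size, num_samples, seed=0):
    """Draw several simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical population (assumed values: mean 115, SD 15, 10,000 students).
population_rng = random.Random(42)
population = [population_rng.gauss(115, 15) for _ in range(10_000)]

# mu is known here only because we constructed the population ourselves.
mu = statistics.mean(population)

# Five random samples of 100 students each, as in the SU example.
sample_means = draw_sample_means(population, sample_size=100, num_samples=5)
sampling_errors = [x_bar - mu for x_bar in sample_means]

for x_bar, error in zip(sample_means, sampling_errors):
    print(f"sample mean = {x_bar:.2f}, sampling error = {error:+.2f}")
```

Each run of the loop prints a slightly different sample mean, and none is expected to equal μ exactly. In real research μ is unknown, so the sampling error cannot be computed this way; the simulation only shows why it exists.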

Learn by Doing

"Fill in the blank" question: select the correct answer.

Prior to an election, pollsters attempt to predict the proportion of voters in the voting population who will actually vote in the election. They collect a random sample of likely voters and calculate the proportion of voters who say they will vote in the election. The ______ proportion is considered a point estimate of the ______ proportion. The pollster does not expect the point estimate to exactly equal the population proportion because of the luck of the draw. The difference between the sample proportion and population proportion is called ______ error.

A large state university conducted a study in order to estimate μ, the mean cost of textbooks per semester of a student in the university. A stratified random sample of 530 students was chosen (ensuring representation from the different majors and different classes), and it was found that the total amount spent by these students on textbooks was $225,250. Based on the results of this study, what is the point estimate for μ?

$530

There is not enough information since μ is not given.

$225,250

$425

Interval Estimates

Given that such errors are a fact of life for point estimates (by the mere fact that we are basing our estimate on one sample that is a small fraction of the population), these estimates are in themselves of limited usefulness, unless we are able to quantify the extent of the estimation error. Interval estimates address this issue. The idea behind interval estimation is, therefore, to enhance the simple point estimates by supplying information about the size of the error attached. For an interval estimate, we try to determine the possible range of values that are likely to include the population parameter (the mean, for example). In particular, we often use confidence intervals in the social sciences to specify the likelihood that the population parameter is contained within a specified range. An example of a statement you might make that is based on a confidence interval is, “I am 95% confident that the population proportion of voters in favor of my candidate is between .51 and .59.” Confidence intervals can be calculated for a variety of parameters, but in this skill builder, we will primarily focus on calculating confidence intervals for means.

Example

Consider the example we discussed in the point estimation section:

We used x̄ = 115 as the point estimate for μ. (That is, we used the sample mean of 115 to estimate the population mean.) However, we had no idea of what the estimation error involved in such an estimation might be. Interval estimation takes point estimation a step further and says something like:

"I am 95% confident that by using the point estimate x̄ = 115 to estimate μ, I am off by no more than 3 IQ points. In other words, I am 95% confident that μ is within 3 of 115, or between 112 (115 - 3) and 118 (115 + 3)."

Yet another way to say the same thing is: "I am 95% confident that μ is somewhere in (or covered by) the interval (112,118)."

Note that while point estimation provides just one number as an estimate for μ (115), interval estimation provides a whole interval of "plausible values" for μ (between 112 and 118), and also attaches the level of our confidence that this interval indeed includes the value of μ to our estimation (in our example, 95% confidence). The interval (112,118) is therefore called "a 95% confidence interval for μ."
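The example does not say where the margin of 3 IQ points comes from. One common way to obtain such a margin, when the population standard deviation σ is treated as known, is the z-based formula: sample mean ± z × σ / √n. The sketch below applies it to the SU example. The sample mean (115) and sample size (100) come from the example; σ = 15 is our assumption (a conventional value for IQ scales), and the function name is hypothetical.

```python
import math


def confidence_interval_known_sigma(x_bar, sigma, n, z=1.96):
    """Return (lower, upper, margin) for a z-based interval around x_bar.

    z = 1.96 is the standard normal multiplier for 95% confidence.
    """
    standard_error = sigma / math.sqrt(n)
    margin = z * standard_error
    return x_bar - margin, x_bar + margin, margin


# x_bar and n are from the SU example; sigma = 15 is an assumed value.
lower, upper, margin = confidence_interval_known_sigma(x_bar=115, sigma=15, n=100)

print(f"margin of error = {margin:.2f}")
print(f"95% confidence interval = ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the margin works out to about 2.94, which rounds to the 3 IQ points quoted in the example, and the interval rounds to (112, 118). A different σ or sample size would give a different margin.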

MODULE 10 SAMPLING DISTRIBUTIONS AS A BASIS FOR STATISTICAL INFERENCE

Sampling Distributions as a Basis for Statistical Inference

Learning Objectives

  • Describe key characteristics of a sampling distribution.

Recall that a goal of quantitative research is to make generalizations about a population using the information that has been obtained from a sample. In other words, researchers use sample statistics to draw conclusions about population parameters.

For example, Afrobarometer (http://www.afrobarometer.org/) is a network of researchers who collect data from over 35 countries in Africa using public attitude surveys on democracy, governance, economic conditions, and related issues. In the Afrobarometer surveys, researchers work to make generalizations about people in a specific country in Africa, or all of Africa, based on a subset of the population. They use statistics to estimate parameters (e.g., the average response to an item on a survey) for certain African countries or for Africa as a whole. To use another example, let’s say an operations manager is concerned about the life of all of the light bulbs being produced at her facility. She views all of the potential items a process may produce as a population (i.e., the population of light bulbs) and may use statistical analysis on a subset of light bulbs (i.e., a sample) to draw conclusions about all of the light bulbs.

An important theoretical characteristic of a sample is that an element included in the sample has been chosen by the luck of the draw from the population. Different samples from the same population are therefore expected to have different elements and will therefore generate different statistics (e.g., different sample means). This means that the statistics that you obtain based on your data are likely to differ from the actual population parameter; that is, you will have sampling error in your study. For example, if you have collected data on a sample of 50 college students and have obtained a mean university satisfaction score for the students, the mean that you observe in your sample is likely to differ from the actual, true population mean. Therefore, even though you will typically draw just one sample (size n) when conducting your research, you need to know about the distribution of all possible values for the statistic. Knowing this can help you to understand how close your observed sample statistic is to the “true” population parameter and can help you to make inferences from your sample to the population.

Sampling Distribution

A sampling distribution tells you about the distribution of values for a statistic. That is, if you draw many, many random samples of size n from the population, the distribution of the values for the statistic is the sampling distribution. Recall that simple random sampling means that each unit in the population has an equal chance of being selected into the sample. This type of sampling allows us to create sampling distributions and to understand how our sample might differ from the population. (Even though samples are rarely truly random, statistical models are typically based on the assumption that a random sample has been collected.)

All statistics (e.g., standard deviations, variances, correlation coefficients) have sampling distributions. Nevertheless, the one most often studied is the sampling distribution of the sample mean, X̄. The mean is used most often because researchers frequently want to know, “What is the value of a variable for a typical case?” Statisticians have developed many models using the mean.

The aim of this skill builder is to show how sample information is related to the population. In particular, the relationship between the sample mean, X̄, and the population mean, µ, will be examined. Note that this skill builder discusses examples in which it is presumed that the population is quite large. An assumption has also been made that the value of σ, the population standard deviation, is known exactly. (In practice, σ is often estimated and is not known exactly.)
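The relationship between the sample mean and the population mean can be explored with a small simulation. The sketch below is illustrative only: it assumes a very large, normally distributed population with µ = 100 and σ = 15 (values we chose, not taken from the text), draws 5,000 random samples of size n = 50, and records the mean of each. The collection of those means approximates the sampling distribution of the sample mean. The function name is hypothetical.

```python
import math
import random
import statistics


def simulate_sampling_distribution(mu, sigma, n, num_samples, seed=0):
    """Return the means of num_samples random samples of size n.

    Each observation is drawn from a normal population with mean mu and
    standard deviation sigma (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]


# Assumed population values and sample size, chosen for illustration.
mu, sigma, n = 100, 15, 50
means = simulate_sampling_distribution(mu, sigma, n, num_samples=5000)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
theoretical_se = sigma / math.sqrt(n)

print(f"mean of the sample means = {mean_of_means:.2f} (population mean = {mu})")
print(f"SD of the sample means   = {sd_of_means:.2f}")
print(f"sigma / sqrt(n)          = {theoretical_se:.2f}")
```

In this sketch the sample means cluster around µ, and their spread is close to σ / √n, the standard error of the mean. Individual sample means still differ from µ, which is the sampling error described earlier.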

Learn by Doing

"Fill in the blank" question: select the correct answer.

Read the following passage from the Afrobarometer website and then complete the fill-in-the-blank questions using the pull down menu.

Afrobarometer uses national probability samples designed to meet the following criteria. Samples are designed to generate a sample that is a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of being selected for an interview. We achieve this by:

  • using random selection methods at every stage of sampling;

  • sampling at all stages with probability proportionate to population size wherever possible to ensure that larger (i.e., more populated) geographic units have a proportionally greater probability of being chosen into the sample.

The sampling universe normally includes all citizens age 18 and older. As a standard practice, we exclude people living in institutionalized settings, such as students in dormitories, patients in hospitals, and persons in prisons or nursing homes. Occasionally, we must also exclude people living in areas determined to be inaccessible due to conflict or insecurity. Any such exclusion is noted in the technical information report (TIR) that accompanies each data set.

http://www.afrobarometer.org/surveys-and-methods/sampling-principles

According to the passage, a ______ would be all of the people living in an African country who meet the criteria included in the passage. A 20-year-old student living in a dormitory ______ be included in the population. Every element of the ______ would also be an element of the ______.

Learn by Doing

Indicate whether each statement is a true or false statement about simple random sampling and the samples that result.

Answer choices: "True of random sampling" or "False statement about random sampling"

  • If a population consists of one million elements, each element must have the same chance of appearing in a random sample of n = 100.

  • If a random sample with n = 50 is collected, it can be used to calculate the value of µ, the population mean.

  • In practice, collecting a random sample can be difficult.

  • If two random samples are drawn from the same population, they will contain the same elements.

  • A random sample will result in a value for a statistic.

  • If a random sample with n = 50 is collected, it can be used to calculate a value of the sample mean.