Discussion Question

Week 5 Lecture 13

This week we look at two different approaches to analyzing data and making inferences about the populations they come from. The first is confidence intervals: a range of values that we expect to contain the actual population mean, based on the sample results we obtained. The other is a way to use nominal and ordinal data in a statistical analysis. The Chi Square family of tests looks at patterns within samples and asks whether the underlying populations could have the same distribution of measures (Lind, Marchel, & Wathen, 2008).

Confidence Intervals

When we perform a t-test or ANOVA, we are using a single point estimate for the mean of each population we are testing. Some professionals and managers are a bit uncomfortable with this. They understand that the sample has a sampling error, so the actual population mean could be (and most likely is) a bit different. They want an estimate of how large the sampling error is and how much the population mean could differ from the sample mean.

We deal with this through the use of confidence intervals: ranges of values that have a specific probability of containing the actual population mean. We have already seen one example of a confidence interval. The intervals used to determine which population means differed when we rejected the null hypothesis for the ANOVA test were confidence intervals.

Confidence intervals often provide the added information, and comfort, about estimates of population parameter values that single point estimates lack. Since the one thing we do know about a statistic generated from a sample is that it will not exactly equal the population parameter, we can use a confidence interval to get a better feel for the range of values that might be the actual population parameter. Confidence intervals also give us an indication of how much variation exists in the data set. The wider the range (at the same confidence level and sample size), the more variation within the sample data set and the less representative the mean would be (Lind, Marchel, & Wathen, 2008). We are going to look at two different kinds of confidence intervals this week: intervals for a one-sample mean and intervals for the difference between the means of two samples (Lind, Marchel, & Wathen, 2008).

One-Sample Confidence Interval for the Mean

A confidence interval is simply a range of values that could contain the actual population parameter of interest. It is centered on the sample mean and uses the variation in the sample to estimate a range of possible values (Lind, Marchel, & Wathen, 2008). To construct a confidence interval, we use several pieces of information from the sample, plus the confidence level we want.

From the sample we use the mean, standard deviation, and size. We also need the confidence level: the desired probability (usually set at 95%) that the interval does, in fact, contain the population mean.

Example. The confidence interval for the female mean salary in the population would be calculated this way. The sample mean is 38, the standard deviation is 18.3, and the sample size is 25 (from Week 1 material). Once we determine the confidence level we want, we use the associated two-tail t-value to achieve it. The t-value is found with the Excel (fx) function t.inv.2t(probability, df), where probability is 1 minus the confidence level and df, the degrees of freedom, is the sample size minus 1. For a 95% confidence interval, we would use t.inv.2t(0.05, 24), which equals 2.064 (rounded).
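Excel's t.inv.2t does this table lookup for us. For readers working outside Excel, the Python sketch below reproduces the same two-tail lookup using only the standard library: it integrates the t density with Simpson's rule and bisects for the cutoff that leaves the requested probability in the two tails. The function names (t_pdf, central_area, t_inv_2t) are illustrative choices, not an existing library's API; in practice a statistics package such as SciPy (scipy.stats.t.ppf) would normally be used instead.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def central_area(x: float, df: int, steps: int = 2000) -> float:
    """P(-x < T < x), integrated with Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3  # density is symmetric, so double the 0..x area


def t_inv_2t(prob: float, df: int) -> float:
    """Two-tail critical value: the x with P(|T| > x) = prob (mirrors Excel's T.INV.2T)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection; central_area grows as x grows
        mid = (lo + hi) / 2
        if central_area(mid, df) < 1 - prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_inv_2t(0.05, 24), 3))  # 2.064, the 95% value for a sample of 25
print(round(t_inv_2t(0.05, 48), 3))  # 2.011, df = 25 + 25 - 2
```

The second call gives the 95% value for a two-sample comparison with 25 observations per group, where df = 25 + 25 - 2 = 48.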

We now have all the information we need to construct a 95% confidence interval for the female salary mean:

CI = mean +/- t * stdev/sqrt(sample size) = 38 +/- 2.064 * 18.3/sqrt(25) = 38 +/- 7.6.

This is typically written as 30.4 to 45.6. Note: the standard deviation divided by the square root of the sample size is called the standard error of the mean. It measures the sampling variation of the mean and is used in several statistical tests and procedures, including the t-test and confidence intervals.
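The same calculation can be written as a small Python function that works from summary statistics. This is a sketch that assumes the t-value has already been looked up (here, the 2.064 from t.inv.2t(0.05, 24)); mean_ci is a hypothetical helper name, not part of any library.

```python
import math


def mean_ci(mean: float, stdev: float, n: int, t_value: float) -> tuple[float, float]:
    """Confidence interval for a population mean from sample summary statistics."""
    std_error = stdev / math.sqrt(n)  # standard error of the mean
    margin = t_value * std_error      # t times the standard error
    return mean - margin, mean + margin


# Female salary example: mean 38, stdev 18.3, n = 25, t = t.inv.2t(0.05, 24) = 2.064
low, high = mean_ci(38, 18.3, 25, 2.064)
print(f"{low:.1f} to {high:.1f}")  # 30.4 to 45.6
```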

The associated 95% CI for males is 44.6 to 59.3. Note that the two intervals overlap: the smallest male value is 44.6, while the largest female value is 45.6. This suggests that both population average salaries could be the same, at around 45. However, just as the two one-sample t-tests gave us misleading information on possible equality, using two separate confidence intervals to compare two populations is also not the best approach.

Confidence Interval for Mean Differences

When comparing multiple samples, it is always best to use all the available information in a single test or procedure. The same is true for confidence intervals. If we are interested in seeing whether the population means could be equal, we check whether the difference between the means could be 0. If the interval for the difference contains 0, the means could be the same; if it does not, the means are significantly different.

The formula for the mean difference confidence interval is mean difference +/- t * standard error. The standard error for the difference of two populations is found by adding the variance/sample size (which is the standard error squared) for each sample and taking the square root (Lind, Marchel, & Wathen, 2008). For our salary data set we have the following values:

Female mean = 38; Male mean = 52; t = t.inv.2t(0.05, 48) = 2.011

Female stdev = 18.3; Male stdev = 17.8; Sample size = 50 (25 each), df = 25 + 25 - 2 = 48

Standard error = sqrt(Variance(female)/25 + Variance(male)/25) = sqrt(334.7/25 + 316/25) = 5.10.

This gives us a 95% confidence interval for the difference equaling:

(52 - 38) +/- 2.011 * 5.10 = 14 +/- 10.3 = 3.7 to 24.3.

Since this confidence interval does not contain 0, we are 95% confident that the male and female salary means are not equal, which is the same result we got from our two-sample t-test in Week 2. The width of the interval also gives us a sense of how much variation exists in our measures.
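A Python sketch of the two-sample interval, again working only from summary statistics. It follows the text's approach of adding each group's variance divided by its sample size and taking the square root, using the variances 334.7 (female) and 316 (male) from the salary example. The helper name diff_ci is hypothetical, and the critical value is assumed to be t.inv.2t(0.05, 48), which is approximately 2.011.

```python
import math


def diff_ci(mean1: float, var1: float, n1: int,
            mean2: float, var2: float, n2: int,
            t_value: float) -> tuple[float, float, float]:
    """Confidence interval for mean1 - mean2 from two independent samples."""
    std_error = math.sqrt(var1 / n1 + var2 / n2)  # add each variance/n, then take the root
    diff = mean1 - mean2
    margin = t_value * std_error
    return diff - margin, diff + margin, std_error


# Salary example: male minus female, 25 per group, t = t.inv.2t(0.05, 48), about 2.011
low, high, se = diff_ci(52, 316, 25, 38, 334.7, 25, 2.011)
print(f"SE = {se:.2f}, CI = {low:.1f} to {high:.1f}")  # SE = 5.10, CI = 3.7 to 24.3
print("0 inside interval:", low <= 0 <= high)          # 0 inside interval: False
```

The final line applies the decision rule from the text: if 0 is not inside the interval, the two population means are judged significantly different.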

Side note: The “+/- t * SE” term is often called the margin of error. We most often hear this phrase in conjunction with opinion polls, particularly political polls: “Candidate A has a 43% approval rating with a margin of error of 3.5%.” While we do not deal with proportions in this class, they are calculated the same way as an empirical probability: the number of positive replies divided by the sample size. The construction of these margins of error or confidence intervals is conceptually the same: a critical value (for proportions, usually a z-value rather than a t-value) times a standard error of the proportion based on the sample size and results (Lind, Marchel, & Wathen, 2008).

References

Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Finance (13th ed.). Boston: McGraw-Hill Irwin.