
Week 3 Lecture 7

We have so far seen how we can summarize data sets using descriptive statistics, showing several characteristics including the mean and standard deviation. We also found that if our data comes from a random sample of a larger population, these descriptive statistics become inferential statistics, and can be used to make inferences about the population. These inferences can then be used in statistical tests to see if things have changed or not (equal to known standards or other data sets or not).

We have looked at one- and two-sample mean tests (with the t-test) and two-sample comparisons of variance equality (with the F test). This week we will look at the Analysis of Variance (ANOVA) test for mean equality between three or more groups.

ANOVA

The first question often asked is: why not just do multiple t-tests comparing three or more different group means? One answer involves efficiency. Conducting multiple t-tests can become somewhat tedious. Comparing just three groups (A, B, and C) requires us to compare A and B, B and C, and A and C (3 tests). With 4 groups (A, B, C, and D) we have A and B, A and C, A and D, B and C, B and D, and C and D (6 tests)! So a single test can save us a lot of time and is much more efficient.

A second, and much more important, reason is that we lose confidence in our results when multiple tests are performed on the same data. With an alpha of 0.05, we are 95% certain we are right with each test, but being certain we are right for all the tests involves multiplying the results together. For three tests we would be .95 * .95 * .95 = .86, or 86%, certain; with six tests, our confidence drops to .95^6 = .74, a long way from our desired 95% confidence. So, a single test maintains our desired level of confidence in the outcome (Lind, Marchel, & Wathen, 2008).

Logic

A second question comes from the name itself: how can analyzing variance tell us anything about mean differences? The answer lies in how ANOVA works. The key assumptions for an ANOVA analysis are that each of the groups is normally distributed AND that the groups have equal variances. This means that the distributions are shaped the same, which allows for an easy comparison. Take a look at the following two sets of normal curves.

Exhibit A

Exhibit B

The means of the three sample groups in Exhibit A could clearly come from three populations that have the same mean, and the differences seen are merely sampling errors.
However, we cannot say the same thing about the sample groups in Exhibit B.
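The drop in familywise confidence described above can be checked with a short calculation. This sketch (the function name and structure are my own, not from the lecture) counts the pairwise t-tests needed for a given number of groups and multiplies the per-test confidence:

```python
from math import comb

def familywise_confidence(groups, alpha=0.05):
    """Return the number of pairwise t-tests for `groups` groups and the
    joint confidence of running them all at the given per-test alpha."""
    tests = comb(groups, 2)              # "groups choose 2" pairwise comparisons
    confidence = (1 - alpha) ** tests    # multiply per-test confidence together
    return tests, confidence

print(familywise_confidence(3))   # 3 tests, roughly 86% joint confidence
print(familywise_confidence(4))   # 6 tests, roughly 74% joint confidence
```

This matches the lecture's figures: 0.95^3 is about .86 and 0.95^6 is about .74, well below the 95% confidence a single ANOVA test preserves.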

ANOVA takes the variation of all of the data in the groups being tested (three in this case) and compares it with the average variation for each of the groups, using the F-test (discussed last week). For the Exhibit A groups, the overall variation will be only slightly larger than the average of the three group variations (which are assumed to be equal). Since the resulting F value will not be statistically significant, we can say that the groups are closely distributed and the means are statistically equal.

In Exhibit B, however, the variation of the entire group would be around three times the average group variation. Just by looking at the average variance for the individual groups and comparing it to the variance for the entire group, we can make a judgment on how close the distributions are, and with that a judgment on mean equality. As with the t-test, ANOVA will tell us exactly how much difference in the population locations is enough to say the means differ or not; we cannot just "eyeball" it.

Hypothesis

Stating the null and alternate hypotheses for an ANOVA test is simple, as they are always the same:

Ho: All means are equal.
Ha: At least one mean differs (Tanner & Youssef-Morgan, 2013).
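The between-versus-within comparison described above is exactly what a one-way ANOVA computes. As a sketch, here is the test run on three small made-up samples (the numbers are illustrative only, not the lecture's data), using scipy's built-in `f_oneway`:

```python
from scipy.stats import f_oneway

# Three illustrative samples (made-up numbers, not the lecture's data).
group_a = [23, 25, 24, 26, 22, 25]
group_b = [24, 26, 23, 25, 24, 27]
group_c = [40, 42, 41, 43, 39, 44]   # clearly shifted, like Exhibit B

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")

alpha = 0.05
if p_value < alpha:
    print("Reject Ho: at least one mean differs")
else:
    print("Do not reject Ho: the means are statistically equal")
```

Because group C sits far from the other two, the between-group variation dwarfs the within-group variation, the F statistic is large, and the null hypothesis is rejected.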

You might recall from last week that we said the alternate always states the opposite of the null statement. If so, why isn't our alternate "all means differ," which seems like the opposite? The reason is that the ANOVA test will reject the null hypothesis if even one mean from the groups being examined is statistically significantly different. So, the opposite of "all means are equal" is actually "at least one mean differs."

Data Set-up

Setting up the data for an ANOVA analysis is just a bit more complicated than for a t-test. While with the t-test we just highlighted the column or portion of a column of data (sometimes after sorting it by a variable such as gender), for an ANOVA test, we need to create a table. For example, if we wanted to look at average salaries per grade (shown in the Week 3 Lecture 8 example), we would need a table looking like this.

Doing this is fairly simple. Copy the grade and salary columns (separately) and paste them onto a new Excel sheet (probably in Week 3 to the right of the questions). Then, highlight both columns – from labels to last value – and select Data | Sort. Select sorting on the grade variable and click on OK. Both columns are now in grade order, and you can highlight and cut the salaries for each grade and paste them into a new table you create with the grade letter as the header. When finished, you will have the input table used in setting up an Excel ANOVA test.

References

Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Finance (13th ed.). Boston: McGraw-Hill Irwin.

Tanner, D. E., & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgepoint Education.

Week 3 Lecture 8

Excel ANOVA Example

In our on-going investigation of whether or not males and females are paid equally for equal work, we have come up with contradictory results so far: average salaries are clearly different, but average compa-ratios are not. We need to examine reasons that might impact these differences to see if we can explain what is going on. For each possible factor influencing individual salaries, we need to be able to, paraphrasing what they say in TV cop shows, "rule it out as a suspect" in causing differences, or keep it in as a cause of differences between the gender pay practices.

One key issue in our question that has not clearly been examined yet is the impact of grades on salaries. Clearly, grade differences have the potential to complicate the issue, as the work done differs by grade. One question to ask here is, "Are average salaries equal across grade levels?" This becomes our research question.

Example

For the research question "Are average salaries equal across the grades?", we have the following hypothesis test.

Step 1: Ho: All salary means are equal. Ha: At least one mean differs.
Step 2: Reject the null if the p-value < alpha = .05.
Step 3: Statistical test: Single-factor ANOVA. (Note: salary variance in some of the grades may violate the equal variance requirement. We will ignore this for the purposes of this example.)
Step 4: Perform the test.

The input screen for Excel's Single Factor ANOVA asks for an input range, which for this example would be D1:F16; we would click on Labels in First Row, and select any output range desired (this would be given in the assignment for consistency's sake). Completing the input screen and clicking OK gives us an output table.

Reading the ANOVA Output Tables

The first thing we see is the test name in cell K1: Anova: Single Factor. This is just a check to ensure we have the right test. Next we see a summary table. Under the Groups column we should see the data labels (in this case, our grades). If not, and we see something such as a number, an input error has been made: the labels were not included but the Labels box was checked. If this happens, just redo the data set-up and overwrite the output. For each variable, we see the count, sum, average, and variance. If we had some question about having equal variance, we could perform an F-test on the variables with the extreme values. (Again, for purposes of this example, we are going to ignore the requirement for equal variances.)

The next table is the ANOVA output. While, technically, for our hypothesis test we only need to look at the p-value result, the other columns provide some useful information. Note: this is somewhat technical, and is presented only as an explanation of the table. The Source of Variation column gives us our two variation measures; Between Groups refers to the overall variation, while Within Groups refers to the average variation for all the groups. The SS column (Sum of Squares) is an estimate of the variation (slightly different from our variance formula). This value is divided by the df (degrees of freedom) value for each row. This df is conceptually the same as that discussed with the t-test, and the total df is N - 1, where N is the number of data points. Looking at this value (49 in this example) confirms we entered the right number of data points, 50.

MS stands for Mean Square and is the SS divided by the df. The F value is determined by dividing the MS of the Between Groups row by the MS of the Within Groups row. The p-value and the critical F statistic complete the table.

Step 5: Conclusion and Interpretation.

The F is much larger than the F critical, and the p-value is much less than 0.05. (Note: 1.04E-35 means move the decimal point 35 places to the left: 0.0000000000000000000000000000000000104. If the E (for exponent) had been positive, we would have moved the decimal to the right; for example, 1.04E4 = 10400.) So, according to our decision rule, since the p-value is < (less than) 0.05, we reject the null hypothesis and conclude that at least one mean differs. This suggests that grade level has an impact on salary, and that measuring pay in salary terms could be creating some issues in answering our questions.
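The SS, df, MS, and F columns of the output table can all be reproduced by hand. This sketch builds the single-factor table from three small made-up groups (illustrative data, not the lecture's salaries) and cross-checks the result against scipy's built-in test:

```python
from scipy.stats import f as f_dist, f_oneway

# Illustrative groups (made-up values, not the lecture's salary data).
groups = [
    [22, 24, 23, 25, 26],
    [30, 32, 31, 29, 33],
    [40, 41, 39, 42, 43],
]

all_data = [x for g in groups for x in g]
n, k = len(all_data), len(groups)
grand_mean = sum(all_data) / n

# Between Groups SS: variation of the group means around the grand mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within Groups SS: variation of each value around its own group mean
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between, df_within = k - 1, n - k        # total df = n - 1
ms_between = ss_between / df_between        # MS = SS / df
ms_within = ss_within / df_within
f_stat = ms_between / ms_within             # F = MS(between) / MS(within)
p_value = f_dist.sf(f_stat, df_between, df_within)
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")

# Cross-check against scipy's built-in one-way ANOVA
f_check, p_check = f_oneway(*groups)
assert abs(f_stat - f_check) < 1e-9
```

Note how the degrees of freedom add up to n - 1, just as the lecture's total df of 49 confirmed 50 data points.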

Determining Differences

When we reject the null hypothesis, a logical follow-up question is often: which differences are meaningful? There are several approaches to answering this question; all involve a pair-by-pair comparison, and most require access to statistical tables not available within Excel. One approach that we can use in our Excel worksheet involves developing confidence intervals around the difference in group means. (Note: confidence intervals allow us to develop a range that contains the value we are looking for with a known level of confidence, such as 95%. We will discuss this again in Week 5.) All of the required information for these intervals is available from the ANOVA output.

The basic approach is to:
1. Find the difference between each pair of means.
2. To this value, add and subtract a measure of the variation in the data. (Due to sample error, we know our sample means are not exactly equal to the population parameters, so we need to take this sample error into account; our real difference might be a bit larger or smaller than the samples show.)
3. Examine the ranges to see if 0 is included (alternately, do the endpoints have different signs, a + and a -?); if so, the real population difference could be 0, and the means do not significantly differ.

The formula for the interval that we will build in Excel is:

(mean1 - mean2) +/- t*sqrt(MSW * (1/n1 + 1/n2)) (Lind, Marchel, & Wathen, 2008).

Here is an example of how we work out the formula, and what each term means. The values of the means for each variable are found in the Summary table, as is the count (n) for each variable. The MSW is the MS for Within Groups found in the ANOVA table, and we find t with Excel's T.INV.2T (two-tailed) function.

So, let's walk through constructing an interval for grades A and B, and then we can look at what it might look like in an Excel spreadsheet. From our example output above, we have:

Mean A = 23.5 (rounded)
Mean B = 31.7 (rounded)
n for A = 15
n for B = 7
MSW = 8.64 (rounded)

t has a df equal to that of MSW (44 in this case), and the probability is our 0.05 for a 95% interval. T.INV.2T(0.05, 44) equals 2.015 (rounded).

So, for grades A and B, our mean difference = 31.7 - 23.5 = 8.2. The +/- term is t * sqrt(MSW * (1/n1 + 1/n2)). Plugging in our values gives us 2.015 * sqrt(8.64 * (1/15 + 1/7)) = 2.71. So, our interval is 8.2 +/- 2.71 = 5.49 to 10.91 (rounded).
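The same interval can be checked outside Excel. This sketch plugs the lecture's rounded values into the formula, using scipy's `t.ppf` for the two-tailed critical value (the equivalent of Excel's T.INV.2T):

```python
from math import sqrt
from scipy.stats import t

# Values taken from the lecture's ANOVA output (rounded)
mean_a, mean_b = 23.5, 31.7
n_a, n_b = 15, 7
msw, df = 8.64, 44          # MS(within) and its degrees of freedom

# Two-tailed 95% critical t, about 2.015 for df = 44
t_crit = t.ppf(1 - 0.05 / 2, df)

diff = mean_b - mean_a
margin = t_crit * sqrt(msw * (1 / n_a + 1 / n_b))
lower, upper = diff - margin, diff + margin
print(f"difference = {diff:.2f}, interval = ({lower:.2f}, {upper:.2f})")

# 0 is not inside the interval, so grades A and B differ significantly
assert not (lower <= 0 <= upper)
```

The result reproduces the lecture's interval of roughly 5.49 to 10.91, confirming the hand calculation.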

Since 0 is not in this range, we can say that the mean salaries for grades A and B differ significantly. Setting this up in Excel (using cell references, as the examples on the left show) gives us the following:

So, all of the grade average salary differences are significantly different from each other. Grade is definitely a factor in an employee's salary, and it introduces a source of variation that is not an equal-work measure. We have not yet found an answer to our question, as we have not yet figured out how to get a measure of equal work to base our comparisons on.

More to follow next week.

References

Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Finance (13th ed.). Boston: McGraw-Hill Irwin.

Week 3 Lecture 9

Effect Size

When we reject the null hypothesis with an ANOVA test, two questions arise. The first, which pair of means differs significantly, we have dealt with already. The second question, similar to what we asked after a t-test null hypothesis rejection, is: what caused the rejection, the sample size or the variable interactions? This question is again answered using an effect size measure.

Recall that the effect size measure shows how likely it is that the variable interaction caused the null hypothesis rejection. Large values lead us to say the variables caused the outcome, while small values lead us to say the outcome has little to no practical significance, as the sample size was the most likely cause of the rejection of the null.

With the single-factor ANOVA, the effect size measure is eta squared, which equals SS(between)/SS(total) (Tanner & Youssef-Morgan, 2013). For our salary example in Lecture 8, eta squared equals 17686.02 (SS(between)) / 18066 (SS(total)) = 0.979 (rounded).

Eta squared effect size measures have different interpretation values than Cohen's d (from the t-test). According to Nandy (2012), a small eta squared effect size has a value of 0.01, a medium 0.06, and a large 0.14 or more. This means we have a large effect size, and the interaction of the salary and grade variables, rather than the sample size, is the most likely cause of our rejecting the null hypothesis.

Side note: eta squared can also be interpreted as the percent of "differences between group scores that can be explained by the independent variable" (Tanner & Youssef-Morgan, 2013, p. 123). This is consistent with our saying the variable interactions caused the outcome.
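The calculation and the interpretation thresholds are easy to wrap in a small helper. This sketch (the function names are my own) applies Nandy's (2012) cutoffs to the lecture's SS values:

```python
def eta_squared(ss_between, ss_total):
    """Effect size for a single-factor ANOVA: SS(between) / SS(total)."""
    return ss_between / ss_total

def interpret(eta2):
    """Thresholds per Nandy (2012): 0.01 small, 0.06 medium, 0.14 large."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "medium"
    if eta2 >= 0.01:
        return "small"
    return "negligible"

# SS values from the Lecture 8 salary example
eta2 = eta_squared(17686.02, 18066)
print(round(eta2, 3), interpret(eta2))   # 0.979 large
```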

Different Forms of ANOVA

Just as the t-test has several forms, so does the ANOVA test. Excel has three versions available. While we will focus only on the single-factor test, a brief description of the other two versions is presented below.

ANOVA: Two-Factor Without Replication

The ANOVA two-factor without replication test examines mean differences for two different variables at the same time. If we are interested in knowing whether the mean salary differs by grade and also by gender, we can perform one two-factor test rather than two separate tests. As mentioned earlier this week, this is more efficient and maintains our desired alpha significance level.

Excel Example. To test the mean salaries by grade and gender at the same time, we would set up our hypothesis test as follows.

Step 1: Ho1: All salary means are equal across grades. Ha1: At least one mean differs.
Ho2: All gender (male and female) means are equal. Ha2: At least one mean differs.

Note that in this test, we need to have a hypothesis statement pair for each variable being tested.

Step 2: Reject the null hypothesis if the p-value is < alpha = .05.
Step 3: Statistical test: ANOVA: Two-factor without replication.
Step 4: Perform the test.

While the input screen for this test is identical to that of the one-factor test, the data table used is a bit different. As seen below, it has one value for each variable-pair cell. Since we have multiple values for each variable pair, this table was set up with the mean values for each group.

        A     B     C     D     E     F
Male    24.3  27.7  43.3  48.0  61.7  75.3
Female  23.3  34.8  41.5  52.5  67.0  76.0

The data entry box would include the entire table, labels and all. The output for this test is:

Step 5: Conclusions and Interpretation.

As with the single-factor ANOVA, we start out with a summary table for each variable showing the sum, average, and variance for each variable label. The ANOVA table has an extra row, and one renamed row. The Error row is what we knew as the Within Groups row in the single-factor ANOVA. The two rows dedicated to the data are Rows and Columns; these refer to how the variables are presented in the data input table.

The Rows line refers to our gender variable, since that is the row variable in the input. The p-value is 0.16 (rounded), so we do not reject the null hypothesis of equal means.

The Columns line refers to the grade variable, as that was listed in the column position. This p-value is 3.76E-05, or 0.0000376. This is less than (<) our alpha of .05, so we reject the null hypothesis of equal salary means in each grade. We can find which pair(s) of means differ using the same technique as with the single-factor ANOVA discussed in Lecture 8.

The effect size measure for a two-factor ANOVA without replication is generally the same as with the single-factor ANOVA. For each variable, it would be eta squared = SS(for the variable) divided by SS(total) (Tanner & Youssef-Morgan, 2013). The effect size for our rejected null hypothesis is 3865.341/3917.059 = .987 (rounded), a very large effect, meaning the variable interaction caused the rejection of the null, and we have a significant practical outcome, one we can make decisions with.
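The Rows/Columns/Error decomposition behind this output can be sketched by hand from the group-mean table above. This is a sketch of the arithmetic Excel performs, using scipy only for the p-values; because the table stores rounded group means, the SS values will differ slightly from the lecture's Excel output:

```python
from scipy.stats import f as f_dist

# Group-mean table from the lecture (rows = gender, columns = grades A-F)
data = {
    "Male":   [24.3, 27.7, 43.3, 48.0, 61.7, 75.3],
    "Female": [23.3, 34.8, 41.5, 52.5, 67.0, 76.0],
}
rows = list(data.values())
r, c = len(rows), len(rows[0])               # 2 genders, 6 grades
grand = sum(sum(row) for row in rows) / (r * c)

row_means = [sum(row) / c for row in rows]
col_means = [sum(row[j] for row in rows) / r for j in range(c)]

ss_rows = c * sum((m - grand) ** 2 for m in row_means)       # gender (Rows)
ss_cols = r * sum((m - grand) ** 2 for m in col_means)       # grade (Columns)
ss_total = sum((x - grand) ** 2 for row in rows for x in row)
ss_error = ss_total - ss_rows - ss_cols                      # Error row

df_rows, df_cols = r - 1, c - 1
df_error = df_rows * df_cols
ms_error = ss_error / df_error

p_rows = f_dist.sf((ss_rows / df_rows) / ms_error, df_rows, df_error)
p_cols = f_dist.sf((ss_cols / df_cols) / ms_error, df_cols, df_error)
print(f"gender p = {p_rows:.3f}, grade p = {p_cols:.2e}")
```

The gender p-value lands near the lecture's 0.16 (not rejected) and the grade p-value is tiny (rejected); the grade eta squared, `ss_cols / ss_total`, comes out near the lecture's .987.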

But let's go back to the other result: the failure to reject the null hypothesis claiming that the male and female average salaries are equal. What goes with this outcome? We have clear evidence from the t-test done in Week 2 that the average salaries are not equal.

This brings us to the other reason for using this test: to reduce one cause of error or variation in the measurement of a variable. For example, if we think that grade level may be a cause of differences in salaries (a reasonable assumption), then we can remove its impact by using this approach. It will take the grade variation out of the overall analysis of salary and include it only in the grade results.

What does this mean? We have been concerned that we have not been able to measure salary for "equal work"; this approach does this for us. The salary average difference examined in this test has the impact of grade level differences removed; in essence, the salary that is analyzed is the salary impact of gender if everyone did "equal work" (at least as far as job duties go). There are still some questions around the impact of performance ratings, education, seniority, etc. But for now, we have a better view of "equal pay for equal work" salary differences. It appears that perhaps males and females are being paid equally for equal work, on average. Ah, the power of statistics to make things clearer.

ANOVA: Two-Factor With Replication (AKA Factorial ANOVA)

This form of the ANOVA test is somewhat different from the previous two forms. While it can test for mean equality (or differences), this is not its primary purpose. The main purpose is to look at the impact of interaction between variables – that is, do the results show different patterns when graphed? Interaction means the variables react differently at different measurement levels (Lind, Marchel, & Wathen, 2008).
An example is water and temperature: at cold temperatures water is a solid, at mid-range temperatures it is a liquid, and at high temperatures it is a gas; there is a clear interaction going on. As with the without-replication test, an example will help demonstrate this test. We will continue with our gender and grade impact on salary. While our primary research question will be whether an interaction between gender and grades impacts salary, we will also repeat our questions about mean salary differences by gender and grade.

Excel Example. To test the mean salaries by grade and gender at the same time, we would set up our hypothesis test as follows.

Step 1: Ho1: All salary means are equal across grades. Ha1: At least one mean differs.
Ho2: All gender (male and female) means are equal. Ha2: At least one mean differs.
Ho3: The interaction impact is not significant. Ha3: The interaction is significant.

Note that in this test, we need to have a hypothesis statement pair for each variable being tested, as well as for the interaction.

Step 2: Reject the null hypothesis if the p-value is < alpha = .05.
Step 3: Statistical test: ANOVA: Two-factor with replication.
Step 4: Perform the test.

The input screen for this test is similar to that of the other ANOVA forms; it asks for the number of rows for each variable, which, as seen below, would be two. The data table used is a bit different; as seen below, it has multiple values for each cell. Since several grades have only two males or females, we can only use two values in each cell of our table. If your data has more counts per cell, you can include more values. The data entry table was set up with the minimum and maximum salary values for each cell.

        A     B     C     D     E     F
Male    24.0  27.0  40.0  47.0  62.0  72.0
        25.0  28.0  47.0  49.0  66.0  77.0
Female  22.0  34.0  41.0  50.0  65.0  75.0
        24.0  36.0  42.0  55.0  69.0  77.0

The data entry box would include the entire table, labels and all. The output for this test is:

Step 5: Conclusions and Interpretation.

As with the other ANOVA forms, we start out with a summary of the variables. In the ANOVA table itself, we have added another row, this one for interaction. And whereas the without-replication output table started with a Rows line, here we have Sample instead; both refer to the variable listed in the input table rows – gender in our case. We again do not reject our gender null hypothesis, as the p-value is 0.055, greater than our alpha of .05. This test also found that gender average salaries did not significantly differ.

The Columns, or grade, null hypothesis was rejected with a p-value of 1.07E-11, or 0.0000000000107, which is less than (<) .05. So, our grade salary averages do differ by grade.

The interaction null hypothesis, however, is not rejected, as its p-value of 0.135 (rounded) is greater than .05, meaning that the salaries do not show a differing pattern across gender-grade groupings. Males and females are treated consistently through the grades, with average salaries essentially growing with each grade jump.
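All three tests can be reproduced from the min/max table above. This is a sketch of the balanced two-per-cell arithmetic (Sample, Columns, Interaction, and Within sums of squares), with scipy supplying the p-values:

```python
from scipy.stats import f as f_dist

# Min/max salary pairs from the lecture's table: two values per gender-grade cell
cells = {
    "Male":   [(24, 25), (27, 28), (40, 47), (47, 49), (62, 66), (72, 77)],
    "Female": [(22, 24), (34, 36), (41, 42), (50, 55), (65, 69), (75, 77)],
}
genders = list(cells)
a, b, n = len(genders), 6, 2                 # 2 genders, 6 grades, 2 per cell
values = [x for g in genders for cell in cells[g] for x in cell]
grand = sum(values) / (a * b * n)

gender_means = [sum(x for cell in cells[g] for x in cell) / (b * n) for g in genders]
grade_means = [sum(x for g in genders for x in cells[g][j]) / (a * n) for j in range(b)]
cell_means = {g: [sum(cell) / n for cell in cells[g]] for g in genders}

ss_gender = b * n * sum((m - grand) ** 2 for m in gender_means)   # Sample row
ss_grade = a * n * sum((m - grand) ** 2 for m in grade_means)     # Columns row
ss_cells = n * sum((m - grand) ** 2 for g in genders for m in cell_means[g])
ss_inter = ss_cells - ss_gender - ss_grade                        # Interaction row
ss_within = sum((x - cell_means[g][j]) ** 2
                for g in genders for j in range(b) for x in cells[g][j])

df_g, df_c, df_i, df_w = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
ms_w = ss_within / df_w
p = lambda ss, df: f_dist.sf((ss / df) / ms_w, df, df_w)
print(f"gender p = {p(ss_gender, df_g):.3f}")       # near the lecture's 0.055
print(f"grade p = {p(ss_grade, df_c):.2e}")         # very small: rejected
print(f"interaction p = {p(ss_inter, df_i):.3f}")   # near the lecture's 0.135
```

Only the grade null hypothesis is rejected; gender and interaction p-values stay above .05, matching the lecture's conclusions.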

The effect size, eta squared, is computed the same way as before: the SS for the variable divided by SS(total). The calculation of differences is also done the same way, using SS(within) in the calculations.

The various ANOVA formats can provide us with a lot of information that is hidden in other tests. This is one reason why single-variable statistical tests, which cannot separate out distinct sources of variation within our data measurements, often do not provide a complete understanding of the meaning hidden within the measures. More on this in the upcoming weeks.

References

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4:863. doi:10.3389/fpsyg.2013.00863. Retrieved from http://journal.frontiersin.org/article/10.3389/fpsyg.2013.00863/full

Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Finance (13th ed.). Boston: McGraw-Hill Irwin.

Nandy, K. (2012). Understanding and Quantifying Effect Sizes. Retrieved from http://nursing.ucla.edu/workfiles/research/Effect%20Size%204-9-2012.pdf

Tanner, D. E., & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgepoint Education.