Statistics Question

Week 4 Lecture 10

We have been examining the question of equal pay for equal work for several weeks now, but have been somewhat frustrated with the equal work part. We suspect that salary varies with grade level, so that equal work is not being compared if we compare salaries across grades. We found that we could control the effect of grades with either of two techniques. The first is choosing a variable that does not include grade level variation, such as compa-ratios (the salary divided by the midpoint). The second is statistically removing the impact of grade level using the ANOVA Two-factor without replication. Both of these gave us different outcomes on the question of male and female pay equality than examining salary only.

However, we still have not gotten a “clean” measure of equal work, as there are still other factors that may impact work done, such as performance levels (measured by the performance appraisal rating), seniority, education, etc. And there could be gender bias (and, for real-world companies, ethnic bias as well; we will not cover this, but it can be dealt with in the same way as we will examine gender). We need to find a way to eliminate the impact of these variables on our pay measure as well.

This week we will look at two techniques that are very good at examining and explaining the influence of variables on outcomes. These are correlation and regression techniques.

Linear Correlation

Correlation is a measure of how variables relate: if one variable changes, does another variable change in a predictable pattern as well? One well-known example is the correlation (or relationship) between the length/height of children and their weight. As children become longer/taller, their weight also increases (Tanner & Youssef-Morgan, 2013). Using this relationship, we can make predictions (using the technique of regression discussed in Lecture 11 for this week) about how heavy a child should be for any given height.

For variables that are at least interval in nature, two types of correlation exist for a bivariate (two variables only) relationship: linear and curvilinear. As the names suggest, linear correlations show the extent to which the variables move together in a straight line. Curvilinear correlations, which we will not cover, show the extent to which variables move together in curved lines.

Scatter Diagrams

An effective way to see if the data relate in predictable ways is to generate a scatter diagram (AKA scatter chart), a visual display of how the data points (variable 1 value, corresponding variable 2 value) relate to each other (Lind, Marchel, & Wathen, 2008).

Example 1. One relationship we might expect to be positive (both values increasing together) is that between salary and performance rating, either for the entire salary range or at least within grades. The following scatter diagram (made with the Excel Insert Graph functions) shows the relationship, with Performance Rating on the horizontal axis and Salary on the vertical axis. If we put a straight line through the data points, there is a very modest increase from the lower left to the upper right.

[Figure: Salary (Y-axis) and Performance Rating (X-axis)]

Example 2. If we look at the same variables but include Grade as a factor, we get the second graph (below) and see the data separated by grade. Each grade seems to show (again, if we were to put a straight line through the data points for each grade) a level line, indicating no correlation at all. Neither graph gives us much hope that Performance Rating is related to Salary, something HR would probably not be happy with.

[Figure: Salary Grades (Y-axis) and Performance Appraisal Rating (X-axis)]

Correlation

We will be focusing our efforts on the Pearson Correlation Coefficient, a mathematical value that shows the strength of the linear (straight-line) relationship between two variables (Lind, Marchel, & Wathen, 2008). The math formula is a bit tedious, so we will not bother with it; if interested, you can ask Excel to display it, either with Help or with the “Tell me what you want to do” box. (With the latter, I typed “show help on Pearson Correlation” and then selected the “show help…” line, getting a description and the math formula.)

The Pearson correlation ranges from -1.00 to +1.00. Any value outside of this range indicates an error in the math or setup. A perfect negative correlation (-1.00) means that the data points all fit exactly on a line that runs from the upper left corner to the lower right on a graph, a negative slope. A perfect positive correlation (+1.00) has the line with a positive slope, running from the lower left to the upper right (Tanner & Youssef-Morgan, 2013).
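For readers who want to see the arithmetic behind the coefficient, here is a minimal sketch in Python (the lecture itself relies on Excel, so the language choice, the function name pearson_r, and the small data lists are assumptions made purely for illustration). It uses the standard definition of Pearson's r: the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric lists (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented only to show the two extremes
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # every point on a line with positive slope
falling = [10 - 3 * x for x in xs]  # every point on a line with negative slope

print(round(pearson_r(xs, rising), 2))   # perfect positive correlation
print(round(pearson_r(xs, falling), 2))  # perfect negative correlation
```

Because the numerator can never be larger in magnitude than the denominator, the result always stays between -1.00 and +1.00, which is why a value outside that range signals a mistake in the math or setup.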

As the values move away from the perfect extremes, the data points move away from a line to a spread around the line. If we look at our first graph above, the overall Salary and Performance Rating relationship, we have a correlation of +.15, considered very low and not particularly impressive.

Pearson Correlation. Excel finds the Pearson Correlation Coefficient using either the fx function Correl or the Data Analysis function Correlation. The former is used for a single data set with two variables and returns a single value, while the latter can be used for a single or multiple data sets and returns a table. The Data Analysis Correlation output for the Performance Rating and Salary correlation is:

            Column 1    Column 2
Column 1    1
Column 2    0.151307    1

Note that the variable names are not included, and that we have three correlations. Two will always show a perfect +1.00 correlation: column 1 with column 1 and column 2 with column 2. This diagonal convention makes more sense with the larger Correlation table we will look at below. The third correlation is that of column 1 with column 2. It does not matter which variable is placed in column 1 or 2, as the result is the same if the columns are switched.

We can use the Correlation function to identify correlations between multiple data sets at the same time, much as Descriptive Statistics could work with multiple variables at once. In trying to identify what variables might be impacting Salary, we could generate the following table. Remember that Pearson’s Correlation requires at least interval-level data, so not all of our variables are used. In addition, since Salary and Compa-ratio are two measures of the same thing (pay), we do not want to include them in the same table.

            Sal      Mid      Age      Perf Rat   Service   Raise
Sal         1.000
Mid         0.989    1.000
Age         0.544    0.567    1.000
Perf Rat    0.151    0.192    0.139    1.000
Service     0.452    0.471    0.565    0.226      1.000
Raise      -0.041   -0.029   -0.180    0.674      0.103     1.000

To identify all of the correlations for a single variable, find the name in the left column.
Then go across until you reach the 1.00 value, then go down. For Age, we find the correlations with:

Sal = 0.544, Mid = 0.567, Age (itself) = 1.00, Perf Rat = 0.139, Service = 0.565, and Raise = -0.180.
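The reading rule above (across the row to the 1.00 value, then down the column) can be sketched in Python as a small lookup. This is only an illustration: the names labels, lower, and correlations_for are hypothetical, and the numbers are copied from the lecture's correlation table rather than computed from the underlying data.

```python
# Variable names in table order
labels = ["Sal", "Mid", "Age", "Perf Rat", "Service", "Raise"]

# Lower triangle of the correlation table, one row per variable
lower = [
    [1.000],
    [0.989, 1.000],
    [0.544, 0.567, 1.000],
    [0.151, 0.192, 0.139, 1.000],
    [0.452, 0.471, 0.565, 0.226, 1.000],
    [-0.041, -0.029, -0.180, 0.674, 0.103, 1.000],
]


def correlations_for(name):
    """Collect every correlation for one variable from the lower-triangle table."""
    i = labels.index(name)
    result = {}
    for j, other in enumerate(labels):
        if j <= i:
            # Go across the variable's own row up to the 1.00 on the diagonal
            result[other] = lower[i][j]
        else:
            # Past the diagonal, go down the variable's own column
            result[other] = lower[j][i]
    return result


for other, r in correlations_for("Age").items():
    print(f"Age with {other}: {r:+.3f}")
```

The lookup works because the table is symmetric: the correlation of Age with Service is the same number as the correlation of Service with Age, so the missing upper half carries no extra information.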

Side note: now we can see why the correlation of each variable with itself is shown in the tables; it provides the pivot point for reading the table outcomes. The values above this diagonal of 1.00 values would be identical to those below it, so they are omitted to make the table visually easier to read.

Coefficient of Determination. We will look at determining the statistical significance of correlations in lecture three for this week. In the meantime, we can consider the Coefficient of Determination as a rough measure of usefulness (we will look at the effect size measure in lecture three as well). The coefficient of determination is the square of the correlation, and represents the percent of variation that the variables share in common; that is, the amount of variation in one variable that is explained by the variation in the other variable. So, for age and salary, the coefficient equals 0.544^2 = 0.296, or about .30 (30%) when rounded. As a rule of thumb, variable pairs with coefficients less than (<) 70% are generally not very valuable for prediction purposes.

References

Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Finance (13th ed.). Boston: McGraw-Hill Irwin.

Tanner, D. E., & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgeport Education.