Unit VII Essay

In an essay of no less than three pages, explain the correlation of the data points to the equations shown in Figure 4.3 on page 119 (How can better data be acquired?), and review Figure

MSL 5080, Methods of Analysis for Business Operations

Course Learning Outcomes for Unit VII

Upon completion of this unit, students should be able to:

7. Assess the differences between correlation and causation.
   7.1 Explain the correlation of data points to a given equation.
   7.2 Determine assumptions of the regression model.

Reading Assignment

Chapter 4: Regression Models, pp. 114–123

Unit Lesson

Regression Models

In business and government, situations occur that lead observers to wonder, "Is the change in X related to the change in Y?" Indeed, they may see Y change every time X does and conclude that the two are correlated, but they cannot tell how much without analysis. After seeing something affect a change in something else in a field of one's concern, the next question that arises is "How much will Y change when X changes?" Data that can be grouped along a sloping line can lead to a mathematical answer to that question, because every straight line on a graph can be written as y = mx + b (these terms will be defined shortly). So you do not have to wonder about data and changes: you can use regression analysis to calculate the change.

There are two reasons to use regression analysis:

1. to understand the relationship between variables as shown by a collected pattern of data, and
2. to predict the value of one variable if the value of the other variable is known or set.

Sections 4.2 through 4.6, pages 114–124 of the textbook, walk you through taking a scatter plot of data and using a line (and simple linear regression) to model the correlation, predict an unknown value of Y given a value set for X, and determine how well the model's linear equation fits the data, in terms of error and standard deviation. The error and standard deviation could be very small, showing that the linear equation fits well and the data is clustered very close together in relative value.
Or, the error and standard deviation could be large, showing that the predictions of correlations will not be that good and leading you to wonder whether you have the right linear equation modeling the data. These are issues supporting analysts will work on if the solutions do not look right. To orient yourself on using regression to model correlation, follow the textbook's example (page 114) of the Triple A Construction Company. Note the pattern of the six data points (the local payroll amounts) in Figure 4.1, page 115 of the textbook.

When plotted on a graph of X/Y axes, these form a scatter plot, which can be modeled by a line with a certain position and slope. Of course, the standard mathematical line equation, y = mx + b, can model this line and any other straight line; the difference is in the values of the variables. In terms of linear regression, the equation for a line becomes:

Y = β0 + β1X + ε

where

Y = dependent (response) variable
X = independent variable
β0 = intercept (the value of Y when X = 0)
β1 = slope of the line
ε = random error

For a linear regression used as a model of a correlation, β0 and β1 are not known, but they can be estimated from sample data. Rewrite the linear (regression) equation based on sample data as:

Ŷ = b0 + b1X

where

Ŷ = predicted value of Y
b0 = estimate of β0 based on sample results
b1 = estimate of β1 based on sample results

You could try a line and "eyeball" it so you can report that you have a close model, but to be accurate you must determine the position of the line with minimal error. Error is defined with common sense:

Error = actual value – predicted value

In terms of the linear regression equation, this means:

e = Y – Ŷ

Errors are squared so that an error in a negative direction does not cancel out an error in a positive direction, which would make the predicted values look more accurate than they may be. The best regression line, then, is the one with the minimum sum of squared errors, which is why regression analysis is also termed least-squares regression.

Note how you can find b0 and b1: by taking the averages of X and Y (summing all the Xs and dividing by the number of Xs, and doing the same for the Ys), you place the resulting averages, X̄ and Ȳ, into the formulas for b1 and b0:

b1 = Σ(X – X̄)(Y – Ȳ) / Σ(X – X̄)²

b0 = Ȳ – b1X̄

As indicated in the textbook, you can sum up these data points by hand, but for cumbersome amounts of data, software can do this for us.
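The averaging formulas above can be turned into a short Python sketch. This is only an illustration: the six payroll/sales pairs below are assumed values consistent with the textbook's Triple A Construction example (payroll in hundreds of millions of dollars, sales in hundreds of thousands), and with them the formulas reproduce the textbook's regression line.

```python
# Least-squares estimates b0 and b1 from the averaging formulas above.
# The data points are assumed values matching the textbook's
# Triple A Construction example (payroll X, sales Y).

def least_squares(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n          # average of the Xs
    y_bar = sum(ys) / n          # average of the Ys
    # b1 = sum((X - X_bar)(Y - Y_bar)) / sum((X - X_bar)^2)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar      # b0 = Y_bar - b1 * X_bar
    return b0, b1

payroll = [3, 4, 6, 4, 2, 5]     # X, hundreds of millions of dollars
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y, hundreds of thousands of dollars

b0, b1 = least_squares(payroll, sales)
print(b0, b1)  # 2.0 1.25
```

The same estimates come out of any statistics package; working the formulas once by hand (or in a few lines of code) shows where the software's answer comes from.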
Here, as in the textbook, in the case of the Triple A Construction Company with its six data points, calculating manually gives b0 = 2 and b1 = 1.25 (Render, Stair, Hanna, & Hale, 2015). In the equation Ŷ = b0 + b1X, this means:

Ŷ = 2 + 1.25X, or sales = 2 + 1.25(payroll),

which enables us to estimate the predicted value of sales for whatever amount the payroll would be set at. Also as noted, finding the numbers for the linear regression equation shows us the relationship between the variables. Here, you can see how sales should move, given certain payroll amounts (do not forget that payroll is in units of hundreds of millions and sales is in units of hundreds of thousands).

Measuring the Fit

As previously addressed, you can try linear regression equations and settle on one that calculations show is a good fit, but the question of how much error remains will persist and can be argued over, which finally leads analysts to find out how much error is in an equation and which equations fit with the smallest error. To address these issues and ward off objections to calculations, analysts developed the sum of squares total (SST), the sum of squares error (SSE), the sum of squares regression (SSR), and methods to test for significance. The reason you square terms in these equations, as you have in past units, is that an error with a negative value may cancel out an error with a positive value when they are added, making the regression model equation appear to have a smaller error than it really has. Squared terms are always positive, so that problem is eliminated by converting the formulas for error to ones where the error values are squared.
So:

Sum of squares total: SST = Σ(Y – Ȳ)²

Sum of squares error: SSE = Σe² = Σ(Y – Ŷ)²

Sum of squares regression (which shows how much of the variability in Y is explained by the regression equation): SSR = Σ(Ŷ – Ȳ)²

These sums are related: SST = SSR + SSE.

As noted in the textbook on page 118, these measuring tools can be viewed as the SSR showing the explained variability in Y and the SSE showing the unexplained variability in Y. The proportion of explained variability to total variability is called the coefficient of determination, r², and it is calculated from SST, SSE, and SSR like this:

r² = SSR / SST = 1 – SSE / SST

The value of r² is the percentage of the variability of Y explained by the regression equation, such as the one developed for payroll (X) in the Triple A Construction Company example.

This discussion can now tie in the title of the unit, Correlation. The coefficient of correlation, r, is the square root of r² (carrying the sign of the slope) and shows the strength of the correlation in the regression equation. Note the four examples in Figure 4.3 of the textbook, and how in two cases, (a) and (d), the data points are aligned in an exact line, so each of those lines has perfect correlation (Render et al., 2015). Because the line can slope one way or the other and still be a perfect correlation, r can be any number between –1 and +1, inclusive.

Checking Significance

As you can imagine, there are cases where the available samples taken are too small for these fit-measuring equations to be trusted on their own. In these cases, another method is to check for significance. For this, use the F distribution shown in Unit III. As mentioned, analysts turn to the F distribution because it provides solutions to the ratio between variances, which you will now use to check for significance.
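The fit measures can be computed directly from their definitions, as in the short Python sketch below. The data points and the estimates b0 = 2, b1 = 1.25 are again assumed values consistent with the textbook's Triple A Construction example; note how SST = SSR + SSE comes out exactly, and how r follows from r².

```python
# Fit measures for the regression line Yhat = b0 + b1*X, computed from
# their definitions. Data points are assumed values matching the
# textbook's Triple A Construction example.
import math

payroll = [3, 4, 6, 4, 2, 5]     # X
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y
b0, b1 = 2.0, 1.25               # estimates from the least-squares formulas

y_bar = sum(sales) / len(sales)
y_hat = [b0 + b1 * x for x in payroll]                   # predicted values

sst = sum((y - y_bar) ** 2 for y in sales)               # total variability in Y
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))  # unexplained (error)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained by the line

r_squared = ssr / sst        # coefficient of determination
r = math.sqrt(r_squared)     # coefficient of correlation (slope here is positive)

print(sst, sse, ssr)         # 22.5 6.875 15.625
print(round(r_squared, 4))   # 0.6944
```

With these assumed data, roughly 69% of the variability in sales is explained by payroll, and the remaining sum of squares is the error the significance test weighs.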
In short, the F statistic is:

F = MSR / MSE

If the MSE is very small in relation to the MSR (meaning the F statistic is large), then there is only a minor error in the regression equation and the equation is useful. A large F statistic indicates that the equation's solutions are unlikely to be occurring by chance. Again, note that F distribution tables have been calculated and published, reducing the amount of mathematics required to check for significance.

Reference

Render, B., Stair, R. M., Jr., Hanna, M. E., & Hale, T. S. (2015). Quantitative analysis for management (12th ed.). Upper Saddle River, NJ: Pearson.

Suggested Reading

The links below will direct you to a PowerPoint view of the Chapter 4 presentation. This will summarize and reinforce the information from this chapter in your textbook.

Click here to access a PowerPoint presentation for Chapter 4.

Click here to access the PDF view of the presentation.

For an overview of the chapter equations, read the "Key Equations" on page 138 of the textbook (Render et al., 2015).

Learning Activities (Non-Graded)

Work Solved Problem 4-1 on pages 139–140 and the Self-Test on page 141 (use the answer key in Appendix H in the back of the book to check your answers). For the Solved Problem, the problem is presented first, followed by its solution. Challenge yourself to apply what you have learned, and see if you can work out the problem without first looking at the solution, using the solution only to check your own work.

Non-graded Learning Activities are provided to aid students in their course of study. You do not have to submit them. If you have questions, contact your instructor for further guidance and information.