
MBA 5652, Research Methods

UNIT V STUDY GUIDE
Data Analysis: Correlation and Regression

Course Learning Outcomes for Unit V

Upon completion of this unit, students should be able to:

6. Differentiate between various research-based tools commonly used in businesses.
   6.1 Determine the most appropriate statistical procedure to use from among correlation, simple regression, and multiple regression to test hypotheses.
7. Test data for a business research project.
   7.1 Establish whether to accept or reject null and alternative hypotheses by using correlation, simple regression, and multiple regression.

Course/Unit Learning Outcomes and Learning Activities

6.1: Unit Lesson; Video: How to Find Correlation in Excel with the Data Analysis Toolpak; Video: How to Use Excel - The PEARSON Function; Video: Excel 2016 Correlation Analysis; Video: How to Calculate a Correlation (and p value) in Microsoft Excel; Video: Correlation Coefficient in Excel; Video: How to Perform a Linear or Multiple Regression (Excel 2013); Video: Multiple Regression Interpretation in Excel; Unit V Scholarly Activity

7.1: Unit Lesson; Video: Excel 2016 Correlation Analysis; Video: How to Calculate a Correlation (and p value) in Microsoft Excel; Video: Correlation Coefficient in Excel; Video: Multiple Regression Interpretation in Excel; Unit V Scholarly Activity

Reading Assignment

In order to access the following resources, click the links below:

Glen, S. (2013, December 14). How to find correlation in Excel with the Data Analysis Toolpak [Video file]. Retrieved from https://www.youtube.com/watch?v=AjQA78tI39Q

TheRMUoHP Biostatistics Resource Channel. (2014, November 6). How to use Excel - The PEARSON Function [Video file]. Retrieved from https://www.youtube.com/watch?v=JO-Gc5bEG70

Porterfield, T. (2017, May 18). Excel 2016 correlation analysis [Video file]. Retrieved from https://www.youtube.com/watch?v=kr64tfZmiGA

Quantitative Specialists. (2014, September 15). How to calculate a correlation (and p-value) in Microsoft Excel [Video file]. Retrieved from https://www.youtube.com/watch?v=vFcxExzLfZI

MrSnyder88. (2009, November 8). Correlation coefficient in Excel [Video file]. Retrieved from https://www.youtube.com/watch?v=s2TVkYmmCAs

economistician.com. (2015, May 15). How to perform a linear or multiple regression (Excel 2013) [Video file]. Retrieved from https://www.youtube.com/watch?v=wBocR96UdyY

TheWoundedDoctor. (2013, May 6). Multiple regression interpretation in Excel [Video file]. Retrieved from https://www.youtube.com/watch?v=tlbdkgYz7FM

Unit Lesson

Data Analysis: Correlation and Regression

Unit IV discussed descriptive statistics and the importance of testing the data to ensure assumptions are met before using parametric statistical procedures. When using descriptive statistics, the data that are collected are described by the researcher both visually and statistically. The visual representation alone can reveal information about whether assumptions are met. Although all statistical tests have different assumptions, normality is universally shared and is relatively easy to observe through the use of histograms.
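As a quick illustration of that visual check, the minimal sketch below plots a histogram of a hypothetical productivity data set. It assumes Python with numpy and matplotlib available; the unit's videos accomplish the equivalent check in Excel, and the data here are invented for illustration only.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(seed=42)
    productivity = rng.normal(loc=100, scale=15, size=200)  # hypothetical scores

    plt.hist(productivity, bins=20, edgecolor="black")
    plt.title("Productivity scores")
    plt.xlabel("Score")
    plt.ylabel("Frequency")
    plt.show()  # a roughly bell-shaped histogram suggests normality holds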
Parametric tests are generally preferable because they are more powerful than non-parametric tests, although they carry more assumptions that must be met. Regardless of the statistical procedure under consideration, the assumptions must be met before the researcher can have confidence in the validity of the results. Units V through VII will focus on inferential statistics, which include the parametric tests of correlation, regression, the t test, and ANOVA.

Inferential Statistics

Unlike descriptive statistics, inferential statistics go beyond simply describing the data to making inferences, or predictions, about a population. The inferences are often based on the characteristics of a sample.

Inferences, or predictions, are stated in the form of hypotheses. Results of statistical tests on samples are used to generalize those results to a population (Zikmund, Babin, Carr, & Griffin, 2013). Descriptive statistics and inferential statistics are not mutually exclusive. In fact, performing descriptive statistics should always be a precursor to inferential statistics so that the assumptions of the statistical procedures being considered can be tested.

Populations, Samples, and Generalization

Statistical procedures are used to answer questions about a population. A population can be people or things, such as a company's entire consumer base or the total units produced for a new product. A population can be very large or very small. For example, a company may collect productivity data on their 100 employees. They are interested in knowing whether there is a relationship between the size of merit increases and job productivity.

The 100 employees represent the entire population, so collecting data from all of them would be considered a census. Since data are collected from all 100 employees, the company can be certain that the statistical results represent the entire population. In many instances, however, it is impractical and cost prohibitive to collect data from every member of the population. In these scenarios, data are collected from a sample of the population, and the statistical results from the sample are used to generalize the findings to the population. Using the example above, now assume the company has a population of 200,000 employees. They decide to select a random sample of 100 employees to whom they have provided various merit increases. Like the example above, their interest is in understanding whether there is a relationship between the size of merit increases and job productivity. If they determine that there is a statistically significant relationship between the size of merit increases and productivity, they can generalize those results to the population of 200,000 employees. This can inform their decision-making and planning regarding the size of raises to provide for the next fiscal year and the productivity increase they can forecast. This is the function of inferential statistics.

Relationships or Differences

Statistical analysis can be simplified as either looking for relationships (or associations) between variables or looking for differences between variables or groups. This unit considers statistical testing that looks for relationships between variables. The statistical procedures highlighted to test for relationships will be correlation, simple regression, and multiple regression. Correlation and regression analyses are parametric tests; chi-square is a corresponding non-parametric test.

Correlation

Although many course concepts in research methods may be new and foreign, correlation may feel more familiar and comfortable. The concept of correlation makes intuitive sense to most people since relationships between variables (e.g., years of education and income, safety training hours and lost time hours, and hours of exercise and weight loss) occur frequently in daily life. Relationships naturally occurring between variables can be positive or negative. A positive or negative relationship between variables does not mean positive or negative in the sense of a value judgment of good or bad; in statistical terms, positive or negative refers to the direction of the relationship. An example of a positive relationship between variables is durable goods orders and the S&P 500 index.

When durable goods orders decrease, there is a decrease in the S&P 500 index. When durable goods orders increase, there is an increase in the S&P 500 index. This is a positive relationship because both variables move in the same direction. As one variable increases, the other increases. Conversely, when one variable decreases, the other decreases. An example of a negative relationship between variables is outdoor temperature and heating oil expenditures.

When the outdoor temperature increases, heating oil expenditures decrease. When the outdoor temperature decreases, heating oil expenditures increase. This is a negative relationship because the variables move in opposite directions. As one variable increases, the other decreases. Conversely, when one variable decreases, the other increases.

Another important distinction that must be understood is the difference between correlation and causation. Even if a statistical test (e.g., Pearson's r) indicates a statistically significant relationship between variables, it must never be said that one variable causes the change in the other variable. For example, there is a positive correlation between ice cream sales and violent crime in New York City (both increase in the warmer months of the year, and both decrease in the cooler months). It would be absurd to say that ice cream causes violent crime, even though the relationship between the variables does exist. This extreme example makes the point that correlation does not mean causation. Causation can only be statistically shown via experimental research designs, which have tight controls to manipulate variables.

Pearson Correlation Coefficient (r)

When conducting correlation analysis, the Pearson correlation coefficient (r) is the most commonly used parametric measure of association between two variables (Norusis, 2008). The Pearson statistic is represented by r, which is the standardized covariance between the variables, and it measures the linear relationship between the variables (Field, 2005). The Pearson correlation coefficient is sometimes represented by R, but R is normally used in the context of regression analysis. One can determine how to calculate r by long-hand by referring to a statistics textbook, but it is much easier and faster to use statistical software to calculate the Pearson correlation coefficient. For the purposes of this course, it is most important to understand what Pearson's r is, what it measures, and how to interpret it, rather than how to calculate it by long-hand.

When using correlation analysis, a hypothesis is tested that there is no statistically significant relationship between the variables. The null and alternative hypotheses would be stated as follows.

Ho1: There is no statistically significant relationship between X and Y.
Ha1: There is a statistically significant relationship between X and Y.

As mentioned above, the r statistic can indicate a positive relationship or a negative relationship between variables; it can also indicate no relationship at all. An r of +1 indicates a perfect positive correlation, while an r of -1 indicates a perfect negative correlation (Field, 2005). The r statistic will always fall between +1 and -1, and an r of 0 indicates that no correlation exists between the variables. When reviewing the literature for research articles, it is very common to find r statistics less than .5. Given that an r of 1 indicates a perfect correlation, a statistically significant r of .5 or less hardly seems large enough to get excited about; however, the American Psychological Association would disagree. The American Psychological Association (as cited in Kerr, Garvin, Heaton, & Boyle, 2006) concluded that psychologists studying highly complex human behavior should be satisfied with correlations in the r = 0.10 to 0.20 range, and they should be generally pleased with correlations in the 0.25–0.35 area. The best new variables typically increase predictions, for instance, of job performance between 1% and 4%, and a 10% contribution of a variable such as emotional intelligence would be considered very large (Kerr et al., 2006).
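The r statistic itself is easy to compute with software. The following minimal sketch, in Python and assuming numpy is available, mirrors Excel's =CORREL() and =PEARSON() functions from the unit videos; the education and income figures are hypothetical, invented for illustration.

    import numpy as np

    # Hypothetical data: years of education and income (in thousands)
    years_of_education = np.array([10, 12, 12, 14, 16, 16, 18, 20])
    income_thousands = np.array([28, 33, 31, 40, 48, 52, 60, 70])

    # np.corrcoef returns a 2 x 2 correlation matrix; the off-diagonal entry is r
    r = np.corrcoef(years_of_education, income_thousands)[0, 1]
    print(f"Pearson's r = {r:.3f}")  # positive r: the variables rise together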
Although there are no concrete guidelines for interpreting r and R2, the following chart suggests some general guidelines that are fairly consistent with other published rule-of-thumb guidelines.

[Chart: guidelines for interpreting the strength of a correlation coefficient. Adapted from Guideline for Interpreting Correlation Coefficient by I. Phanny, 2014 (https://www.slideshare.net/phannithrupp/guideline-for-interpreting-correlation-coefficient/2).]

Coefficient of Determination (R2)

Pearson's r is useful by itself, but the closely related coefficient of determination (R2) is also very informative. Simply squaring r produces R2, which indicates the amount of variability in one variable that is explained by the other variable (Field, 2005). According to the American Psychological Association (as cited in Kerr et al., 2006), a researcher should be generally pleased with a correlation of r = .25, which translates to a coefficient of determination of R2 = .0625. This means that the variable x explains 6.25% of the variability in the variable y. Most statistical software programs will calculate both r and R2 when running correlation analysis, so it is easy to see both the strength of the association and the explained variance. Again, it is important not to confuse correlation with causation.

Examples of r and R2:

r = .10, R2 = .01: explains 1% of the total variance between the variables being tested
r = .30, R2 = .09: explains 9% of the total variance between the variables being tested
r = .50, R2 = .25: explains 25% of the total variance between the variables being tested

Interpreting Correlation Output Results

The following correlation analysis looked for a statistically significant relationship between the variables of height and weight. The results show that there is a moderately strong correlation, r = .6 (Pearson's correlation). It is also necessary to assess whether the correlation is statistically significant using an alpha of .05. The results indicate a p value of .023 < .05. Therefore, the null hypothesis is rejected, and the alternative hypothesis is accepted.

Reject Ho1: There is no statistically significant relationship between weight and height.
Accept Ha1: There is a statistically significant relationship between weight and height.
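This kind of hypothesis test can be reproduced in software. The sketch below, in Python and assuming scipy is installed, computes r, R2, and the p value for hypothetical height and weight data; the lesson's reported figures (r = .6, p = .023) came from its own data set, so the numbers here will differ.

    from scipy import stats

    # Hypothetical data: height (inches) and weight (pounds) for 10 people
    height_in = [62, 64, 66, 67, 68, 70, 71, 72, 74, 75]
    weight_lb = [120, 136, 130, 155, 150, 165, 160, 182, 178, 200]

    r, p = stats.pearsonr(height_in, weight_lb)  # Pearson's r and its p value
    r_squared = r ** 2                           # coefficient of determination

    print(f"r = {r:.3f}, R2 = {r_squared:.3f}, p = {p:.4f}")
    if p < .05:
        print("Reject Ho1: the relationship is statistically significant.")
    else:
        print("Fail to reject Ho1.")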
Although the information obtained through correlation analysis is revealing and useful, it is limited in that correlation analysis cannot be used to make predictions (Field, 2005). To be able to predict the value of a dependent variable (DV) from observations of the independent variable (IV), regression analysis must be used.

Regression Analysis

Relationships between variables can be useful for making predictions. Regression analysis is a concept that many students have heard of, even if they are not entirely comfortable with it. If the relationship between the variables X and Y is known, predictions can be made about how a change in X will relate to a change in Y. Remember that this is not stating that a change in X causes a change in Y; it is only possible to predict a change based on the relationship between the variables. Regression analysis can be powerful, especially when multiple X variables are included (multiple regression) to make a prediction about a change in a single Y variable.

When using regression analysis, a hypothesis is tested that there is no statistically significant prediction of the dependent variable (i.e., Y, or outcome variable) by one or more independent variables (X). If a single independent variable is used to predict Y, it is termed simple regression. If two or more independent X variables are used to predict Y, it is termed multiple regression. The null and alternative hypotheses would be stated as follows.

Ho1: There is no statistically significant relationship to predict Y from X1, X2, ... and Xn.
Ha1: There is a statistically significant relationship to predict Y from X1, X2, ... and Xn.

Regression analysis uses a linear model to apply a line of best fit to the data. The line of best fit is optimal because it results in the smallest amount of difference between the observed data points and the line (Field, 2005). As a linear regression example, consider a line of best fit applied to data for the variables mortality (DV) and cigarette consumption (IV). This is an example of simple linear regression because there is only one IV. If all of the data points fell on a straight line, it would be a perfect linear relationship, which would allow a perfect prediction of the Y-axis variable by looking at the X-axis variable (Norusis, 2008). A perfect linear relationship is rare, so the regression model is developed as Y = a + b(X). The resulting mathematical model is tested for statistical significance. If it is statistically significant, at a p value of less than .05, the IV data can be plugged into the model to be multiplied by the calculated coefficient and added to the calculated constant (the Y-intercept, or a0), resulting in the predicted DV. The statistical software will calculate the model and values for a0 and b1, which will appear as the following equation:

Y = a0 + b1(X) or DV = a0 + b1(IV1)

Simple regression creates the statistical model shown above, with a single independent variable (IV), sometimes referred to as a predictor variable, and a single DV, sometimes referred to as the outcome or criterion variable. Multiple regression creates a statistical model with a single DV and two or more IVs. The multiple regression model is similar to the simple regression equation in that it still contains a Y-intercept, or a0, but the multiple regression model contains multiple IVs and multiple corresponding coefficients, or bx, as shown below.

Y = a0 + b1X1 + b2X2 + ... + bnXn or DV = a0 + b1(IV1) + b2(IV2) + ... + bn(IVn)

[Equation adapted from images in Multiple Linear Regression by J. Neill, 2008 (https://www.slideshare.net/jtneill/multiple-linear-regression).]

If the multiple regression model is statistically significant, at a p value of less than .05, the IV data can be plugged into the model to be multiplied by the calculated coefficients and added to the calculated constant (the Y-intercept, or a0), resulting in the predicted DV.
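To show how such a model is estimated, the following minimal sketch fits a multiple regression in Python, assuming the numpy and statsmodels packages are available. Excel's Data Analysis ToolPak (shown in the unit videos) reports the same statistics: Multiple R, R Square, Significance F, and coefficient t stats. All data here are invented for illustration.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(seed=1)
    x1 = rng.uniform(0, 10, size=30)  # hypothetical IV 1
    x2 = rng.uniform(0, 5, size=30)   # hypothetical IV 2
    y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0, 1, size=30)  # hypothetical DV

    X = sm.add_constant(np.column_stack([x1, x2]))  # adds the a0 (intercept) term
    model = sm.OLS(y, X).fit()                      # ordinary least squares fit
    print(model.summary())  # R-squared, F statistic, coefficients, and p values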
Interpreting Regression Output Results

Interpreting simple and multiple regression output is similar. There are several key test statistics and p values returned in a regression analysis that must be evaluated to (a) determine statistical significance and (b) assess the strength of the linear regression model.

Multiple R: This is Pearson's r, as discussed in the correlation section; regression output often uses a capital R instead of r. It is simply the square root of R square, and it describes the strength of the correlation between the model and the dependent variable. In the example regression output, the multiple R figure of 99.2% indicates a very strong positive correlation between the regression model and the dependent (output) variable.

R square (R2): This is the coefficient of determination, as discussed in the correlation section. It explains the amount of variation in the dependent (output) variable that is explained by the regression model. In the example regression output, the R square figure indicates that 98.3% of the variation in the dependent variable is explained by the regression model. This is a very high R2.

ANOVA: This indicates whether the regression model is statistically significant in its ability to predict the dependent variable. The ANOVA portion of the output uses significance F for probability, which is synonymous with the p value discussed previously in the course. A significance F < .05 indicates statistical significance. In the example regression output, the significance F = .000009 < .05 indicates that the null hypothesis should be rejected and the alternative accepted: there is a statistically significant relationship between the regression model and the dependent variable.

t Stat: This assesses the statistical significance of the individual predictor variable coefficients. A p value < .05 for any given t stat indicates statistical significance for the corresponding coefficient. In the example regression output, the p values for the coefficients of variables 1, 2, and 3 are all < .05, which indicates that they each make a statistically significant contribution to the regression model (Field, 2005).
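Continuing the Python sketch above, the key figures named in this section can be read directly from the fitted model, and the coefficients can then be used to predict the DV for new IV values. The attribute names (rsquared, f_pvalue, pvalues, predict) are statsmodels' API; the new IV values are hypothetical.

    print(f"R square       = {model.rsquared:.3f}")  # variance explained by the model
    print(f"Significance F = {model.f_pvalue:.6f}")  # overall model p value
    print(f"Coefficient p values: {model.pvalues}")  # one p value per coefficient

    # Predict the DV for one new observation: [constant, IV 1, IV 2]
    new_X = [[1.0, 6.0, 2.5]]
    predicted_dv = model.predict(new_X)[0]
    print(f"Predicted DV   = {predicted_dv:.2f}")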

In multiple regression, the output will not always show statistical significance for all coefficients in the model. There is debate among researchers about how to treat non-significant coefficients. Some researchers recommend removing the non-significant variable(s) and rerunning the regression analysis with only the statistically significant predictor variables (Field, 2005). Other researchers recommend leaving all the coefficients in the regression model, even if some are not statistically significant. Still others recommend more complicated ways to treat the data. Regardless of these conflicting suggestions, the important thing to note is that non-statistically significant independent predictor variable coefficients do not contribute significantly to the regression model and, therefore, have no meaningful influence on predicting the dependent variable, even if they are left in the model. It is only the statistically significant coefficients that have predictive power in the regression model.

After evaluating the regression output and determining that the model explains a great deal of variability in the dependent variable, is strongly correlated with the dependent variable, and has a statistically significant relationship with the dependent variable, the model can be expressed as a predictive equation.

DV = -45.796348 + 0.5969718(Variable 1) + 1.1768377(Variable 2) + 0.4051086(Variable 3)

The model can then be used to predict the DV by plugging in values for Variable 1, Variable 2, and Variable 3. To interpret the output for simple regression, the same steps are followed as in multiple regression. The only difference in the process is that there will be only one predictor variable and coefficient to analyze for statistical significance, rather than multiple predictor variables.

References

Field, A. (2005). Discovering statistics using SPSS (2nd ed.). London, England: Sage.

Kerr, R., Garvin, J., Heaton, N., & Boyle, E. (2006). Emotional intelligence and leadership effectiveness. Leadership & Organization Development Journal, 27(4), 265–279.

Neill, J. (2008). Multiple linear regression [PowerPoint slides]. Retrieved from https://www.slideshare.net/jtneill/multiple-linear-regression

Norusis, M. J. (2008). SPSS 16.0 guide to data analysis (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Phanny, I. (2014). Guideline for interpreting correlation coefficient [PowerPoint slides]. Retrieved from https://www.slideshare.net/phannithrupp/guideline-for-interpreting-correlation-coefficient/2

Zikmund, W. G., Babin, B. J., Carr, J. C., & Griffin, M. (2013). Business research methods (9th ed.). Mason, OH: Cengage.

Suggested Reading

The following video shows how to calculate Pearson's correlation coefficient (Pearson's r) by long-hand. It provides a great appreciation for software.

statslectures. (2010, October 3). Pearson's r correlation [Video file]. Retrieved from https://www.youtube.com/watch?v=2B_UW-RweSE