Economics Project

Regression Analysis

Population Linear Regression Model
• The relationship between X and Y is described by a linear function.
• Changes in Y are assumed to be influenced by changes in X.
• Example: house prices (Y) and square feet (X)

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

Sample Linear Regression Model
• Slope = 0.10977
• Intercept = 98.248
• Estimated regression line:

\[ \widehat{\text{house price}} = 98.24833 + 0.10977 \, (\text{square feet}) \]

[Figure: scatter plot of House Price ($1000s) against Square Feet with the fitted regression line]

Sample Linear Regression Model
• Total variation is made up of two parts: Sum of Squares Total = Sum of Squares Regression + Sum of Squares Error (residual).

\[ SST = SSR + SSE \]

\[ SST = \sum_i (y_i - \bar{y})^2 \qquad SSR = \sum_i (\hat{y}_i - \bar{y})^2 \qquad SSE = \sum_i (y_i - \hat{y}_i)^2 \]

• SST: Variation of the y_i values around their mean, \bar{y}.
• SSR: Explained variation due to the linear relationship between x and y.
• SSE: Unexplained variation due to factors other than the linear relationship between x and y.
• Ordinary Least Squares (OLS) is the most common regression method. It fits the line that minimizes the sum of the squared residuals (SSE).

Coefficient of Determination (R^2)
• The portion of the total variation in the dependent variable that is explained by variation in the independent variable.

\[ R^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}, \qquad 0 \le R^2 \le 1 \]

Excel Output
• We estimate the OLS regression with Microsoft Excel's Regression tool in the Data tab (verify that the Analysis ToolPak is installed).

Regression Statistics
Multiple R             0.76211
R Square               0.58082
Adjusted R Square      0.52842
Standard Error        41.33032
Observations          10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652     1708.1957
Total         9     32600.5000

              Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept     98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet    0.10977         0.03297          3.32938    0.01039      0.03374      0.18580

\[ R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 \]

• 58.08% of the variation in house prices is explained by variation in square feet.

Inference about the Slope: t Test
• t test for a population slope: is there a linear relationship between X and Y?
• Null and alternative hypotheses
  Null: β1 = 0 (no linear relationship)
  Alternative: β1 ≠ 0 (a linear relationship does exist)
• Test statistic:

\[ t = \frac{b_1 - \beta_1}{s_{b_1}} \]

  where b_1 = regression slope coefficient, β1 = hypothesized slope, s_{b_1} = standard error of the slope
• The standard error of the slope measures how much the estimated slope would vary from one sample to another.
• In a large sample (n > 30), if the absolute value of the t-statistic exceeds about 2, we reject the null hypothesis that the proposed explanatory variable has no effect at the 5% significance level (α = 0.05), that is, at the 95% confidence level.
• Most analysts would just say the explanatory variable is statistically significant.

Inference about the Slope: t Test
• Null: β1 = 0. Alternative: β1 ≠ 0.
• From the Excel output, b_1 = 0.10977 and s_{b_1} = 0.03297.
• Test statistic: t = 0.10977 / 0.03297 = 3.329
• Decision: Reject H0. With only n = 10 observations, the 5% critical value for n - 2 = 8 degrees of freedom is 2.306, and 3.329 exceeds it.
• Conclusion: There is sufficient evidence that square footage affects a home's price.
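
The slope, intercept, sums of squares, R^2 and slope t-statistic reported in the Excel output can be recomputed by hand from the ten house observations. The following is a minimal sketch in plain Python, not the procedure Excel itself uses; the function name simple_ols and the dictionary keys are illustrative choices, not part of the original slides.

```python
import math

# House data from the example: square feet (x) and price in $1000s (y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price_thousands = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def simple_ols(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares (one regressor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Centered sums used by the closed-form OLS solution.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept

    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained

    # Standard error of the slope uses the residual variance SSE / (n - 2).
    se_b1 = math.sqrt(sse / (n - 2) / s_xx)

    return {
        "b0": b0,
        "b1": b1,
        "sst": sst,
        "ssr": ssr,
        "sse": sse,
        "r2": ssr / sst,
        "se_b1": se_b1,
        "t_b1": b1 / se_b1,  # t statistic for H0: beta1 = 0
    }


result = simple_ols(square_feet, price_thousands)
for name, value in result.items():
    print(f"{name:>6}: {value:.5f}")
```

Comparing the printed values with the Regression Statistics, ANOVA and Coefficients tables is a quick way to see where each Excel figure comes from.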

Inference about the Slope: P value
• Null: β1 = 0. Alternative: β1 ≠ 0.
• From the Excel output, the P-value for Square Feet is 0.01039.
• Decision: P-value < α = 0.05, so reject H0.
• Conclusion: There is sufficient evidence that square footage affects house price.

Confidence Interval Estimate
• From the Excel output, the 95% confidence interval for the Square Feet coefficient runs from 0.03374 (Lower 95%) to 0.18580 (Upper 95%).
• This 95% confidence interval does not include 0.
• Conclusion: There exists a relationship between house price and square feet at the 5% significance level.
• Because price is measured in $1000s, we are 95% confident that each additional square foot of house size raises the sales price by between $33.74 and $185.80.

Forecasting with Regression Analysis
• The predicted price for a house with 2000 square feet is 98.24833 + 0.10977(2000) = 317.788 in $1000s, or about $317,788.

Multivariate Regression
• Population Multivariate Regression Equation with K Independent Variables:

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K + \varepsilon \]

  where β0 is the Y-intercept, β1, ..., βK are the population slopes, and ε is the random error.
• Sample Multivariate Regression Equation with K Independent Variables:

\[ \hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_K x_{Ki} \]

  where \hat{y}_i is the estimated (or predicted) value of y, b_0 is the estimated intercept, and b_1, ..., b_K are the estimated slope coefficients.

Example: Julian Pie Company

Week    Pie Sales    Price ($)    Advertising ($100s)
1       350          5.50         3.3
2       460          7.50         3.3
3       350          8.00         3.0
4       430          8.00         4.5
5       350          6.80         3.0
6       380          7.50         4.0
7       430          4.50         3.0
8       470          6.40         3.7
9       450          7.00         3.5
10      490          5.00         4.0
11      340          7.20         3.5
12      300          7.90         3.2
13      440          5.90         4.0
14      450          5.00         3.5
15      300          7.00         2.7

Sales = b_0 + b_1 (Price) + b_2 (Advertising)

Excel Output

Regression Statistics
Multiple R             0.72213
R Square               0.52148
Adjusted R Square      0.44172
Standard Error        47.46341
Observations          15

ANOVA
              df    SS           MS           F          Significance F
Regression    2     29460.027    14730.013    6.53861    0.01201
Residual      12    27033.306     2252.776
Total         14    56493.333

              Coefficients    Standard Error    t Stat      P-value    Lower 95%    Upper 95%
Intercept     306.52619       114.25389          2.68285    0.01993     57.58835    555.46404
Price         -24.97509        10.83213         -2.30565    0.03979    -48.57626     -1.37392
Advertising    74.13096        25.96732          2.85478    0.01449     17.55303    130.70888

• Adjusted R Square: 44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and the number of independent variables.

Economic vs. Statistical Significance
Mini Case: CEO Compensation
• Y = a + bA + cL + dS + fX + e
• The dependent variable, Y, is CEO compensation in thousands of dollars.
• The explanatory variables are assets (A), number of workers (L), average return on stocks (S) and the CEO's experience (X).
• The OLS regression is \hat{Y} = 6,787 + 11.4 A + 14.0 L + 35.1 S + 79.9 X
• The t-statistics for the coefficients of A, L, S and X are 10.1, 8.77, 5.25 and 3.40, respectively.
• Based on these t-statistics, all 4 variables are statistically significant.
• Although all these variables are statistically significant, not all of them are economically significant.
• For instance, S is statistically significant but its effect on the CEO's compensation is very small: a one-percentage-point increase in shareholder return would add only about $35,000 per year to the CEO's pay.
• So, S is statistically significant but economically not very important.

Functional Form
• Don't assume that economic relationships are always linear.
• Choosing the correct functional form may be difficult.
• One useful step, especially if there is only one explanatory variable, is to plot the data and the estimated regression line for each functional form under consideration.
• Graphical Presentation in Figure 3.6
  Panel a shows a linear regression line of the form Q = a + bA + e
  Panel b shows a quadratic regression curve of the form Q = a + bA + cA^2 + e
• Linear form: R^2 = 0.85. Quadratic form: R^2 = 0.99.
• The quadratic regression in panel b fits better than the linear regression in panel a.
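
The Julian Pie Company regression, Sales = b_0 + b_1 (Price) + b_2 (Advertising), can be checked against the Excel output by solving the OLS normal equations directly. This is a minimal sketch in plain Python under the assumption that the 15 weekly observations are exactly as listed in the example; the helper names solve_linear_system and multiple_ols are hypothetical and only serve this illustration.

```python
# Julian Pie Company data from the example (15 weeks).
pie_sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
             340, 300, 440, 450, 300]
pie_price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00,
             7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
               3.5, 3.2, 4.0, 3.5, 2.7]


def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_ols(regressors, y):
    """Return (coefficients, R^2, adjusted R^2); coefficients[0] is b0."""
    n = len(y)
    k = len(regressors)
    # Design matrix rows: a leading 1 for the intercept, then each regressor.
    rows = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)]
           for i in range(k + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k + 1)]
    coef = solve_linear_system(xtx, xty)

    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    r2 = 1 - sse / sst
    # Adjusted R^2 penalizes for the number of regressors k.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return coef, r2, adj_r2


coef, r2, adj_r2 = multiple_ols([pie_price, advertising], pie_sales)
print("b0, b1 (Price), b2 (Advertising):", [round(c, 5) for c in coef])
print("R Square:", round(r2, 5), "Adjusted R Square:", round(adj_r2, 5))
```

Solving the normal equations by hand is fine for a small example like this one; for larger or badly scaled data a library routine based on a QR or SVD decomposition is usually the more numerically stable choice.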

Extrapolation & Forecasting

Extrapolation
• Extrapolation seeks to forecast a variable as a function of time.
• Time series data is smoothed in some way to reveal an underlying pattern, and this pattern is then extrapolated into the future.
• Two linear smoothing techniques are the trend line and seasonal variation.

Trends
• Nike estimates a trend line R = a + bt + e, where t is time (in quarters), R is revenue (in billions of dollars), and a and b are the coefficients to be estimated.
• The estimated trend line is R = 4.139 + 0.138 t, with statistically significant coefficients.
• Nike forecasts sales in quarter 3, 2017 (quarter 35) as 4.139 + (0.138 × 35) = $8.97 billion.
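
The Nike trend forecast is a single evaluation of the estimated line, which the short sketch below reproduces. The function names are illustrative. The base year of 2009 in quarter_index is an assumption inferred from the statement that quarter 3 of 2017 is quarter 35 (which makes quarter 1 of 2009 equal to t = 1); it is not stated in the slides.

```python
# Estimated trend line from the example: R = 4.139 + 0.138 * t,
# with R in billions of dollars and t counted in quarters.
TREND_INTERCEPT = 4.139
TREND_SLOPE = 0.138


def quarter_index(year, quarter, base_year=2009):
    """Map a calendar quarter to t, ASSUMING t = 1 is Q1 of base_year."""
    return (year - base_year) * 4 + quarter


def trend_forecast(t, a=TREND_INTERCEPT, b=TREND_SLOPE):
    """Extrapolate the linear trend to period t."""
    return a + b * t


t = quarter_index(2017, 3)
forecast = trend_forecast(t)
print(f"Quarter {t}: forecast revenue of ${forecast:.2f} billion")
```

Because the forecast simply extends the fitted line, it is only as reliable as the assumption that the same linear trend continues beyond the sample period.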