Econometrics HW

  • Explain in your own words how to do the tests and whether or not you reject the hypotheses. For all tests, report p-values.

  • Cut and paste your software output for all commands that you execute. In Stata, an easy way to copy is to highlight, right-click, and choose ‘Copy as Picture.’

Use the data in the file mlbsal17. The file contains a cross section of 237 observations on salaries and performance measures for baseball players. (When importing the data into Stata, make sure to check ‘first row is variable names.’)


Suppose you want to model a player’s salary as a function of the player’s total games played, years, runs scored, runs batted in, home runs, doubles, triples, and batting average. The units of salary are thousands of dollars.


The log-linear model takes the form:


ln(salary) = β0 + β1games + β2yrs + β3ba + β4runs + β5rbi + β6doubles + β7triples + β8hr + u.


The linear model takes the form:


salary = b0 + b1games + b2yrs + b3ba + b4runs + b5rbi + b6doubles + b7triples + b8hr + v.


The double-log model takes the form:


ln(salary) = B0 + B1ln(games) + B2ln(yrs) + B3ln(ba) + B4ln(runs) + B5ln(rbi) + B6ln(doubles) + B7ln(triples) + B8ln(hr) + w.


  1. For all three models, report and interpret the estimated coefficient for games.
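
As a reference for the interpretation asked for here, the sketch below converts a coefficient on games into an effect on salary under each functional form. The function names and the coefficient values in the usage lines are hypothetical placeholders, not estimates from mlbsal17.

```python
import math


def linear_effect(b1, extra_games=1.0):
    """Linear model: change in salary (thousands of dollars) for extra_games more games."""
    return b1 * extra_games


def log_linear_pct_effect(beta1, extra_games=1.0):
    """Log-linear model: exact percent change in salary for extra_games more games.

    For small beta1 this is close to the usual approximation 100 * beta1.
    """
    return 100.0 * (math.exp(beta1 * extra_games) - 1.0)


def double_log_pct_effect(B1, pct_more_games=1.0):
    """Double-log model: exact percent change in salary for a pct_more_games
    percent increase in games. B1 itself is the elasticity."""
    return 100.0 * ((1.0 + pct_more_games / 100.0) ** B1 - 1.0)


# Hypothetical coefficient values, for illustration only.
print(linear_effect(0.5))            # thousands of dollars per extra game
print(log_linear_pct_effect(0.002))  # percent per extra game
print(double_log_pct_effect(0.8))    # percent per 1% more games
```

The three numbers are not directly comparable: the first is in thousands of dollars, the second is a percent change per game, and the third is a percent change per percent change in games.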



  1. Which of the three models fits the data best?



  1. Consider the 73rd player in the sample, with 1001 games played, 9 years, etc. What salary does the double-log model predict for such a player?
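
A prediction from the double-log model has to be converted back from logs to levels. The sketch below shows one way to do that arithmetic. The function name, the two-regressor dictionaries, and every coefficient value are hypothetical; only games = 1001 and yrs = 9 come from the question. Exponentiating the fitted log value tends to understate the conditional mean of salary, so the optional `sigma2` argument applies the exp(sigma2 / 2) correction, which is valid under the assumption of normally distributed errors.

```python
import math


def predict_salary_double_log(intercept, slopes, player, sigma2=None):
    """Predicted salary (thousands of dollars) from a double-log fit.

    intercept : estimated B0
    slopes    : dict mapping variable name -> estimated coefficient on ln(variable)
    player    : dict mapping variable name -> the player's value (must be > 0)
    sigma2    : optional estimate of the error variance; if given, the
                prediction is scaled by exp(sigma2 / 2), the correction that
                applies when the errors are normally distributed
    """
    log_salary = intercept + sum(
        coef * math.log(player[name]) for name, coef in slopes.items()
    )
    correction = math.exp(sigma2 / 2.0) if sigma2 is not None else 1.0
    return correction * math.exp(log_salary)


# Hypothetical coefficients and a shortened regressor list, for illustration only.
slopes = {"games": 0.5, "yrs": 0.3}
player = {"games": 1001, "yrs": 9}
print(predict_salary_double_log(1.0, slopes, player))
```

In the full model the dictionaries would carry all eight regressors, each entered at the player's observed value.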






  1. Compute a chi-square statistic for testing the hypothesis that the double-log model has no explanatory power.


H0: R2 = 0.


This chi-square has how many degrees of freedom? (Remember also for every test in this assignment to report the p-value.)
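
The question does not say which chi-square statistic is intended. One common choice is the LM statistic n·R², which is asymptotically chi-square with as many degrees of freedom as there are slope coefficients; another is the number of slopes times the overall F statistic. The sketch below assumes the n·R² form. The helper names and the R² value are hypothetical, and the closed-form tail probability used here is exact only for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with an even df.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!, which is
    exact only when df is even (df = 8 here).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


def lm_overall(n, r2, k):
    """LM statistic n * R^2 and its p-value for H0: all k slopes are zero."""
    stat = n * r2
    return stat, chi2_sf_even_df(stat, k)


# n = 237 and k = 8 come from the assignment; R^2 = 0.5 is a made-up value.
stat, p = lm_overall(237, 0.5, 8)
print(stat, p)
```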



  1. Test at the .05 level the hypothesis that, all else equal, ln(yrs) has no influence on ln(salary).


H0: B2 = 0.


Now do the same for ln(games).


H0: B1 = 0.


The test statistics have how many degrees of freedom?



  1. Can you use the results from part e to make any inferences about the hypothesis that ln(yrs) and ln(games) together have no effect on ln(salary),


H0: B1 = B2 = 0?


Explain.


Perform an F-test of the above hypothesis. State the degrees of freedom for the test, and show how the test statistic can be computed from the sums of squared residuals. Can you reject H0?
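
For reference, the standard form of the F statistic built from sums of squared residuals is shown below. The notation is an assumption of this sketch: SSR_u is from the full double-log model, SSR_r is from the model re-estimated without ln(games) and ln(yrs), q is the number of restrictions, and k is the number of slope coefficients in the full model. The degrees of freedom assume all 237 observations remain usable after taking logs.

```latex
F = \frac{(SSR_r - SSR_u)/q}{SSR_u/(n - k - 1)} \sim F_{q,\; n-k-1},
\qquad q = 2, \quad n - k - 1 = 237 - 8 - 1 = 228 .
```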



  1. In the presence of heteroscedasticity, are the coefficient estimates still BLUE? Describe the properties of the estimators of the coefficients and the standard errors under heteroscedasticity.



  1. Use a Breusch-Pagan test to test for heteroscedasticity in both the linear and double-log models. State the null hypothesis, the distribution of the test statistic, and its degrees of freedom. Do you find evidence of heteroscedasticity? Does taking logs make a difference?
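
One widely used version of the Breusch-Pagan test (the n·R² form, often attributed to Koenker) can be written as an auxiliary regression of the squared OLS residuals on the model's regressors. The symbols below are assumptions of this sketch: û_i are the residuals from the model being tested, x_i1, ..., x_ik are its k regressors, and R² is taken from the auxiliary regression.

```latex
\hat{u}_i^2 = \delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik} + e_i,
\qquad H_0: \delta_1 = \dots = \delta_k = 0,
\qquad LM = n R^2_{\hat{u}^2} \overset{a}{\sim} \chi^2_k .
```

Software defaults may differ from this form (for example, by testing against the fitted values only, which gives one degree of freedom), so the reported degrees of freedom should match the command actually run.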



  1. At the .10 significance level, perform a heteroscedasticity-consistent test of the hypothesis that, in the linear model, the coefficient on hr equals zero,


H0: b8 = 0.


Does correcting for heteroscedasticity change your inference on this coefficient in this model?
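
A heteroscedasticity-consistent test keeps the OLS coefficient and replaces only its standard error. A sketch of the basic White (HC0) variance estimator is given below; the notation is an assumption of this sketch, with x_i the column vector of regressors (including the constant) for player i and v̂_i the OLS residual from the linear model. Software typically applies a finite-sample scaling to this expression.

```latex
\widehat{\mathrm{Var}}(\hat{b}) = (X'X)^{-1} \Big( \sum_{i=1}^{n} \hat{v}_i^2 \, x_i x_i' \Big) (X'X)^{-1},
\qquad t = \frac{\hat{b}_8}{\mathrm{se}_{\mathrm{robust}}(\hat{b}_8)} .
```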



  1. Find the double-log model that maximizes the adjusted R2. State the rule that gives a necessary condition for the adjusted R2 to be maximized. Does the adjusted R2 have an interpretation as a percentage of variation?
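
The adjusted R2 and the usual drop-a-variable rule can be sketched as follows. The function names are hypothetical, and the R2 values in the usage lines are made up; only n = 237 and k = 8 come from the assignment.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k slope coefficients plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def dropping_raises_adjusted_r2(t_stat):
    """Dropping a single regressor raises adjusted R^2 exactly when |t| < 1."""
    return abs(t_stat) < 1.0


# Made-up R^2 values, for illustration only.
print(adjusted_r2(0.50, 237, 8))
print(adjusted_r2(0.00, 237, 8))  # negative: adjusted R^2 is not bounded below by 0
print(dropping_raises_adjusted_r2(0.7))
```

Because the |t| < 1 rule applies to one variable at a time, it is a necessary condition at the maximum: no remaining regressor may have |t| < 1, and no excluded regressor may enter with |t| > 1.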



  1. Are the standard errors larger in the full double-log model or in the reduced model? Explain this result.



  1. Show that the bias in the coefficient on ln(games) from omitting ln(yrs) from the model equals the product of (1) the coefficient on ln(yrs) in the full model and (2) the coefficient on ln(games) from a regression of ln(yrs) on all the other explanatory variables.
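
The algebra behind this result can be stated compactly. The notation is an assumption of this sketch: hats denote estimates from the full double-log model, tildes denote estimates from the model that omits ln(yrs), and δ̂1 is the coefficient on ln(games) in the auxiliary regression of ln(yrs) on a constant and the seven remaining regressors.

```latex
\tilde{B}_1 = \hat{B}_1 + \hat{B}_2\,\hat{\delta}_1
\quad\Longrightarrow\quad
E[\tilde{B}_1 \mid X] - B_1 = B_2\,\hat{\delta}_1 .
```

The first equality holds exactly in the sample, so it can be checked numerically by running the three regressions and comparing both sides.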