These are statistics exercises. Use R to answer them, and include all the answers, along with the code, in a single Rmd file. There are two exercises and each of them has at least

Exercises


  • Use R but be sure to understand how quantities in the output are related and how to calculate confidence intervals and hypothesis tests.

  • Use lm(y ~ x, data = data). Make sure the data are stored as a data frame, e.g. data = data.frame(data).

  • Use plot(lm(y ~ x, data = data)) to get the diagnostic plots. You have to press Enter in the console each time to advance to the next plot.

  • Provide reasonably accurate p-values for all hypothesis tests (interpolate, if necessary).

  • Express conclusions in complete sentences that would be meaningful to interested parties (e.g., “The true mean length is between 4.53 cm and 5.43 cm.”).

  • Do not turn in unedited computer output for problems worked on a computer. Cut/paste the relevant plots and/or tables and include them into your interpretation as if you were writing a technical report. Provide only material that supports your answers.

  • Be as concise as possible without affecting the answer.
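The relationships among the quantities in the summary() output can be checked directly. The following is a minimal sketch on simulated data; the data frame `dat`, its columns `x` and `y`, and the simulated values are assumptions made only for illustration and are not part of the exercises.

```r
# Simulated data for illustration only (not from the exercises).
set.seed(42)
dat <- data.frame(x = 1:20)
dat$y <- 3 + 2 * dat$x + rnorm(20, sd = 1.5)

fit <- lm(y ~ x, data = dat)
tab <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)

est <- tab["x", "Estimate"]
se  <- tab["x", "Std. Error"]
df  <- df.residual(fit)     # n - 2 for simple linear regression

# The reported t value is estimate / standard error (it tests H0: slope = 0).
t_by_hand <- est / se

# The reported p-value is two-sided.
p_by_hand <- 2 * pt(abs(t_by_hand), df, lower.tail = FALSE)

# A 95% confidence interval: estimate +/- t quantile * standard error.
ci_by_hand <- est + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_by_hand, p = p_by_hand))
print(ci_by_hand)
print(confint(fit, "x", level = 0.95))   # should agree with ci_by_hand
```

For the diagnostic plots, setting par(mfrow = c(2, 2)) before calling plot(fit) draws all four plots on one page, so the console does not prompt between plots.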


  1. Exercise 11.7.7(e-i) (radiata_pine.csv).

    (e) Test the null hypothesis that the slope parameter β1 is at least 160 versus the research hypothesis that it is less than 160. State the null and the alternative. Use α = 0.02, but be sure to provide a p-value.

    (f) Provide a 98% confidence interval for the slope parameter. (You can also compute this from the estimate and standard error reported by summary(lm(y~x)) in R.)

    (g) R gives a t-statistic for the slope parameter, along with a p-value. State the hypotheses that these are for.

    (h) Plot the residuals versus density and judge whether they show evidence of either lack of model fit or non-constant variance (that is, whether at least one of the assumptions for the statistical analysis does not hold). Hint: there is something to find.

    (i) Even though part (h) suggests some assumptions are not appropriate, obtain a normal quantile plot of the residuals. What is evident here (that can be explained by the problem observed in part (h))?
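The one-sided slope test, the 98% interval, and the two residual plots can be sketched as follows. Because the contents of radiata_pine.csv are not shown here, the code uses simulated data; the column names `density` and `response`, the sample size, and all simulated values are assumptions for illustration only, so the numbers it prints are not answers to the exercise.

```r
# Simulated stand-in for radiata_pine.csv (column names are assumed).
set.seed(7)
pine <- data.frame(density = runif(40, min = 0.30, max = 0.60))
pine$response <- 10 + 150 * pine$density + rnorm(40, sd = 4)

fit <- lm(response ~ density, data = pine)
tab <- coef(summary(fit))
est <- tab["density", "Estimate"]
se  <- tab["density", "Std. Error"]
df  <- df.residual(fit)

# H0: beta1 >= 160 versus Ha: beta1 < 160 (lower-tailed test).
t_stat  <- (est - 160) / se
p_value <- pt(t_stat, df)          # lower-tail area
reject  <- p_value < 0.02

# 98% confidence interval: 1% in each tail.
ci98 <- est + c(-1, 1) * qt(0.99, df) * se

cat("t =", t_stat, " p-value =", p_value, " reject H0:", reject, "\n")
cat("98% CI for slope:", ci98, "\n")

# Residuals versus the predictor, then a normal quantile plot.
plot(pine$density, resid(fit), xlab = "density", ylab = "residual")
abline(h = 0, lty = 2)
qqnorm(resid(fit))
qqline(resid(fit))
```

Note that the t value printed by summary() is centered at 0, not at 160, which is why the test statistic is recomputed by hand here.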

  2. An infectious disease loses its viability over time when suspended in air. In a study of this disease, solutions of the organism were dispersed in aerosol clouds and then the percentage of viable organisms was observed. An independent observation was obtained after each of 0 min, 2 min, …, 60 min. The data are given below (disease.csv).


Cloud  Time (min)  Viability (%)    Cloud  Time (min)  Viability (%)    Cloud  Time (min)  Viability (%)
    1           0           78.9        2           2           66.6        3           4           51.6
    4           6           46.8        5           8           51.4        6          10           40.7
    7          12           43.0        8          14           43.9        9          16           34.6
   10          18           36.7       11          20           30.3       12          22           30.5
   13          24           21.0       14          26           27.9       15          28           17.5
   16          30           23.9       17          32           19.7       18          34           19.1
   19          36           17.1       20          38           20.0       21          40           12.0
   22          42           11.0       23          44           12.8       24          46            8.8
   25          48           13.2       26          50            9.2       27          52           10.9
   28          54           11.1       29          56            7.7       30          58            7.1
   31          60            5.8

    (a) Plot viability vs. time. Is a straight-line fit appropriate?

    (b) Plot log10 of viability vs. time. Is a straight line appropriate here?

    (c) Obtain the least squares fitted regression line for part (b) and draw it on top of the data. Use this fit also to express the prediction of viability (not just log viability) as a function of time. What is your prediction for two hours?

    (d) Obtain the analysis of variance table for the regression in (c) and test H0: β1 = 0. Use α = 0.05.

    (e) Provide a 98% confidence interval for β1. Get the standard error from the computer, but otherwise show the computation of the interval. Use summary(lm(y~x, data = disease)).

    (f) Plot the residuals vs. time and judge whether there are any serious problems with model fit or the constant spread assumption.

    (g) Check the normality assumption of the residuals.

    (h) Calculate and interpret the R² coefficient. Why is “correlation” not an appropriate concept for these data?
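The log-scale fit, the back-transformed prediction, the ANOVA table, the 98% interval, and R² can be sketched with the tabulated data entered directly. The object and column names (`disease`, `time`, `viability`, `logv`, `fit`) are assumptions; disease.csv may use different column names. The sketch reads "two hours" as time = 120 minutes, which lies well outside the observed range of 0 to 60 minutes, so that prediction is an extrapolation.

```r
# Data entered from the table; column names are assumed.
disease <- data.frame(
  time = seq(0, 60, by = 2),
  viability = c(78.9, 66.6, 51.6, 46.8, 51.4, 40.7, 43.0, 43.9, 34.6, 36.7,
                30.3, 30.5, 21.0, 27.9, 17.5, 23.9, 19.7, 19.1, 17.1, 20.0,
                12.0, 11.0, 12.8,  8.8, 13.2,  9.2, 10.9, 11.1,  7.7,  7.1,
                5.8)
)
disease$logv <- log10(disease$viability)

fit <- lm(logv ~ time, data = disease)
b0 <- unname(coef(fit)[1])
b1 <- unname(coef(fit)[2])

# Scatter plot of log10(viability) with the fitted line on top.
plot(logv ~ time, data = disease, xlab = "time (min)", ylab = "log10 viability")
abline(fit)

# Back-transform the fitted line: viability = 10^(b0 + b1 * time).
predict_viability <- function(t) 10^(b0 + b1 * t)
pred_120 <- predict_viability(120)   # two hours = 120 min (extrapolation)

# ANOVA table; with one predictor, F equals the squared slope t statistic.
aov_tab <- anova(fit)
print(aov_tab)

# 98% confidence interval for the slope from the estimate and standard error.
se_b1 <- coef(summary(fit))["time", "Std. Error"]
ci98  <- b1 + c(-1, 1) * qt(0.99, df.residual(fit)) * se_b1

# R^2 computed as 1 - SSE/SST; it should match summary(fit)$r.squared.
sse <- sum(resid(fit)^2)
sst <- sum((disease$logv - mean(disease$logv))^2)
r2  <- 1 - sse / sst

cat("fitted line: log10(viability) =", b0, "+", b1, "* time\n")
cat("predicted viability at 120 min:", pred_120, "\n")
cat("98% CI for slope:", ci98, "\n")
cat("R^2:", r2, "\n")
```

The residual checks reuse the same fit: plot(disease$time, resid(fit)) for the residuals vs. time, and qqnorm(resid(fit)) followed by qqline(resid(fit)) for normality.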