These are statistics exercises. Use R to answer them, and include all the answers, together with the code, in a single Rmd file. There are two exercises and each of them has at least

Exercises

Use R, but make sure you understand how the quantities in the output are related and how to compute confidence intervals and carry out hypothesis tests.
Fit models with lm(y ~ x, data = data). Make sure the data are stored as a data frame, e.g. data = data.frame(data).
Use plot(lm(y ~ x, data = data)) to get the diagnostic plots. In the console, you have to press Enter to advance to each plot.
Provide reasonably accurate p-values for all hypothesis tests (interpolate, if necessary).
Express conclusions in complete sentences that would be meaningful to interested parties (e.g., “The true mean length is between 4.53 cm and 5.43 cm.”).
Do not turn in unedited computer output for problems worked on a computer. Cut and paste the relevant plots and/or tables and include them in your interpretation as if you were writing a technical report. Provide only material that supports your answers.
Be as concise as possible without compromising the answer.
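A minimal sketch of the fit-and-diagnose workflow described in the instructions above. The function names `fit_and_diagnose` and `t_pvalue` are hypothetical, and `x` and `y` are placeholder column names. Splitting the plotting device into a 2 × 2 grid shows all four default diagnostic plots at once, so no Enter presses are needed; when the Rmd file is knitted, the plots are rendered without prompting anyway. `pt()` returns the exact p-value that interpolating in a printed t table only approximates.

```r
# Fit y ~ x and draw the four default lm diagnostic plots on one page.
# `x` and `y` are placeholder column names; substitute the real ones.
fit_and_diagnose <- function(data) {
  data <- data.frame(data)          # make sure lm() receives a data frame
  fit <- lm(y ~ x, data = data)     # least squares fit of y on x
  op <- par(mfrow = c(2, 2))        # 2 x 2 grid: all four plots at once
  on.exit(par(op))                  # restore the previous layout afterwards
  plot(fit)
  fit
}

# Exact two-sided p-value for a t statistic with `df` degrees of freedom,
# replacing interpolation in a printed t table.
t_pvalue <- function(t_stat, df) {
  2 * pt(-abs(t_stat), df)
}
```

For example, `t_pvalue(2.5, 29)` gives the two-sided p-value for t = 2.5 on 29 degrees of freedom.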


An infectious disease loses its viability over time when suspended in air. In a study of this disease, solutions of the organism were dispersed in aerosol clouds and then the percentage of viable organisms was observed. An independent observation was obtained after each of 0 min, 2 min, …, 60 min. The data are given below (disease.csv).
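A hypothetical loader for the data file, assuming disease.csv has a header row with columns named `time` (minutes) and `viability` (percent viable); rename them if the actual file differs. Because log10 is undefined for zero or negative values, the sketch checks that every viability is positive before adding the transformed column.

```r
# Hypothetical loader; the column names `time` and `viability` are assumptions.
load_disease <- function(path = "disease.csv") {
  disease <- read.csv(path)  # read.csv() already returns a data frame
  stopifnot(all(c("time", "viability") %in% names(disease)))
  stopifnot(all(disease$viability > 0))  # log10() needs positive values
  disease$log_viability <- log10(disease$viability)
  disease
}
```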

(a) Plot viability vs. time. Is a straight-line fit appropriate?
(b) Plot log10 of viability vs. time. Is a straight line appropriate here?
(c) Obtain the least squares regression line for the data in part (b) and plot it on top of the data. Also use this fit to express the predicted viability (not just log viability) as a function of time. What is your prediction for two hours?
(d) Obtain the analysis of variance table for the regression in (c) and test $H_0: \beta_1 = 0$ at $\alpha = .05$.
(e) Provide a 98% confidence interval for $\beta_1$. Get the standard error from the computer output (summary(lm(y ~ x, data = disease))), but otherwise show the computation of the interval.
(f) Plot the residuals vs. time and judge whether there are any serious problems with the model fit or the constant-spread assumption.
(g) Check the normality assumption for the residuals.
(h) Calculate and interpret the $R^2$ coefficient. Why is “correlation” not an appropriate concept for these data?
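A sketch of parts (a) through (c), assuming a data frame `disease` with columns named `time` and `viability` (assumed names, not given in the exercise); the function names are hypothetical. Since the fitted line is log10(viability) = b0 + b1·time, back-transforming gives viability = 10^(b0 + b1·time), an exponential decay in time.

```r
# Parts (a)-(c): raw plot, log10 plot with fitted line, and the fit itself.
# Column names `time` and `viability` are assumptions.
fit_log_model <- function(disease) {
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  plot(viability ~ time, data = disease, main = "(a) Viability vs. time")
  plot(log10(viability) ~ time, data = disease,
       main = "(b) log10(viability) vs. time")
  fit <- lm(log10(viability) ~ time, data = disease)  # part (c)
  abline(fit)  # fitted line drawn on top of the part (b) plot
  fit
}

# Back-transform the fitted log10 values to the viability scale:
# viability = 10^(b0 + b1 * time).
predict_viability <- function(fit, minutes) {
  10^predict(fit, newdata = data.frame(time = minutes))
}
```

With `fit <- fit_log_model(disease)`, the two-hour prediction is `predict_viability(fit, 120)`, since time is measured in minutes. Note that 120 minutes lies outside the observed 0 to 60 minute range, so this prediction is an extrapolation and should be interpreted with caution.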
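A sketch of parts (d) and (e), taking the part (c) fit as input; `slope_inference` is a hypothetical name. The interval follows b1 ± t(0.99; n - 2) · SE(b1), with the standard error read from `summary()` as the exercise asks, and `confint(fit, level = 0.98)` can serve as a cross-check. In simple linear regression the ANOVA F statistic equals the square of the slope's t statistic, so both tests of H0: β1 = 0 give the same p-value.

```r
# Parts (d)-(e): ANOVA F test of H0: beta1 = 0 and a hand-built CI for beta1.
slope_inference <- function(fit, level = 0.98) {
  aov_tab <- anova(fit)                 # regression and error rows
  est <- summary(fit)$coefficients
  b1 <- est[2, "Estimate"]              # slope estimate
  se_b1 <- est[2, "Std. Error"]         # its standard error
  df <- fit$df.residual                 # n - 2 in simple linear regression
  t_crit <- qt(1 - (1 - level) / 2, df) # t(0.99; n - 2) when level = 0.98
  list(anova = aov_tab,
       F = aov_tab[1, "F value"],
       p_value = aov_tab[1, "Pr(>F)"],
       ci = c(lower = b1 - t_crit * se_b1, upper = b1 + t_crit * se_b1))
}
```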
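A sketch of parts (f) through (h), assuming `fit` is the part (c) model of log10 viability on a predictor named `time` (an assumed name); `residual_checks` is hypothetical. It pulls the data actually used in the fit from `model.frame()`, the Shapiro-Wilk test is an optional addition to the Q-Q plot, and R² is computed directly as 1 - SSE/SSTO on the log10 scale so it can be compared with the `Multiple R-squared` line of `summary(fit)`.

```r
# Parts (f)-(h): residual plot, normal Q-Q plot, and R^2 from sums of squares.
residual_checks <- function(fit) {
  mf <- model.frame(fit)        # data actually used in the fit
  y <- model.response(mf)       # response, here log10(viability)
  e <- resid(fit)
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  plot(mf$time, e, xlab = "Time (min)", ylab = "Residual",
       main = "(f) Residuals vs. time")
  abline(h = 0, lty = 2)        # residuals should scatter evenly around 0
  qqnorm(e, main = "(g) Normal Q-Q plot of residuals")
  qqline(e)
  sse <- sum(e^2)               # error sum of squares
  ssto <- sum((y - mean(y))^2)  # total sum of squares
  list(shapiro_p = shapiro.test(e)$p.value,  # optional formal normality test
       r_squared = 1 - sse / ssto)           # R^2 = SSR/SSTO = 1 - SSE/SSTO
}
```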