Attached are the question file and the additional files needed to answer the questions.

Name:

  1. Data

The data file covidMaskwearing.csv contains the dataset collected as part of a study on COVID-19 mask adherence.¹ The variables in the dataset are described here: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0249891

  1. Create a graph that shows how the presence (1) or absence (0) of mask policies in July affected the distribution of August case rates.

  2. By creating a graph, observe the distribution of August case rates and comment on what may be the most appropriate measure of center. Then compute it.

  3. Compare the measure of center you decided on in part 2 between mask-mandate and non-mandate states.

  4. Construct a plot that shows the relationship between mask wearing adherence in August and August case rates. Comment on the strength of the relationship (for full credit, compute a measure of that strength).

  2. Probability Distributions

The data file populationPyramid.csv contains the probabilities of a randomly selected individual in the US being in a certain age bracket and assigned a certain gender at birth. This data is visually displayed at the following link.

https://www.populationpyramid.net/united-states-of-america/2020/

The probability of an individual being male based on this dataset is 0.4948, and the probability of being female is 0.5052.

  1. What is the probability that a randomly selected individual in the US is between 50 and 59?

  2. What is the probability that an individual is either between 25 and 29 or male?

  3. Given that an individual is a female, what is the probability they’re between 25 and 34?

  4. Given that an individual is between 90 and 99, what is the probability they are female?

  5. Using the center of each age bracket as the observation value, compute the expected value of the age of a randomly drawn male.
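
The rules these five questions rely on (marginal probability, the addition rule, conditional probability, and expected value) can be sketched on a small joint table. Everything in the table below is hypothetical: the brackets are coarser than the real ones and the probabilities are invented so that they sum to 1, so none of the printed numbers are answers for populationPyramid.csv. The bracket center is taken as (lower + upper) / 2, which is also an assumption.

```python
MALE, FEMALE = 0, 1

# Hypothetical joint probabilities; NOT the values in populationPyramid.csv.
# Key: (lower age, upper age). Value: (P(bracket and male), P(bracket and female)).
joint = {
    (0, 24): (0.16, 0.15),
    (25, 29): (0.04, 0.03),
    (30, 34): (0.03, 0.04),
    (35, 99): (0.27, 0.28),
}


def p_joint(lo, hi, sex):
    """P(age in [lo, hi] and sex), summing the brackets inside the range."""
    return sum(p[sex] for (a, b), p in joint.items() if lo <= a and b <= hi)


def p_age(lo, hi):
    """Marginal P(age in [lo, hi]): add the male and female cells."""
    return p_joint(lo, hi, MALE) + p_joint(lo, hi, FEMALE)


def p_sex(sex):
    """Marginal P(sex): add that sex's cell over every bracket."""
    return sum(p[sex] for p in joint.values())


def p_age_or_sex(lo, hi, sex):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_age(lo, hi) + p_sex(sex) - p_joint(lo, hi, sex)


def p_age_given_sex(lo, hi, sex):
    """Conditional probability: P(A | B) = P(A and B) / P(B)."""
    return p_joint(lo, hi, sex) / p_sex(sex)


def p_sex_given_age(sex, lo, hi):
    """Conditional probability with the roles of age and sex swapped."""
    return p_joint(lo, hi, sex) / p_age(lo, hi)


def expected_age(sex):
    """E[age | sex], using each bracket's midpoint as its value."""
    total = sum((a + b) / 2 * p[sex] for (a, b), p in joint.items())
    return total / p_sex(sex)


print(p_age_or_sex(25, 29, MALE))
print(p_age_given_sex(25, 34, FEMALE))
print(expected_age(MALE))
```

Note that the expected value for one sex divides by P(sex), because the cells of a joint table must be renormalized before they form a distribution over ages for that group alone.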

  3. Bayes’ Law

Consider an automated plagiarism detection software that is used to evaluate essay submissions. Three sections of a writing course use the software to check for plagiarism, with 36% of the students in section 1, 33% in section 2, and 31% in section 3. In section 1, 34% of the essays are flagged; in section 2, 15%; and in section 3, 32%.

Additionally, of the essays flagged, 23% were on time, while 77% were turned in late. Of the non-flagged essays, 83% were on time and 17% were late.

For notation purposes, let S1, S2, S3 be the section events, L be a late essay, with L̄ being an on-time essay, and F be a flagged essay, with F̄ being a non-flagged essay.

  1. What percentage of total essays were flagged overall?

  2. Given that a particular student committed plagiarism, what is the probability that they were registered for section 1 of the course?

  3. Given that a particular student’s essay was flagged, what is the probability that they turned their essay in late?
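
The law of total probability and Bayes' law, applied to the figures stated above, can be sketched as follows. One assumption is made and should be checked against the intended reading of question 2: since the problem gives no separate rate of actual plagiarism, "committed plagiarism" is treated here as the event F (the essay was flagged). The variable and function names are illustrative.

```python
# Figures stated in the problem.
p_section = {"S1": 0.36, "S2": 0.33, "S3": 0.31}             # P(Si)
p_flag_given_section = {"S1": 0.34, "S2": 0.15, "S3": 0.32}  # P(F | Si)
p_late_given_flag = 0.77                                     # P(L | F), given directly

# Law of total probability: P(F) = sum over i of P(Si) * P(F | Si).
p_flag = sum(p_section[s] * p_flag_given_section[s] for s in p_section)


def posterior_section(section):
    """Bayes' law: P(Si | F) = P(Si) * P(F | Si) / P(F)."""
    return p_section[section] * p_flag_given_section[section] / p_flag


print(f"P(F)      = {p_flag:.4f}")
print(f"P(S1 | F) = {posterior_section('S1'):.4f}")
print(f"P(L | F)  = {p_late_given_flag:.2f}")
```

The three posteriors P(S1 | F), P(S2 | F), and P(S3 | F) should sum to 1, which is a quick way to check the arithmetic.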

  4. Probability Distributions

Consider the probabilities from problem 2.

  1. If we were to randomly select 20 people in the US, could we use a binomial distribution to calculate the probability that x of those twenty were between 0 and 19? Explain why or why not.

  2. Compute the probability that exactly 2 people in the sample are between 0 and 19.

  3. Compute the probability that more than 5 people in the sample are between 0 and 19.

  4. Graph the probability mass function for x, the number of people between 0 and 19, in a sample of size 5 (using R or by hand is fine).
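
If the binomial model is judged appropriate (a fixed number of trials, two outcomes per trial, and draws that are approximately independent with a constant success probability), the calculations in this problem could be sketched as below. The success probability `P_UNDER_20` is a placeholder, not the value from populationPyramid.csv; it would be replaced by the sum of the 0 to 19 bracket probabilities from problem 2. The text bar chart is only a stand-in for a proper plot.

```python
from math import comb

# Placeholder only; replace with P(age between 0 and 19) from populationPyramid.csv.
P_UNDER_20 = 0.25


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail_gt(k, n, p):
    """P(X > k), computed as 1 - P(X <= k)."""
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(k + 1))


p_exactly_2 = binom_pmf(2, 20, P_UNDER_20)
p_more_than_5 = binom_tail_gt(5, 20, P_UNDER_20)

# PMF for a sample of size 5, drawn as a crude text bar chart.
pmf_n5 = [binom_pmf(k, 5, P_UNDER_20) for k in range(6)]
for k, prob in enumerate(pmf_n5):
    print(f"x={k} | {'#' * round(prob * 50)} {prob:.4f}")

print(f"P(X = 2), n = 20: {p_exactly_2:.4f}")
print(f"P(X > 5), n = 20: {p_more_than_5:.4f}")
```

In R, the same quantities are available through `dbinom` (the probability mass function) and `pbinom` (the cumulative distribution function).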

The End.

¹ Fischer CB, Adrien N, Silguero JJ, Hopper JJ, Chowdhury AI, Werler MM (2021) Mask adherence and rate of COVID-19 across the United States. PLoS ONE 16(4): e0249891. https://doi.org/10.1371/journal.pone.0249891