Chapter 5 Minitab Express

In this activity, you will compare population parameters calculated from the entire dataset with statistics calculated from samples taken from the dataset.

Using the textbook dataset, find the mean, median, range and standard deviation for public school final enrollment (C16), SAT grand total average score (C21) and public school expenditures (C18). Notice that these are population parameters.

To demonstrate the sampling process, I will work with the variable Violent Crimes (C22). I will examine the descriptive statistics first (Statistics, Describe, Descriptive Statistics). You must do each variable individually. Here is my output:

Remember you can use the statistics option to change the output.

Let's get an idea of the distribution for Violent Crimes. Here is a dot plot of Violent Crimes. There are some counties with high violent crimes (Guilford, Mecklenburg, Wake).

To create a sample of 20%, use the command Data, Sample from Columns:

Now I have a sample of 20% (20 rows, since this data set has 100 rows) in C79. I can use the sample to determine the means for the same variables used in the first question.

For the data in C79, here are the summary statistics. The mean is a bit higher, the median is lower and the SE is higher (why?).

Parameters:

Let's compare this mean (396.5) with the mean from the full data set, which was 339.66. The standard error of the mean for the full dataset was 69.64. So the mean from the 20% sample falls within one standard error of the mean for the full dataset. Notice how the variability of a variable affects the sample's accuracy. Remember that accuracy depends on sample size and confidence.

We can use the empirical rule to compare the mean of the full data set to the mean of the sample. Use the standard error to determine the approximate 95% confidence interval (± 2 standard errors). Compare the width of the interval for the full dataset and the sample.
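The ± 2 standard error rule can also be checked by hand or with a few lines of code. The following is a minimal Python sketch, not part of Minitab Express; the function name approx_95_interval is made up for illustration, and the two input values are the full-dataset mean and standard error for Violent Crimes quoted above.

```python
def approx_95_interval(mean, standard_error):
    """Return (lower, upper) for mean ± 2 standard errors (empirical rule)."""
    margin = 2 * standard_error
    return (mean - margin, mean + margin)

# Full-dataset values for Violent Crimes, as reported in this activity.
full_mean = 339.66
full_se = 69.64

lower, upper = approx_95_interval(full_mean, full_se)
width = upper - lower  # interval width is 4 standard errors

print(f"Approximate 95% interval: ({lower:.2f}, {upper:.2f})")
print(f"Width: {width:.2f}")
```

The same function can be applied to the mean and standard error from your own 20% sample, so the two widths can be compared directly.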
For my example:

95% confidence interval for mean from full dataset: (200.38, 478.94)
95% confidence interval for 20% sample: (136.3, 656.7)

Notice how the intervals differ. Which interval is more accurate? More precise?

NOTE: Each time you draw a sample of 20 rows, the summary statistics will be different since a different set of counties is selected.
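To see why the results change from one draw to the next, the sampling step can be imitated outside Minitab. This is only a rough Python sketch under stated assumptions: the 100 values are randomly generated stand-ins, not the textbook's Violent Crimes column, and the helper name summarize is hypothetical. It draws two samples of 20 rows without replacement and reports the mean, median, standard error and ± 2 SE interval for each.

```python
import math
import random
import statistics

def summarize(values):
    """Mean, median, standard error and mean ± 2 SE interval for a list."""
    n = len(values)
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 divisor) divided by sqrt(n).
    se = statistics.stdev(values) / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "se": se,
        "interval": (mean - 2 * se, mean + 2 * se),
    }

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# ASSUMPTION: made-up, right-skewed data standing in for a 100-row column.
population = [rng.expovariate(1 / 340) for _ in range(100)]

full = summarize(population)
sample_a = summarize(rng.sample(population, 20))  # first 20% sample
sample_b = summarize(rng.sample(population, 20))  # second 20% sample

for label, stats in [("full", full), ("sample A", sample_a), ("sample B", sample_b)]:
    low, high = stats["interval"]
    print(f"{label}: n={stats['n']} mean={stats['mean']:.1f} "
          f"median={stats['median']:.1f} SE={stats['se']:.1f} "
          f"interval=({low:.1f}, {high:.1f})")
```

With only 20 rows, the standard error is typically larger than for all 100 rows, so the sample intervals tend to be wider, and the two samples will generally not agree with each other.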