In this unit, you learned how to determine sample size and estimate confidence intervals. For this assignment, you will create a PowerPoint presentation that further explores these concepts. In your p

UNIT IV STUDY GUIDE
Determination of Sample Size

PUH 5302, Applied Biostatistics

Course Learning Outcomes for Unit IV

Upon completion of this unit, students should be able to:

6. Summarize the major principles for determining sample size and power.
6.1 Define point estimate, standard error, confidence level, and margin of error.
6.2 Compare major methods for determining sample size and power.
6.3 Explain the effects of different sample sizes.

Course/Unit Learning Outcomes    Learning Activity
6.1                              Unit Lesson; Chapter 6; Unit IV PowerPoint Presentation
6.2                              Unit Lesson; Chapter 6; Unit IV PowerPoint Presentation
6.3                              Unit Lesson; Chapter 6; Unit IV PowerPoint Presentation

Reading Assignment

Chapter 6: Confidence Interval Estimates

Unit Lesson

Welcome to Unit IV. In the previous unit, we discussed how probability principles are applied to solving some public health issues. We discussed populations and samples and the various methods of obtaining samples from a population. We also stressed the importance of having a sample that reflects the population so that results can be generalized.

In this unit, we will discuss sample size determination. That is, how do we know that the sample we have selected is large enough for a given population? We will discuss some parameters considered in sample size determination and in inferential statistics. Before we get to sample size determination, let us define some relevant terms.

A point estimate is a single value, calculated from sample data, that approximates a population parameter (Sullivan, 2018). For example, we may have a set of data (a set of test scores). From those scores, we may calculate the mean, median, mode, or percentages. We may use those values to describe or say something about the population under study (students). Those values are point estimates, since each one estimates a single point such as the population mean or median. In fact, any sample statistic can serve as a point estimate.

For example:

• The sample standard deviation is a point estimate of the population standard deviation.
• The sample mean is a point estimate of the population mean.
• The sample variance is a point estimate of the population variance.

For example, suppose a set of student scores is listed below:

{56, 57, 59, 58, 56, 56, 57, 61}

The summary statistics are as follows:

• mean = 57.5
• standard deviation = 1.77
• n = 8

The mean of our sample, 57.5, is an estimate of the average score among all students. To put this in our statistical language, the sample average score is an estimate of the population average score.

The standard error (SE) of a mean measures how much sample means vary from one sample to another when repeated samples are drawn from the same population. It is calculated as the standard deviation divided by the square root of the sample size (SE = s / √n), and it indicates how precisely the sample mean estimates the population mean. In statistics, a sample mean usually deviates somewhat from the actual mean of the population, and the standard error describes the typical size of that deviation (Sullivan, 2018). In general, the following hold:

• Lower values of the standard error indicate a more precise estimate of the population mean.
• Larger values of the standard error indicate a less precise estimate of the population mean.
• A larger but appropriate sample may result in a smaller standard error and a more precise estimate.

The standard error is inversely proportional to the square root of the sample size; the larger the sample size, the smaller the standard error. The standard error is derived from the standard deviation, a descriptive statistic that measures the spread of the scores around the mean. The smaller the spread, the more precise the estimate. Generally, for any dataset, the descriptive statistics calculated include the mean, median, standard deviation, and variance.
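As an illustration, the summary statistics and the standard error for the eight scores above can be computed with a short Python sketch. It uses only the standard library, and the variable names are illustrative. It assumes the sample standard deviation (n - 1 in the denominator) is the intended definition.

```python
import math
import statistics

# Student scores from the example above
scores = [56, 57, 59, 58, 56, 56, 57, 61]

n = len(scores)
mean = statistics.mean(scores)   # point estimate of the population mean
sd = statistics.stdev(scores)    # sample standard deviation (assumes n - 1 denominator)
se = sd / math.sqrt(n)           # standard error of the mean: s / sqrt(n)

print(f"n = {n}")
print(f"mean = {mean:.2f}")
print(f"standard deviation = {sd:.2f}")
print(f"standard error = {se:.2f}")
```

Because the standard error divides by √n, quadrupling the sample size halves the standard error when the standard deviation stays the same.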
The standard error reflects the difference between the calculated sample mean and the true mean of the population, and it helps account for inaccuracies related to the process of sample selection. When the standard error is small, the data are said to be more representative of the true mean. In cases where the standard error is large, the data may contain outliers that should be identified through exploratory data analysis (EDA) and then corrected or removed where justified. If the researcher fails to do EDA, the outliers may affect the results of the study.

A confidence interval is an estimate of the range of values within which the true population parameter is likely to fall (Sullivan, 2018). That is, it is a range within which the true value is expected to be located. For most chronic diseases or injuries, public health researchers may want to report the occurrence as a proportion or rate. For example, the incidence of a disease in a given population may be reported in the following manner: out of 1,500 people surveyed, 70% of the participants were infected with the disease, within a margin of error of plus or minus 3 percentage points. Although not always stated, most studies use a 95% confidence level. Therefore, in the simplest terms, the report means that we can be 95% confident that the true percentage of infected people in the population lies between 67% and 73%. Conversely, about 5% of intervals constructed this way would miss the true percentage.

The precise statistical definition of the 95% confidence interval is that if the survey were conducted 100 times, about 95 of the resulting confidence intervals would contain the true population percentage, and about 5 would lie entirely above or below it. The confidence interval reveals more than a range around the estimate. It also reveals the stability of the estimate (Sullivan, 2018).
An estimate is considered stable if repeating the survey would produce nearly the same value, while an unstable estimate would vary from one survey to the next. Based on this, the wider the confidence interval, the less stable the estimate. As you can guess, the narrower the interval, the more stable it is. The most common confidence levels are 90%, 95%, and 99%, but in theory the confidence level can fall anywhere between 0% and 100%.

Calculation of Confidence Interval

Given the following data, calculate the 95% confidence interval for the sample.

• The mean difference in the sample (n = 100) is -12.7.
• The standard deviation is 8.9.

The following formula will be used:

\[ \bar{x} \pm z \frac{s}{\sqrt{n}} = -12.7 \pm 1.96 \times \frac{8.9}{\sqrt{100}} = -12.7 \pm 1.74 = (-14.4, -11.0) \]

Standard deviation shows the degree of spread of the scores, or how dispersed the scores are from the mean. It is calculated as the square root of the variance. In a normal distribution, about 68%, 95%, and 99.7% of measurements fall within one, two, and three standard deviations of the mean, respectively.

Margin of error (ME) shows by how much the results may differ from the actual population value. For example, at a 95% confidence level, a 2% margin of error means the sample statistic is expected to fall within 2 percentage points of the actual population value 95% of the time. Statisticians have come up with different ways to calculate the margin of error. For example:

• ME = critical value × standard deviation of the statistic
• ME = critical value × standard error of the statistic
• ME = 1 / √N at the 95% confidence level (an approximation)

Using the formula ME = 1 / √N at the 95% confidence level, for a sample of N = 100, the margin of error will be:

ME = 1 / √100
ME = 1 / 10
ME = 0.1 = 10%

To interpret this result, one would say that for a sample size of N = 100, at the 95% confidence level, there would be a 10% margin of error. The margin of error expresses the amount of random sampling error in a survey's results.
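The confidence interval calculation above can be sketched in Python as follows. The function name `mean_confidence_interval` and its parameters are hypothetical, chosen only for illustration. The sketch assumes the z-based interval (mean ± z × s / √n) with z = 1.96 for a 95% confidence level.

```python
import math

def mean_confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) bounds of mean ± z * sd / sqrt(n).

    z defaults to 1.96, the critical value assumed for a 95% confidence level.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    margin = z * sd / math.sqrt(n)  # margin of error around the sample mean
    return mean - margin, mean + margin

# Values from the worked example: mean difference, standard deviation, sample size
lower, upper = mean_confidence_interval(mean=-12.7, sd=8.9, n=100)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

Passing a different `z` (for example 1.645 or 2.576) would give the 90% or 99% interval under the same assumptions.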
The margin of error indicates how closely a result from a sample can be generalized to the entire population at the 95% confidence level. Below is a chart showing margins of error with different sample sizes. As you can see, the larger the sample is, the smaller the margin of error.

[Figure: margin of error at the 95% confidence level for different sample sizes (Zieben007, 2014)]

Importance of Quality Sample Size Selection

Researchers have long been cognizant of the importance of sample size in research studies. Using the appropriate sample size has many benefits.

• Usually, researchers prefer to use larger sample sizes because large samples provide the opportunity to have a better representation of the general population. A smaller sample may not be reflective of the general population and may result in a larger margin of error (as discussed earlier).
• A large sample size may help reduce the influence of outliers. Outliers create many problems in data analysis, and many researchers have addressed them by using large sample sizes.
• Smaller sample sizes may lead to a waste of resources because, if the results of a study do not reflect the general population, the study is of little use. On the other hand, an oversized sample may consume more resources and may put undue stress on the researcher.
• There are ethical reasons associated with the use of appropriate sample sizes, especially with the use of human subjects.
• Qualitative studies usually use smaller samples. However, the use of an appropriately large sample size may broaden the range of the data and provide a better picture for analysis (Unite for Sight, n.d.).

Strategies to Obtain a Quality Sample

There are several ways to help ensure that a quality sample is obtained. The study must be focused and narrowed down to a manageable size with appropriate questions and responses. Your sample must be representative of the general population. Determine the appropriate inclusion and exclusion criteria for the study population. In addition, you must have a recruitment strategy that will allow you to get quality participants for the study. Then, identify and recruit potential participants, being respectful and responsive to their concerns. To avoid selection bias, use random sampling and then follow up with the participants (Unite for Sight, n.d.). Finally, allow for flexibility in the process. Change the plan if the current plan is not bringing in enough participants.

Sample Size Determination

We briefly touched on sample size calculation in the previous lesson. There are various methods of sample size determination, some of which were previously mentioned in Unit III. Let's explore them in detail here.

Statistical software: There is a plethora of sample size calculators freely available on the Internet; just search for "sample size calculator" on your favorite search engine. The software makes it easy to calculate sample sizes, confidence intervals, and other quantities. These sample size calculators save researchers from tedious calculations and help to save time. The SurveyMonkey sample size calculator, like others found on the Internet, is free for public use. G*Power is a free program that performs power analyses for a variety of statistical tests, including analysis of variance, F-tests, t-tests, and so on. To run an analysis, the user must know four of the following five variables, and the program solves for the fifth:

• number of groups,
• number of observations,
• effect size,
• significance level, and
• power (1 - β).

Generally, regardless of the method used, sample size calculation results do not vary much from one another.

Pre-calculated sample size tables: There are standard sample sizes, with their associated population sizes, confidence intervals, and margins of error, that have been pre-calculated and developed over the years for public use.
These tables are readily available when you search the Internet. In most cases, researchers are free to use these tables instead of going through the burden of calculating sample sizes.

In-depth interviews: For in-depth qualitative studies, it has been determined that 20-30 interviews are needed to uncover about 95% of the information participants would provide. Some researchers determined that a sample size of 30 respondents would provide a starting point (Griffin & Hauser, 1993). Other researchers have also reported that, for interviews, a sample size larger than 30 and less than 500 is appropriate for most research (Sullivan, 2018).

Using margin of error: It takes quite a bit of calculation to determine an exact sample size for a study, but for most studies, an acceptable sample size can be determined using the calculated margin of error (Unite for Sight, n.d.). The margin of error is normally estimated at the 95% confidence level. That is, there is only a 5% chance that the sample result differs from the true population value by more than the margin of error, which is given by the formula below (Sullivan, 2018).

Margin of error = 1 / √N, where N is the number of participants or sample size.

This means that a sample size of 20 would have a margin of error of 1 / √20 = 0.224, or 22.4%. That being said, if researchers survey 20 people and find that 10 respondents (50%) are infected with a disease, we can be 95% confident that between 27.6% (50% - 22.4%) and 72.4% (50% + 22.4%) of the population is actually infected. Since the range is so large, the data are not very conclusive. However, if the researchers survey 200 people, the margin of error falls to about 7.1% (1 / √200 = 0.071). Now, if 80 participants (40%) are infected, we can be 95% confident that between 32.9% (40% - 7.1%) and 47.1% (40% + 7.1%) of the population is actually infected. The larger the value of N is, the more useful the results because the margin of error is smaller (Unite for Sight, n.d.).
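To see how quickly the rule-of-thumb margin of error shrinks as the sample grows, the formula above can be tabulated with a small Python sketch. The function names are hypothetical. Converting the margin into a range around an observed proportion (clipped to the 0 to 1 interval) is an assumption made for illustration, not a method stated in the sources.

```python
import math

def approx_margin_of_error(n):
    """Rule-of-thumb 95% margin of error for a sample of size n: 1 / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)

def proportion_range(infected, n):
    """Observed proportion minus/plus the rule-of-thumb margin, clipped to [0, 1]."""
    p = infected / n
    me = approx_margin_of_error(n)
    return max(0.0, p - me), min(1.0, p + me)

# Margin of error for a few illustrative sample sizes
for n in (20, 100, 200, 1000):
    print(f"N = {n:>4}: margin of error = {approx_margin_of_error(n):.1%}")

# Range for a survey in which 10 of 20 respondents are infected
low, high = proportion_range(infected=10, n=20)
print(f"10 of 20 infected: {low:.1%} to {high:.1%}")
```

Because the margin depends on √N, halving it requires roughly four times as many participants.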
In addition, when considering sample size, the response rate must be considered, as well as incomplete responses or responses that cannot be read. All of these things affect the usable sample size.

Using z-score: Before you can calculate a sample size, you need to know the following (Smith, 2018):

1. population size,
2. margin of error (confidence interval),
3. confidence level, and
4. standard deviation.

The confidence level corresponds to a z-score, a constant value needed for this equation. These constants are usually available in pre-calculated z-score tables. The most common confidence levels are included below (Smith, 2018).

• 90%: z-score = 1.645
• 95%: z-score = 1.96
• 99%: z-score = 2.576

The confidence levels are matched with their corresponding z-scores in the table. The next step is to plug your z-score, standard deviation, and margin of error into this equation (Smith, 2018).

\[ \text{Sample size} = \frac{(z\text{-score})^2 \times \text{StdDev} \times (1 - \text{StdDev})}{(\text{margin of error})^2} \]

Now, assuming we have the following data, how many participants are needed?

• Confidence level of 95%
• Standard deviation of 0.4
• Margin of error (confidence interval) of ± 5%

\[ \text{Sample size} = \frac{(1.96)^2 \times 0.4 \times (1 - 0.4)}{(0.05)^2} = \frac{3.8416 \times 0.24}{0.0025} = \frac{0.921984}{0.0025} \approx 368.8 \]

Rounding up, 369 participants are needed.

In summary, the use of appropriate sample sizes in research studies cannot be overemphasized. Aside from the fact that an appropriate sample size may increase the generalizability of the research results, it helps to reduce the influence of outliers and save time and resources as well. There are various ways to calculate sample sizes, including using formulas and online software.

References

Griffin, A., & Hauser, J. R. (1993). The voice of the customer. Marketing Science, 12(1). Retrieved from http://www.mit.edu/~hauser/Papers/TheVoiceoftheCustomer.pdf

Smith, S. (2018, January 31). Determining sample size: How to ensure you get the correct sample size [Blog post]. Retrieved from https://www.qualtrics.com/blog/determining-sample-size/

Sullivan, L. M. (2018). Essentials of biostatistics in public health (3rd ed.). Burlington, MA: Jones & Bartlett Learning.

Unite for Sight. (n.d.). The importance of quality sample size. Retrieved from http://www.uniteforsight.org/global-health-university/importance-of-quality-sample-size

Zieben007. (2014). Margin of error [Image]. Retrieved from https://commons.wikimedia.org/wiki/File:Margin-of-error-95.svg