Assume that you are a public health research officer investigating a disease outbreak, and your supervisor is recommending that you use probability sampling. Would you agree with your supervisor's rec

UNIT III STUDY GUIDE
Probability

PUH 5302, Applied Biostatistics

Course Learning Outcomes for Unit III
Upon completion of this unit, students should be able to:
4. Recommend solutions to public health problems using biostatistical methods.
4.1 Compute and interpret probability for biostatistical analysis.
4.2 Draw conclusions about public health problems based on biostatistical methods.
5. Analyze public health information to interpret results of biostatistical analysis.
5.1 Analyze literature related to biostatistical analysis in the public health field.
5.2 Prepare an annotated bibliography that explores a topic related to public health issues.

Course/Unit Learning Outcomes and Learning Activities
4.1: Unit Lesson; Chapter 5; Unit III Problem Solving
4.2: Unit Lesson; Chapter 5; Unit III Problem Solving
5.1: Chapter 5; Unit III Annotated Bibliography
5.2: Chapter 5; Unit III Annotated Bibliography

Reading Assignment
Chapter 5: The Role of Probability

Unit Lesson
Welcome to Unit III. In previous units, we discussed some fundamentals of biostatistics and their application to solving public health problems. In Unit III, we will compute, interpret, and apply probability, especially in relation to different populations.

Computing and Interpreting Probabilities
Probability is a number between 0 and 1 that expresses how likely an event is to occur. For example, when a fair coin is tossed there are two equally likely outcomes, so the probability of getting heads is one chance out of two, or ½ (and the same is true for tails). Researchers have used probability to predict the weather and other events with some success. Public health professionals use statistical methods to estimate the chances of health-related events. This provides evidence for taking precautionary measures and for warning the general public about important health issues. In biostatistics, we use both descriptive statistics and inferential statistics to address public health issues within a population. In most cases, researchers cannot study the entire population, so they draw a sample from it and generalize their findings to the population.

Descriptive Statistics
Aside from probability sampling methods, researchers use other methods to compute and interpret data; these are generally known as descriptive statistics. With descriptive statistics, we normally compute the mean, mode, median, variance, and standard deviation. These measures describe the data at hand, whereas inferential statistics use a sample to draw conclusions about a larger population. Let's examine an example using the numbers 5, 10, 2, 4, 6, 10, 2, 3, and 2.
The mean is the sum of all the numbers ÷ the number of cases = 44 ÷ 9 ≈ 4.89.
The median is the middle number after the numbers have been arranged in ascending or descending order (2, 2, 2, 3, 4, 5, 6, 10, 10) = 4.
The mode is the most frequently occurring number = 2.
You can calculate the variance and standard deviation from the definitions given here:
• Variance is the average of the squared differences (deviations) between each value and the mean. For a sample, the sum of squared deviations is usually divided by n - 1 rather than n.
• Standard deviation is √variance (the square root of the variance).

Population and Sample
A sample is a smaller subgroup of a larger population. Researchers can take a sample from the population for a specific study, and different samples can be drawn from the same population. Most importantly, the characteristics of the sample must reflect those of the population, because the findings are generalized to the entire population. For this reason, the sampling method is very important.
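You can check the nine numbers from the Descriptive Statistics example above with Python's standard-library statistics module. The sketch below is minimal. Note that statistics.variance and statistics.stdev use the sample formula, which divides by n - 1. By contrast, statistics.pvariance and statistics.pstdev divide by n, so the two versions give slightly different values for the same data.

```python
import statistics

# The nine numbers from the Descriptive Statistics example.
values = [5, 10, 2, 4, 6, 10, 2, 3, 2]

mean = statistics.mean(values)            # sum / count = 44 / 9
median = statistics.median(values)        # middle value of the sorted list
mode = statistics.mode(values)            # most frequently occurring value
sample_var = statistics.variance(values)  # squared deviations / (n - 1)
sample_sd = statistics.stdev(values)      # square root of the sample variance
pop_var = statistics.pvariance(values)    # squared deviations / n
pop_sd = statistics.pstdev(values)        # square root of the population variance

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"sample variance = {sample_var:.2f}, sample SD = {sample_sd:.2f}")
print(f"population variance = {pop_var:.2f}, population SD = {pop_sd:.2f}")
# mean = 4.89, median = 4, mode = 2
# sample variance = 10.36, sample SD = 3.22
# population variance = 9.21, population SD = 3.03
```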
There are different types of sampling methods, including:
• simple random sampling: every item has an equal and independent chance of being selected;
• systematic sampling: every nth item in an ordered list of the population is selected;
• stratified sampling: the population is divided into subgroups (strata), and simple random sampling is used within each subgroup;
• cluster sampling: the population is divided into several areas (clusters), and randomly selected areas are assessed;
• quota sampling: the number of people to sample is chosen, and then any method is used to reach the desired sample size;
• selective sampling: a specific set of people is deliberately selected;
• convenience sampling: readily available research subjects are used;
• snowball sampling: existing subjects recommend others who share the research characteristics; and
• theoretical sampling: sampling is done for the purpose of testing a theory (Changing Minds, n.d.).

Computing and interpreting probability is vital for the biostatistical analysis of health-related issues, such as disease prevalence and incidence in a population. The method of data collection and the type of analytical methods used are critical. Public health researchers may use two broad categories of sampling methods.

Probability sampling is the sampling method in which every unit in the population has a chance of being selected in the sample (Sullivan, 2018). Probability sampling is widely used in quantitative research because it makes it possible for the selected sample to be unbiased. Unbiased reports from public health researchers are important for the veracity of research studies. See the example below:

Assume that, out of a population of 1,500 persons, 1,000 are male and 500 are female. Of these, 500 males and 300 females were exposed to HIV. Representing the data in probability terms can help significantly with comparison and reporting.
This information could be reported as follows:
• P(male) = 1000 / 1500 ≈ 0.67
• P(female) = 500 / 1500 ≈ 0.33
• P(male and exposed to HIV) = 500 / 1500 ≈ 0.33
• P(female and exposed to HIV) = 300 / 1500 = 0.20
• P(exposed to HIV) = 800 / 1500 ≈ 0.53

Non-probability sampling is any sampling method in which some members of the population have no chance of selection (Sullivan, 2018). It is widely used in qualitative research. Non-probability sampling allows public health researchers to add descriptive comments about a sample, and it is cost-effective and less time-consuming. It also makes sampling possible in situations where probability sampling is impractical. However, this method has some disadvantages. Some parts of the population may not be represented, the results may generalize poorly, and possible bias may be difficult to identify. This has major implications for public health researchers who want to report accurate results. Nevertheless, such reports have also provided vital information about diseases and other ailments in the country.

Proposing Solutions for Biostatistical Problems
When proposing solutions to biostatistical problems, public health researchers also use sensitivity and specificity, in addition to the methods discussed above. These measures are especially useful in investigations involving the presence of a disease (Sullivan, 2018). Let's define these terms and examine some applications.

Sensitivity and Specificity
Sensitivity and specificity describe how well a screening test identifies individuals who have the specific disease for which the screening is done. Sensitivity is the probability that the test will be positive when the disease is present. Specificity is the probability that the test will be negative when the disease is absent.
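These definitions map directly onto the 2×2 screening table used in this lesson's worked examples. The cells of that table are labeled A to D:
• A = screen positive with disease
• B = screen positive without disease
• C = screen negative with disease
• D = screen negative without disease

The Python sketch below computes each measure from those four counts. It is minimal and illustrative only. The ScreeningTable class and its method names are hypothetical, chosen for readability rather than taken from any library. Plugging in the counts from the lesson's 200-person example and Down Syndrome example reproduces the hand calculations, up to rounding.

```python
from dataclasses import dataclass


@dataclass
class ScreeningTable:
    """Counts from a 2x2 screening table (rows: screen result, columns: true status)."""

    a: int  # screen positive, disease present
    b: int  # screen positive, disease free
    c: int  # screen negative, disease present
    d: int  # screen negative, disease free

    def prevalence(self) -> float:
        # (A + C) / N: share of everyone tested who actually has the disease
        return (self.a + self.c) / (self.a + self.b + self.c + self.d)

    def sensitivity(self) -> float:
        # P(screen positive | disease) = A / (A + C)
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        # P(screen negative | disease free) = D / (B + D)
        return self.d / (self.b + self.d)

    def false_positive_fraction(self) -> float:
        # P(screen positive | disease free) = B / (B + D) = 1 - specificity
        return self.b / (self.b + self.d)

    def false_negative_fraction(self) -> float:
        # P(screen negative | disease) = C / (A + C) = 1 - sensitivity
        return self.c / (self.a + self.c)

    def ppv(self) -> float:
        # P(disease | screen positive) = A / (A + B)
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        # P(disease free | screen negative) = D / (C + D)
        return self.d / (self.c + self.d)


# 200-person example from the lesson.
example = ScreeningTable(a=25, b=50, c=5, d=120)
for name in ("prevalence", "sensitivity", "specificity", "ppv", "npv"):
    print(f"{name:<12} {getattr(example, name)() * 100:.1f}%")
# prevalence 15.0%, sensitivity 83.3%, specificity 70.6%, ppv 33.3%, npv 96.0%

# Down Syndrome screening example from the lesson.
down = ScreeningTable(a=17, b=251, c=5, d=449)
print(f"sensitivity {down.sensitivity():.3f}, specificity {down.specificity():.3f}")
# sensitivity 0.773, specificity 0.641
```

Keeping the four counts together in one object makes it hard to mix up denominators. Sensitivity and specificity divide by column totals (true disease status), while the predictive values divide by row totals (screen result).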
To understand this concept, let's examine the table below.

Screening    Disease       No Disease    Total
Positive     A = 25        B = 50        A + B = 75
Negative     C = 5         D = 120       C + D = 125
Total        A + C = 30    B + D = 170   N = 200

Using the table above, the test's ability to detect disease presence and absence is evaluated by:
• Sensitivity = P(Screen positive | Disease) = A / (A + C) x 100
• Specificity = P(Screen negative | Disease free) = D / (B + D) x 100
The test's errors are evaluated by:
• False positive fraction = P(Screen positive | Disease free) = B / (B + D) x 100
• False negative fraction = P(Screen negative | Disease) = C / (A + C) x 100

Let's see how these measures are applied in public health situations. Assume that 200 people are screened for a particular disease. As shown in the table above, 30 of them have the disease and 170 do not. Calculate the following (the answers are below):
1. Prevalence
2. Sensitivity
3. Specificity
4. Positive predictive value
5. Negative predictive value

Answers:
1. Prevalence = Total with disease / Total = 30 / 200 x 100 = 15%
2. Sensitivity = A / (A + C) x 100 = 25 / 30 x 100 = 83.3%
3. Specificity = D / (B + D) x 100 = 120 / 170 x 100 = 70.6%
4. Positive predictive value = P(Disease | Screen positive) = Positive with disease (A) / Total positive (A + B) x 100 = 25 / 75 x 100 = 33.3%
5. Negative predictive value = P(Disease free | Screen negative) = Negative without disease (D) / Total negative (C + D) x 100 = 120 / 125 x 100 = 96%

Using these measures, public health officials can effectively compare, contrast, and report on diseases and other public health issues within a population.

In another example, a screening test for Down Syndrome yielded the following results:

Screening Test Result    Affected Fetus    Unaffected Fetus    Total
Positive                 17                251                 268
Negative                 5                 449                 454
Total                    22                700                 722

Thus, the performance characteristics of the test are:
• Sensitivity = P(Screen positive | Affected fetus) = 17 / 22 = 0.773
• Specificity = P(Screen negative | Unaffected fetus) = 449 / 700 = 0.641
• False positive fraction = P(Screen positive | Unaffected fetus) = 251 / 700 = 0.359
• False negative fraction = P(Screen negative | Affected fetus) = 5 / 22 = 0.227

Interpretation of results:
• If a woman is carrying an affected fetus, there is a 77.3% probability that the screening test will be positive.
• If a woman is carrying an unaffected fetus, there is a 64.1% probability that the screening test will be negative.
However, the false positive and false negative fractions are a concern with this test:
• If a woman is carrying an unaffected fetus, there is a 35.9% probability that the screening test will be positive.
• If a woman is carrying an affected fetus, there is a 22.7% probability that the screening test will be negative.

Sample Size Determination
In this section, we will only briefly discuss sample size determination, because a full lesson on this topic comes later in the course. Any research involving subjects (participants) within a population needs an appropriate sample size, because a sample that is too small or too large is detrimental to the study. The wrong sample size may yield poor results and create validity problems. To address this problem, experts have developed various methods of sample size determination. Some of these include:
1. Free sample size calculators on the Internet: if you search "sample size calculator" in your favorite search engine, you will find many readily available.
2. SurveyMonkey provides a sample size calculator on its website (SurveyMonkey, n.d.).
3. The National Statistical Service's Sample Size Calculator is another website.
4. Manual calculations are another option.

Before doing manual calculations, we need to know a few things, namely:
a. population size,
b. margin of error, which is half the width of the confidence interval; it determines how much higher or lower than the true population value the sample estimate can fall, and a margin of error of ±5% is common,
c. confidence level (the most commonly used is 95%), and
d. the expected proportion, p, which is often loosely called the standard deviation in this context. Many researchers safely use .5, which yields the largest and most conservative sample size.

We will also need the z-score for the chosen confidence level. The standard values for the most common confidence levels are:
• 90%: z-score = 1.645
• 95%: z-score = 1.96
• 99%: z-score = 2.576

The next step is to insert the z-score, p, and margin of error into this equation:
Necessary sample size = (z-score)² x p x (1 - p) / (margin of error)²

Let's see an example assuming a 95% confidence level, p = .5, and a margin of error of ±5%:
Sample size = ((1.96)² x .5 x (1 - .5)) / (.05)² = (3.8416 x .25) / .0025 = .9604 / .0025 = 384.16
Subjects needed = 385 (the result is rounded up to the next whole subject)

The number of subjects needed will change if a different z-score, p, or margin of error is used. Note: This equation is used when the population size is unknown or very large.

In summary, this lesson introduced the importance of probability and how it can be used in the public health field. There are many different types of sampling methods, and it is important to be familiar with each one. It is also crucial to be able to determine a sufficient sample size for a study. This topic will be explored in more depth in a later unit.

References
Changing Minds. (n.d.). Choosing a sampling method. Retrieved from http://changingminds.org/explanations/research/sampling/choosing_sampling.htm
Sullivan, L. M. (2018). Essentials of biostatistics in public health (3rd ed.). Burlington, MA: Jones & Bartlett Learning.
SurveyMonkey. (n.d.). Sample size calculator. Retrieved from https://www.surveymonkey.com/mp/sample-size-calculator/

Learning Activities (Nongraded)
Nongraded Learning Activities are provided to aid students in their course of study. You do not have to submit them. If you have questions, contact your instructor for further guidance and information.
For extra practice with probability concepts, complete the following Chapter 5 practice problems on pages 99–100 in your textbook: 12, 13, 14, 15, and 18. Be sure to show all of your work.
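To check your own manual sample size calculations, you can express the lesson's formula as a short Python function. This is a minimal sketch for the case the lesson covers, where the population size is unknown or very large. The function name sample_size and its parameter names are hypothetical. The z-scores are the standard two-sided values listed in the lesson. The value the lesson sets to .5 enters the formula as p x (1 - p) = .25. The result is rounded up with math.ceil because you cannot recruit a fraction of a subject.

```python
import math

# Standard two-sided z-scores for common confidence levels (as listed in the lesson).
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def sample_size(confidence: float = 0.95, p: float = 0.5, margin: float = 0.05) -> int:
    """Hypothetical helper: n = z^2 * p * (1 - p) / margin^2, rounded up.

    Assumes the population is unknown or very large (no finite-population correction).
    """
    z = Z_SCORES[confidence]  # raises KeyError for levels not in the table
    n = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n)


print(sample_size())                 # 95%, p = .5, ±5%: 384.16 rounds up to 385
print(sample_size(confidence=0.99))  # 663.58 rounds up to 664
print(sample_size(confidence=0.90))  # 270.60 rounds up to 271
```

Raising the confidence level or narrowing the margin of error increases the required sample size. Moving p away from .5 decreases it.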