Research Methods Literature Review

Prior to beginning work on this assignment, review the qualitative and quantitative research designs encountered so far in this course. For your literature review, you

RESEARCH INNOVATIONS AND RECOMMENDATIONS

Alternatives to the Randomized Controlled Trial

Public health researchers are addressing new research questions (e.g., effects of environmental tobacco smoke, Hurricane Katrina) for which the randomized controlled trial (RCT) may not be a feasible option. Drawing on the potential outcomes framework (Rubin Causal Model) and Campbellian perspectives, we consider alternative research designs that permit relatively strong causal inferences. In randomized encouragement designs, participants are randomly invited to participate in one of the treatment conditions, but are allowed to decide whether to receive treatment. In quantitative assignment designs, treatment is assigned on the basis of a quantitative measure (e.g., need, merit, risk). In observational studies, treatment assignment is unknown and presumed to be nonrandom. Major threats to the validity of each design and statistical strategies for mitigating those threats are presented.

(Am J Public Health. 2008;98:1359-1366. doi:10.2105/AJPH.2007.124446)

Stephen G. West, PhD, Naihua Duan, PhD, Willo Pequegnat, PhD, Paul Gaist, PhD, MPH, Don C. Des Jarlais, PhD, David Holtgrave, PhD, José Szapocznik, PhD, Martin Fishbein, PhD, Bruce Rapkin, PhD, Michael Clatts, PhD, and Patricia Dolan Mullen, DrPH

Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.
- John Tukey

THE RANDOMIZED CONTROLLED trial (RCT) has long been the gold standard for clinical research, representing the best way to determine efficacy and effectiveness for many intervention and prevention programs.

However, public health researchers are increasingly addressing questions for which the RCT may not be a practical (or ethical) option or for which the RCT can be complemented by alternative designs that enhance generalization to participants and contexts of interest.

When structural or policy interventions are being examined, practical problems in conducting RCTs may arise. For example, research participants may not want to be randomized, randomization may not be feasible or accepted in the research context, or only atypical participants may be willing to be randomized. Such problems might be a concern in studies of the effects of environmental tobacco smoke on nonsmokers or of the effects of the severe disruption of Gulf Coast communities by Hurricane Katrina on HIV risk behaviors and medical care. Only atypical participants may agree to participate in the evaluation of a faith-based intervention. Highly religious participants may refuse to be assigned to a non-faith-based treatment group, whereas nonreligious participants may refuse or be unable to participate sincerely in a faith-based group.

Randomization may be precluded if the religious organization implementing the intervention strongly believes that all people desiring the faith-based intervention should receive it. With one exception, our focus is on designs in which participants are assigned to treatment or control conditions.

Parallel designs exist in which settings, time, or even dependent measures are the unit of assignment.

THE RANDOMIZED CONTROLLED TRIAL

The RCT has its origins in the pioneering work of the English scientist and statistician Sir Ronald Fisher in agriculture during the 1920s and 1930s. Fisher's key insight was that random assignment of units to treatment conditions leads to 2 related expectations: (1) the mean level for each of the treatment conditions is equal, on average, on any conceivable participant background variable prior to the beginning of the experiment; and (2) treatment assignment is, on average, unrelated to any conceivable participant background variable.
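The phrase "on average" in these two expectations can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the covariate (a hypothetical `age` variable), the sample size, and the number of re-randomizations are all invented, and nothing here comes from Fisher's or the authors' data. It re-randomizes one fixed sample many times and looks at the treatment-minus-control gap in the baseline covariate.

```python
import random
import statistics


def covariate_gap_over_randomizations(n_units=200, n_replications=2000, seed=0):
    """Re-randomize one fixed sample many times and summarize the
    treatment-minus-control gap in a baseline covariate."""
    rng = random.Random(seed)
    # Hypothetical baseline covariate (e.g., age), fixed before assignment.
    age = [rng.gauss(40, 10) for _ in range(n_units)]
    half = n_units // 2
    gaps = []
    for _ in range(n_replications):
        order = list(range(n_units))
        rng.shuffle(order)  # random assignment: first half treated
        treated = [age[i] for i in order[:half]]
        control = [age[i] for i in order[half:]]
        gaps.append(statistics.fmean(treated) - statistics.fmean(control))
    # Mean gap across randomizations, and its spread from trial to trial.
    return statistics.fmean(gaps), statistics.pstdev(gaps)


average_gap, gap_spread = covariate_gap_over_randomizations()
print(f"average gap over randomizations: {average_gap:.3f}")
print(f"spread of the gap in any one trial: {gap_spread:.3f}")
```

The average gap should sit close to zero, while the spread shows that any single randomization can still be somewhat imbalanced; the expectation holds across repetitions, not within each trial.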

August 2008, Vol 98, No. 8 | American Journal of Public Health | West et al. | Peer Reviewed | Research Innovations and Recommendations | 1359

In the context of Fisher's agricultural studies, these expectations guaranteed that the design would provide an unbiased estimate of the true causal effect. However, other features of the public health context require additional assumptions when traditional RCTs are utilized. Unlike the corn plants in Fisher's agricultural studies, people can seek out alternative treatments or refuse treatments (nonadherence to treatment). People can refuse to be measured or migrate to another locale (attrition). Important advances addressing the challenges of nonadherence, attrition, and their combination have been made during the last half century. Advances in alternative designs and statistical analyses have also occurred. Two perspectives have guided this work.

TWO PERSPECTIVES ON STRENGTHENING CAUSAL INFERENCE

Potential Outcomes Perspective

The potential outcomes perspective was originally introduced by Neyman and developed by Rubin et al. It takes as its starting point a comparison of an individual unit's outcome when the treatment is applied, $Y_t(u)$, versus the same unit's outcome when the alternative (or control) treatment is applied, $Y_c(u)$.

The causal effect is defined as

$$Y_t(u) - Y_c(u),$$

where $Y_t(u)$ represents the response of unit $u$ to treatment $t$, and $Y_c(u)$ represents the response of the same unit $u$ to the control treatment $c$ at the identical time and in the identical setting. Theoretically, comparison of these 2 outcomes provides the ideal design for causal inference. Unfortunately, this ideal can never be achieved in practice. Additional assumptions are required depending on the choice of alternative to the ideal design. For the RCT, the additional assumptions required are (1) the units are independent, (2) participants actually received the treatment as intended (e.g., complete treatment adherence), (3) attrition from posttest measurement did not occur, and (4) the existence of other treatment conditions did not affect the participant's outcome. If these assumptions of the RCT are met, strong inferences can be drawn about the average causal effect of treatment t relative to treatment c on the outcome. However, these assumptions are often not met.
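Because only one of the two potential outcomes is ever observed for a given unit, the definition is easiest to see in a simulation, where both can be generated. The following is a minimal sketch under invented assumptions: the outcome model, the constant unit-level effect of -1.5, the `severity` covariate, and the self-selection rule are all hypothetical. It compares a randomized comparison with a self-selected one.

```python
import math
import random
import statistics


def simulate_potential_outcomes(n=20000, unit_effect=-1.5, seed=1):
    """Return the true average effect plus two estimates of it: one from
    random assignment and one from self-selected treatment."""
    rng = random.Random(seed)
    true_effects = []
    randomized = {True: [], False: []}
    self_selected = {True: [], False: []}
    for _ in range(n):
        severity = rng.gauss(0, 1)                  # hypothetical baseline covariate
        y_c = 10 + 2 * severity + rng.gauss(0, 1)   # Y_c(u)
        y_t = y_c + unit_effect                     # Y_t(u)
        true_effects.append(y_t - y_c)              # unobservable outside a simulation

        # Design 1: a coin flip decides which potential outcome is observed.
        coin = rng.random() < 0.5
        randomized[coin].append(y_t if coin else y_c)

        # Design 2: units with higher severity are more likely to choose treatment.
        chooses = rng.random() < 1 / (1 + math.exp(-2 * severity))
        self_selected[chooses].append(y_t if chooses else y_c)

    def difference(groups):
        return statistics.fmean(groups[True]) - statistics.fmean(groups[False])

    return (statistics.fmean(true_effects),
            difference(randomized),
            difference(self_selected))


true_ate, rct_estimate, naive_estimate = simulate_potential_outcomes()
print(f"true average effect:      {true_ate:.2f}")
print(f"randomized comparison:    {rct_estimate:.2f}")
print(f"self-selected comparison: {naive_estimate:.2f}")
```

In this setup the randomized contrast should land near the true average effect, whereas the self-selected contrast is pushed in the opposite direction because severity drives both treatment choice and the outcome.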

Nonadherence is a common example: in RCTs of mammography screening, one third of participants in the treatment group have refused screening and many participants in the control group have obtained screening outside the trial.

Campbellian Perspective

Campbell et al. have developed a practical theory of causal inference that follows the logic and strategies of the working scientist. Researchers need to identify plausible threats to the validity of the causal inference based on design considerations and prior empirical research.

Then they need to rule out the possibility that any of those threats are responsible for the observed effect. If the initially proposed design does not rule out important plausible threats to causal inference, enhancements to the design are introduced that address the identified threats. Through a process of continued critical evaluation and additional research, plausible threats to validity can be identified and eliminated, yielding improved estimates of the causal effect.

Although Campbell et al. discussed 4 types of threats to validity, space limitations restrict our discussion to 2 types. Threats to internal validity are confounding factors that may potentially produce the observed results. These threats include factors that may lead to changes between baseline and posttest (e.g., differential history, maturation) and factors that may lead to differences between the treatment and control groups (e.g., differential selection, differential attrition) in the absence of a treatment effect. Threats to external validity limit the potential generalization of the results, an important consideration given the increasing emphasis on the translation of research results in public health into practice.

ALTERNATIVE DESIGNS FOR STRENGTHENING CAUSAL INFERENCE

Randomized Encouragement Designs

Trial participants are expected to adhere to their treatment assignments in classic RCTs. They may be given strong incentives that are outside usual practice to ensure adherence with the full protocol. Alternatively, participants may be randomly assigned to an opportunity or an encouragement to receive a specific treatment, but allowed to choose whether to receive the treatment.

This variation from the classic RCT model is useful for interventions for which it is impractical or unethical to require adherence or in which the necessary incentives would be unrealistic, thus precluding generalization to practice.

For example, this design was used by Vinokur et al. to study the impact of a job seeking skills program (JOBS) on depression in participants. This study recruited eligible participants (e.g., laid off and seeking a new job) at unemployment offices. All participants received a brief booklet describing job search methods.

Participants were randomly assigned (stratified by baseline risk) to receive or not receive an invitation to participate in the JOBS program, a 20-hour group training program that emphasized learning and practicing job seeking skills, inoculation against setbacks, and social support. Of invited participants, 54% attended the program. Attempts were made to measure all participants on depression 6 months after baseline measurement (87% response rate).

Intention to treat analyses can be applied to randomized encouragement designs to assess the impact of treatment assignment (the offer of or encouragement to participate in the program) on participant outcome (depression). To the extent that missing data are negligible, the estimated effects are unbiased.
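As a rough illustration of an intention to treat contrast in an encouragement design, the sketch below simulates invented data; it is not the JOBS data or the authors' analysis. The 54% share of invitees who attend echoes the attendance rate reported above, while the depression scale, the attendance effect of -2.0, and the rule that uninvited participants cannot attend are assumptions made for the example. It also computes the simple ratio (Wald) form of an instrumental variables estimate, which divides the intention to treat effect by the difference in attendance rates and relies on the exclusion restriction.

```python
import random
import statistics


def simulate_encouragement(n=20000, complier_share=0.54,
                           attendance_effect=-2.0, seed=2):
    """Simulate (invited, attended, depression) for each participant."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        invited = rng.random() < 0.5               # randomized invitation
        complier = rng.random() < complier_share   # would attend if invited
        attended = invited and complier            # assumed: no access without invitation
        depression = 20 + rng.gauss(0, 3)
        if attended:
            depression += attendance_effect        # effect operates only through attendance
        rows.append((invited, attended, depression))
    return rows


def itt_and_ratio_estimate(rows):
    invited_y = [y for z, _, y in rows if z]
    control_y = [y for z, _, y in rows if not z]
    invited_d = [float(d) for z, d, _ in rows if z]
    control_d = [float(d) for z, d, _ in rows if not z]
    # Intention to treat: compare by assignment, ignoring actual attendance.
    itt = statistics.fmean(invited_y) - statistics.fmean(control_y)
    # Difference in attendance rates produced by the invitation.
    uptake = statistics.fmean(invited_d) - statistics.fmean(control_d)
    return itt, uptake, itt / uptake


itt, uptake, ratio_estimate = itt_and_ratio_estimate(simulate_encouragement())
print(f"intention to treat effect: {itt:.2f}")
print(f"attendance difference:     {uptake:.2f}")
print(f"ratio (Wald) estimate:     {ratio_estimate:.2f}")
```

The intention to treat effect is diluted by participants who were invited but did not attend; the ratio rescales it to the participants whose attendance was actually changed by the invitation.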

Under the assumption of the exclusion restriction (the impact of treatment assignment is mediated entirely through the receipt of treatment), instrumental variables analysis provides an unbiased estimate of the more informative complier average causal effect: the effect of the receipt of treatment (JOBS attendance) averaged across adherers who are expected to adopt the treatment if assigned to the treatment group. Little and Yau compared the subgroup of participants who adhered to treatment in the JOBS program with the subgroup of individuals in the control group who would be expected to adhere to the treatment if invited to participate in the JOBS program. The JOBS program led to decreased depression for high-risk participants who would adhere to treatment. The combination of randomization and the assumption of the exclusion restriction provided a strong basis for the unbiased estimate of the average effect of the JOBS program and proper standard errors for treatment adherers in a community population. More-complete discussions of randomized encouragement designs are available elsewhere.

Nonrandom Quantitative Assignment of Treatment

In quantitative assignment designs, participants are assigned to treatment groups on the basis of a quantitative measure, often a measure of need, merit, or risk (refs 17, 21-24). For example, school lunch programs in the United States are assigned to children whose household income falls below a prespecified threshold related to need (e.g., poverty line).

Causal inference is based on modeling the functional relationship between the known quantitative assignment variable (household income) and the outcome variable (e.g., health, school achievement), estimated separately for the treated group that falls below the threshold and the control group that falls above the threshold.

Because the assignment variable fully determines treatment assignment, proper adjustment for the assignment variable permits the inference of a treatment effect for the school lunch program if there is a discontinuity at the threshold where the treatment is introduced (Figure 1).
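The logic of estimating separate regression lines on each side of the threshold and comparing them at the threshold can be sketched in a few lines. Everything in this example is assumed for illustration: the income range, the threshold of 20 (thousand dollars), the linear outcome model, and the program effect of 5 are invented, and the linear functional form is taken as known, which is rarely the case in practice.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regression_discontinuity(n=5000, threshold=20.0, program_effect=5.0, seed=3):
    rng = random.Random(seed)
    below, above = ([], []), ([], [])     # (incomes, outcomes) for each group
    for _ in range(n):
        income = rng.uniform(0, 100)      # assignment variable, thousands of dollars
        treated = income < threshold      # assignment fully determined by income
        outcome = 50 + 0.3 * income + rng.gauss(0, 2)
        if treated:
            outcome += program_effect
        group = below if treated else above
        group[0].append(income)
        group[1].append(outcome)
    a_t, b_t = fit_line(*below)
    a_c, b_c = fit_line(*above)
    # Gap between the two fitted lines evaluated at the threshold.
    rd_estimate = (a_t + b_t * threshold) - (a_c + b_c * threshold)
    # Raw difference in group means, which ignores the assignment variable.
    naive = statistics.fmean(below[1]) - statistics.fmean(above[1])
    return rd_estimate, naive


rd_estimate, naive_difference = regression_discontinuity()
print(f"discontinuity at threshold: {rd_estimate:.2f}")
print(f"raw difference in means:    {naive_difference:.2f}")
```

In this simulated setting the raw difference in means even has the wrong sign, because income itself is related to the outcome; adjusting for the assignment variable and reading off the gap at the threshold recovers something close to the simulated effect.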

As part of the launch of the Head Start program in 1965, US counties with a poverty rate above 59.2% (the 300 poorest in the country) received technical assistance in writing Head Start proposals. A very high proportion (80%) of the poorest counties received Head Start funding, approximately double the funding rate of counties that were slightly better off economically (49.2%-59.2% poverty rates) and did not receive technical assistance. The original Head Start program provided basic health services (e.g., nutrition, immunization, screening) to children in addition to its educational component. Using a regression discontinuity design, Ludwig and Miller found that the introduction of Head Start led to substantially lower mortality rates in children aged 5 to 9 years from diseases addressed by the program (e.g., measles, anemia, diabetes).

Quantitative assignment designs can be applied to units at various levels such as individuals, community health clinics, neighborhoods, or counties. These designs offer an important alternative to classic RCTs in situations in which randomization is impractical or unethical. Many important public health communities might be resistant to RCTs, in which case quantitative assignment designs might be more acceptable. In addition, the RCT might be unethical when there are doubts about equipoise.

Quantitative assignment designs that utilize a clinically meaningful assignment variable (e.g., risk level) might provide a stronger ethical basis for such studies.

Quantitative assignment designs can also be implemented based on time (interrupted time series) or settings (e.g., measured risk of neighborhoods). For example, Khuder et al. analyzed 6 years of monthly data on hospital admissions in 2 cities for coronary heart disease and for other diseases unrelated to smoking. In one city, a public ban on indoor smoking was implemented after the third year of data collection. Khuder et al. showed that hospital admissions for coronary heart disease (but not other diseases) declined following the introduction of the smoking ban. By contrast, no change in hospital admissions for either coronary heart disease or other diseases was observed in the comparison city, which did not institute a smoking ban.
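The value of the comparison series can be illustrated with simulated monthly counts. This sketch is not a reanalysis of Khuder et al.; the admission levels, the shared downward trend, the noise, and the ban effect of -15 admissions per month are all invented. It also uses a deliberately simple pre/post difference in means rather than a full time series model, so it ignores autocorrelation and seasonality.

```python
import random
import statistics


def simulate_city(ban_effect, rng, months=72, ban_month=36):
    """Monthly admissions with a shared downward trend; ban_effect is the
    level shift applied from ban_month onward (0.0 for the comparison city)."""
    series = []
    for month in range(months):
        level = 100 - 0.1 * month + rng.gauss(0, 3)
        if month >= ban_month:
            level += ban_effect
        series.append(level)
    return series


def pre_post_change(series, ban_month=36):
    """Mean of the months after the ban minus mean of the months before."""
    return statistics.fmean(series[ban_month:]) - statistics.fmean(series[:ban_month])


rng = random.Random(4)
ban_city = simulate_city(-15.0, rng)
comparison_city = simulate_city(0.0, rng)

change_ban = pre_post_change(ban_city)
change_comparison = pre_post_change(comparison_city)
net_change = change_ban - change_comparison   # removes the trend both cities share

print(f"change in ban city:        {change_ban:.1f}")
print(f"change in comparison city: {change_comparison:.1f}")
print(f"net change:                {net_change:.1f}")
```

The pre/post change in the ban city alone mixes the ban with the background trend; subtracting the change in the comparison city removes whatever the two cities share.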

Any alternative explanation of these results must clearly account for why the change occurred at the point at which the smoking ban was introduced only in the treatment city and only on the outcome related to smoking.

[Figure 1: outcome (Low to High) plotted against Family Income, $ (20000 to 100000), with regression lines for the program and no program groups.]

Note. All children whose family income was below the threshold, here $20000 (dotted line), received the treatment program (school lunch program); all children whose family income was above the threshold did not receive the program. The difference in level between the regression lines for the program and no program groups at the threshold represents the treatment effect.

FIGURE 1-Illustration of regression discontinuity design.

One primary weakness of quantitative assignment designs is that the functional form of the relationship between the assignment variable and response variable is usually unknown. With unknown functional forms, bias might be introduced if the functional relationship is misspecified (e.g., assuming a linear functional form when the true functional form is curvilinear). In smaller samples, separate nonparametric smooths (e.g., lowess) of the data for the participants who receive the treatment and control conditions provide some information about functional form. In large samples, this bias can be minimized by using nonparametric regression methods to estimate the functional relationship. In the regression discontinuity design, the assignment threshold can sometimes be modified in subsequent studies (e.g., some states have different incomes necessary for treatment receipt) to further strengthen causal inference. In the time series design, cities that introduce the intervention at different time points can be compared.

Observational Studies

In observational studies, participants in preexisting or constructed groups receive various treatment conditions, often through voluntary selection. The selection of participants into each treatment condition may be associated with confounding factors, resulting in bias that might occur in naive statistical analyses. However, advances in methodology have provided a much stronger toolkit for observational studies. We discuss 2 general approaches below.

First, within the potential outcomes perspective, an important focus has been on the development of matched sampling strategies and analyses. Among the most developed strategies are causal inference methods based on propensity scores. Propensity scores represent the predicted probability that a participant will receive the treatment given his or her baseline measurements, estimated using either logistic regression to predict treatment status, or more-complex functional forms such as regression tree models.

If the researcher can accurately construct propensity scores that balance the treatment and control participants on all potentially relevant baseline variables, the difference between the response in the treatment condition and the control condition (conditioned on the propensity scores) will represent the causal effect. In essence, conditioning on the basis of the propensity scores attempts to create homogeneous units whose responses in the treatment and control groups can be directly compared.
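One simple way to condition on propensity scores is to stratify on them. The sketch below is a toy illustration under strong assumptions: a single simulated baseline covariate, a hand-written logistic regression (the `fit_logistic` helper is written for this example, not taken from any library), 5 equal-size strata, and a simulated treatment effect of 1.0. With one covariate the propensity score is just a monotone function of that covariate, so the example shows the mechanics rather than the dimension reduction that makes the method useful with many covariates.

```python
import math
import random
import statistics


def fit_logistic(x, t, iterations=15):
    """Newton-Raphson fit of P(t = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += ti - p               # gradient of the log-likelihood
            g1 += (ti - p) * xi
            h00 += w                   # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def mean_difference(rows):
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


rng = random.Random(5)
x, t, y = [], [], []
for _ in range(10000):
    xi = rng.gauss(0, 1)                                  # baseline covariate
    ti = 1 if rng.random() < 1 / (1 + math.exp(-0.8 * xi)) else 0
    x.append(xi)
    t.append(ti)
    y.append(2 * xi + 1.0 * ti + rng.gauss(0, 1))         # simulated effect = 1.0

b0, b1 = fit_logistic(x, t)
scores = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]

# Sort by estimated propensity score and cut into 5 equal-size strata.
rows = sorted(zip(scores, t, y))
n_strata = 5
size = len(rows) // n_strata
strata = [rows[k * size:(k + 1) * size] for k in range(n_strata)]

naive_estimate = mean_difference(rows)
stratified_estimate = statistics.fmean([mean_difference(s) for s in strata])
print(f"naive difference in means: {naive_estimate:.2f}")
print(f"stratified on propensity:  {stratified_estimate:.2f}")
```

Stratification removes most, though not all, of the bias in this example; some imbalance remains within each stratum, and nothing here addresses covariates that were never measured.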

The propensity scores method can only mitigate overt selection bias attributed to those baseline characteristics that have been accurately measured. The adequacy of the comparison depends strongly on baseline assessment of the full set of variables believed to be potentially related to treatment selection and outcome. Assessment of a few convenient baseline variables (e.g., demographics) is unlikely to substantially mitigate selection bias.

Haviland et al. studied the effect of gang membership on violent delinquency, an important question for which an RCT was not feasible. They conducted a longitudinal study of boys living in lower socioeconomic areas of Montreal, Quebec, and identified boys who were not members of any gang prior to age 14 years.

Based on the boys' behaviors between the ages of 11 and 13 years, they identified groups with a history of low violence, declining violence, and chronic high violence. Within each of these groups, they measured a large number of baseline covariates known to be related to gang membership and violence.

Propensity to join a gang at age 14 years was estimated separately within each violence history group from the baseline covariates, with the result that boys who did and did not join gangs at age 14 years could be closely matched within both the low and declining violence groups, but not the chronic high violence group. This finding illustrates that the propensity scores method often appropriately limits generalization of the causal effect by restricting comparisons to only the range of propensity scores within which adequate comparisons can be constructed. In the low and declining groups, joining a gang at age 14 years increased violent delinquent acts.

Haviland et al. also performed a sensitivity analysis that investigated how large hidden bias would need to be before the treatment effect was no longer statistically significant. They found that even if hidden variables existed that led to a 50% increase in the odds of joining a gang, a significant treatment effect would still exist. Such causal sensitivity analysis against hidden bias can be used to bracket the plausible range of the magnitude of the causal effect. Alternatively, hidden bias caused by unobserved confounding factors can sometimes be mitigated using instrumental variables analysis.

Second, within the Campbellian framework, design elements are added that address likely threats to internal validity. These design elements include strategies such as matching and stratifying, use of pretests on multiple occasions to estimate preexisting trends, use of multiple control groups with different strengths and weaknesses to bracket the effect, and the use of nonequivalent dependent measures that are expected to be affected by the threat but not by the treatment (see also Rosenbaum). Reynolds and West provide an illustration of the use of several of these strategies in an observational study designed to evaluate the effectiveness of a program to increase the sales of state lottery tickets in convenience stores.

The store managers refused randomization. Those stores that agreed to implement the program were matched with other stores in the same chain on baseline sales volume and geographical location. Increases in sales were observed in (1) the treatment but not the control group; (2) within the treatment group, for lottery ticket sales, but not other sales categories; and (3) in the weeks following the introduction of the intervention, but not before (Figure 2). Taken together, inclusion of these additional design elements made it extremely difficult to identify any potential confounding factors that might be responsible for the observed pattern of results. In the Campbellian framework, strong priority is given to design enhancements over statistical corrections with their associated assumptions.

CONCLUSION

The RCT is the gold standard among research designs. It has the highest internal validity because it requires the fewest assumptions to attain unbiased estimates of treatment effects.

Given identical sample sizes, the RCT also typically surpasses all other designs in terms of its statistical power to detect the predicted effect.

Nonetheless, even with the best planning, the RCT is not immune to problems common in community trials. These threats potentially weaken the causal inferences.

When RCTs cannot be implemented in settings or with participants of interest, it is far better to use a strong alternative design than to change the treatment (e.g., using an analog rather than an actual faith-based treatment) or study population (e.g., using only participants indifferent to the treatment choice) so that an RCT may be implemented. Such changes may severely limit the external validity of the findings, potentially distorting the inference about the causal effect for the specific population, treatment, and setting of interest.

Even when RCTs can be implemented, alternative designs can be valuable complements that broaden the generalizations of RCTs in multistudy programs of research.

The alternative design and statistical approaches permit relatively strong causal inference in the RCT when common problems such as treatment nonadherence and participant attrition occur and in alternative designs when randomization is not possible.

[Figure 2, panels a to c. Recoverable labels: Game Number (10, 11); Pretest Week and Posttest Week (1 to 4 each); Program Started; Treatment; Control; sales categories Gasoline, Cigarettes, and Tickets; Pretest and Posttest.]

Note. In panel a, treatment and control stores were selected from the same chain, were in the same geographical location, and were comparable in sales during baseline (lottery game 10). Introduction of the treatment at the beginning of lottery game 11 yielded an increase in sales only in the treatment stores. In panel b, within the treatment stores, sales of lottery tickets increased substantially following the introduction of treatment. Sales of other major categories (gasoline, cigarettes, nontaxable groceries, and taxable groceries that would be expected to be affected by confounding factors, but not treatment) did not show appreciable change. In panel c, treatment and control stores' sales show comparable trends in sales during the 4 weeks prior to and 4 weeks following the introduction of the treatment. The level of sales in the treatment and control stores is similar prior to the introduction of treatment but differs substantially beginning immediately after treatment is introduced.

Source. Adapted from Reynolds and West.

FIGURE 2-Design elements that strengthen causal inferences in observational studies: matching (a), nonequivalent dependent variables (b), and repeated pre- and posttest measurement (c).

Researchers need to give careful attention to the additional assumptions required by these approaches. Table 1 lists each of the designs considered in this article.

The first section lists the basic assumptions and internal validity threats of the RCT, together with design and statistical approaches for addressing these issues. Each subsequent section lists key assumptions and threats to internal validity in addition to those of the RCT, together with design and statistical approaches for addressing these issues.

To illustrate, the key additional threat in the regression discontinuity design is misspecification of the functional form of the relationship between the assignment and outcome variables (typically assumed to be linear). Statistically, nonparametric regression in large samples and sensitivity analyses in small samples that probe the extent of misspecification necessary to undermine the observed treatment effect can help bracket the possible range of the effect size. Adding the design feature of a nonequivalent dependent variable that is expected to be affected by important confounders, but not by the treatment, can help rule out many of the threats to internal validity.

In general, the causal effect estimated from the alternative designs and analyses is likely to be associated with more uncertainty than that from the ideal RCT in which no attrition or treatment nonadherence has occurred. Confidence intervals that provide a range of plausible effect sizes caused by sampling fluctuations should be supplemented with estimated brackets on effect sizes that indicate how large or small the effect might plausibly be if key assumptions are not met. Remaining uncertainty about the causal effect can often be reduced by adding design features that help rule out the possibility that other unobserved confounders are producing the observed effect.

TABLE 1-Key Assumptions or Threats to Internal Validity and Example Remedies for Randomized Controlled Trials and Alternatives

Assumption or Threat to Internal Validity | Approach to Mitigating the Threat: Design Approach | Approach to Mitigating the Threat: Statistical Approach

Randomized controlled experiment
Independent units | Temporal or geographical isolation of units | Multilevel analysis (other statistical adjustment for clustering)
Full treatment adherence | Incentives for adherence | Instrumental variable analysis (assume exclusion restriction)
No attrition | Sample retention procedures | Missing data analysis (assume data missing at random)
Other treatment conditions do not affect participant's outcome (SUTVA) | Temporal or geographical isolation of treatment groups | Statistical adjustment for measured exposure to other treatments

Randomized encouragement design
Exclusion restriction | No design approach yet available | Sensitivity analysis

Regression discontinuity design
Functional form of relationship between assignment variable and outcome is properly specified | Replication with different threshold; nonequivalent dependent variable | Nonparametric regression; sensitivity analysis

Interrupted time series analysis
Functional form of the relationship for the time series is properly specified; another historical event, a change in population (selection), or a change in measures coincides with the introduction of the intervention | Nonequivalent control series in which intervention is not introduced; switching replication in which intervention is introduced at another time point; nonequivalent dependent measure | Diagnostic plots (autocorrelogram; spectral density); sensitivity analysis

Observational study
Measured baseline variables equated; unmeasured baseline variables equated; differential maturation; baseline variables reliably measured | Multiple control groups; nonequivalent dependent measures; additional pre- and postintervention measurements | Propensity score analysis; sensitivity analysis; subgroup analysis; correction for measurement error

Note. SUTVA = stable unit treatment value assumption. The list of assumptions and threats to internal validity identifies issues that commonly occur in each of the designs. The alternative designs may be subject to each of the issues listed for the randomized controlled trial in addition to the issues listed for the specific design. The examples of statistical and design approaches for mitigating the threat to internal validity illustrate some commonly used approaches and are not exhaustive. For the observational study design, the potential outcomes and Campbellian frameworks differ so that the statistical and design approaches do not map 1-to-1 onto the assumptions or threats to internal validity that are listed. More in-depth descriptions can be found in Shadish et al. and West et al.

We have touched only briefly on the matter of external validity.

Generalization of findings should not be assumed; features to en- hance generalization need to be built into the design,'' Some RCTs have features that decrease the generalizabüity of their results to the actual treatments, settings, and populations of interest^^ This may limit the ability of public health research to provide informa- tion about the actual effectiveness of interventions to alleviate health problems. People can have prefer- ences and capacities that interact with treatment effects. Important contextual variables can influence intervention effects as well as par- tidpant self-selection and attrition.

Regardless of the design chosen, features that maximize external validity should be incorporated into the design. Shadish et al.¹⁷ present procedures for doing this in both single and multiple studies.

Our opening quotation from John Tukey reminds us that the public health significance of the research question should be paramount in the design of research. Important questions should not be ignored if they cannot be fitted into the framework of an RCT. Rather, the strongest possible design that can feasibly be implemented should be chosen, whether an RCT or an alternative design. Whatever design is chosen, careful attention must be given to the viability of the assumptions of the design, adding design and analysis features to address plausible threats to internal and external validity.

In addition, the evaluation of important interventions is rarely limited to single studies but rather is based on the accumulated body of research. The use of systematic reporting frameworks, such as CONSORT⁴¹ for RCTs and TREND⁴² for nonrandomized studies, may encourage more in-depth appraisal of research designs both during the planning of the study and the evaluation of its results. Scientific progress in public health will be facilitated by asking the right questions, choosing the strongest feasible design that can answer those questions for the population of interest, and probing the assumptions underlying the design and analysis choices through the addition of carefully chosen design features and supplemental statistical analyses.

1364 | Research Innovations and Recommendations | Peer Reviewed | West et al.

American Journal of Public Health | August 2008, Vol 98, No. 8

About the Authors

Stephen G. West is with Arizona State University, Tempe.

Naihua Duan is with Columbia University, New York, NY, and New York State Psychiatric Institute, New York.

Willo Pequegnat is with the National Institute of Mental Health, Bethesda, MD.

Paul Gaist is with the National Institutes of Health, Bethesda.

Don C. Des Jarlais is with Beth Israel Medical Center, New York.

David Holtgrave is with the Johns Hopkins University, Bloomberg School of Public Health, Baltimore, MD.

José Szapocznik is with the University of Miami School of Medicine, Miami, FL.

Martin Fishbein is with the Annenberg School for Communication, University of Pennsylvania, Philadelphia.

Bruce Rapkin is with Memorial Sloan-Kettering Cancer Center, New York.

Michael Clatts is with the National Development and Research Institutes, Inc, New York.

Patricia Mullen is with the University of Texas School of Public Health, Houston.

Requests for reprints should be sent to Stephen G. West, Psychology Department, Arizona State University, Tempe, AZ 85287-1104 (e-mail: [email protected]).

Note. The views in this article are those of the authors. No official endorsement by the US Department of Health and Human Services or the US National Institutes of Health is intended or should be inferred.

Contributors

S. G. West participated in the initial workshop and helped develop the outline, wrote the initial draft and subsequent drafts of the article incorporating additions and edits, and wrote the final article. N. Duan participated in the initial workshop, participated in the development of the paper outline, drafted part of the article, and reviewed and edited the entire article. W. Pequegnat conceptualized the initial workshop on which the article is based, co-chaired the workshop and guided development of the original outline, wrote the introduction for the first draft, provided feedback on multiple drafts, and coordinated continued development of the article. P. Gaist participated in the original workshop, guided development of the original outline, and provided significant input and contributions throughout the planning, writing, review, and revision stages of this article. He has served as 1 of the 2 primary coordinators responsible for overseeing each phase that has been required in the development and writing of this article. D. C. Des Jarlais chaired the initial workshop that led to the writing of the article, contributed text to various drafts, and edited and approved the final draft. D. Holtgrave, J. Szapocznik, M. Fishbein, B. Rapkin, M. C. Clatts, and P. D. Mullen attended the workshop, helped conceptualize ideas, contributed text, and reviewed and edited drafts.

Acknowledgments

S. G. West was supported by a study visit grant at the Free University of Berlin by the German Academic Exchange Service. An earlier version of this article was presented to the meeting of the Committee on the Prevention of Mental Disorders and Substance Abuse among Children, Youth, and Young Adults, Institute of Medicine, Washington, DC, October 2007. We thank Wei Wu for her help in the preparation of the figures.

Note. On November 14-15, 2005, the US National Institute of Mental Health and the Office of AIDS Research, US National Institutes of Health, convened a group of experts to consider the critical questions associated with the efficacy and effectiveness of interventions for preventing HIV and other chronic diseases that do not lend themselves to randomized controlled trials. This discussion led to the development of this article.

Human Participant Protection

No protocol approval was needed for this study.

References

1. Tukey JW. The future of data analysis. Ann Math Stat. 1962;33:13-14.

2. Bonnell C, Hargreaves J, Strange V, Pronyk P, Porter J. Should structural interventions be evaluated using RCTs? The case of HIV prevention. Soc Sci Med. 2006;63:1135-1142.

3. Reichardt CS. The principle of parallelism in the design of studies to estimate treatment effects. Psychol Methods. 2006;11:1-18.

4. Fisher RA. The Design of Experiments. Edinburgh, Scotland: Oliver & Boyd; 1935.

5. Holland PW. Statistics and causal inference (with discussion). J Am Stat Assoc. 1986;81:945-970.

6. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables (with discussion). J Am Stat Assoc. 1996;91:444-472.

7. Jo B. Statistical power in randomized intervention studies with noncompliance. Psychol Methods. 2002;7:178-193.

8. Little RJ, Yau L. Statistical techniques for analyzing data from prevention trials: treatment of no-shows using Rubin's causal model. Psychol Methods. 1998;3:147-159.

9. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd ed. New York, NY: John Wiley and Sons; 2002.

10. Schafer JL, Graham JW. Missing data: our view of the state of the art. Psychol Methods. 2002;7:147-177.

11. West SG, Sagarin BJ. Participant selection and loss in randomized experiments. In: Bickman L, ed. Research Design: Donald Campbell's Legacy. Vol. 2. Thousand Oaks, CA: Sage Publications; 2000:117-154.

12. Neyman J. On the application of probability theory to agriculture experiments. Essay on principles. Section 9. Statistical Science. 1990;5:465-472. Originally published in Roczniki Nauk Rolniczych [Annals of Agricultural Science] 1923, Tom X, 1-51. Translated and edited by DM Dabrowska and TP Speed.

13. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66:688-701.

14. Rubin DB. Causal inference using potential outcomes: design, modeling, decisions. J Am Stat Assoc. 2005;100:322-331.

15. Baker SG. Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. J Am Stat Assoc. 1998;93:929-934.

16. Campbell DT. Factors relevant to the validity of experiments in social settings. Psychol Bull. 1957;54:297-312.

17. Shadish WR, Cook TD, Campbell DT. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston, MA: Houghton-Mifflin; 2002.

18. Vinokur AD, Price RH, Schul Y. Impact of the JOBS intervention on unemployed workers varying in risk for depression. Am J Community Psychol. 1995;23:39-74.

19. Holland PW. Causal inference, path analysis, and recursive structural equation models (with discussion). In: Clogg C, ed. Sociological Methodology 1988. Washington, DC: American Sociological Association; 1988:449-493.

20. Barnard J, Frangakis CE, Hill JL, Rubin DB. A principal stratification approach to broken randomized experiments: a case study of school choice vouchers in New York City (with discussion). J Am Stat Assoc. 2003;98:299-323.

21. Mark MM, Reichardt CS. Quasi-experimental and correlational designs: methods for the real world when random assignment isn't feasible. In: Sansone C, Morf CC, Panter AT, eds. Sage Handbook of Methods in Social Psychology. Thousand Oaks, CA: Sage Publications; 2003:265-286.

22. West SG, Biesanz JC, Pitts SC. Causal inference and generalization in field settings: experimental and quasi-experimental designs. In: Reis HT, Judd CM, eds. Handbook of Research Methods in Social and Personality Psychology. New York, NY: Cambridge University Press; 2000:40-84.

23. Finkelstein MO, Levin B, Robbins H. Clinical and prophylactic trials with assured new treatment for those at greater risk: I. A design proposal. Am J Public Health. 1996;86:691-695.

24. Finkelstein MO, Levin B, Robbins H. Clinical and prophylactic trials with assured new treatment for those at greater risk: II. Examples. Am J Public Health. 1996;86:696-705.

25. Ludwig J, Miller DL. Does Head Start improve children's life chances? Evidence from a regression discontinuity design. Q J Econ. 2007;122:159-208.

26. Khuder SA, Milz S, Jordan T, Price J, Silvestri K, Butler P. The impact of a smoking ban on hospital admissions for coronary heart disease. Prev Med. 2007;45:33-8.

27. Hawkins NG, Sanson-Fisher RW, Shakeshaft A, D'Este C, Green LW. The multiple baseline design for evaluating population-based research. Am J Prev Med. 2007;33:162-168.

28. Cochran WG. The planning of observational studies of human populations (with discussion). J R Stat Soc. Series A (General). 1965;128:236-265.

29. Rosenbaum PR. Observational Studies. 2nd ed. New York, NY: Springer; 2002.

30. Rubin DB. Matched Sampling for Causal Effects. New York, NY: Cambridge University Press; 2006.

31. West SG, Thoemmes F. Equating groups. In: Alasuutari P, Brannen J, Bickman L, eds. The SAGE Handbook of Social Research Methods. London, England: Sage Publications; 2008:414-430.

32. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41-55.

33. McCaffrey DF, Ridgeway G, Morral AR. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Methods. 2004;9:403-425.

34. Haviland A, Nagin DS, Rosenbaum PR. Combining propensity score matching and group-based trajectory analysis in an observational study. Psychol Methods. 2007;12:247-267.

35. Sommer A, Zeger SL. On estimating efficacy from clinical trials. Stat Med. 1991;10:45-52.

36. Winship C, Morgan SL. The estimation of causal effects from observational data. Annu Rev Sociol. 1999;25:659-706.

37. Morgan SL, Winship C. Counterfactuals and Causal Inference: Methods and Principles for Social Research. New York, NY: Cambridge University Press; 2007.

38. Shadish WR, Cook TD. Design rules: more steps towards a complete theory of quasi-experimentation. Stat Sci. 1999;14:294-300.

39. Rosenbaum PR. Replicating effects and biases. Am Stat. 2001;55:223-227.

40. Reynolds KD, West SG. A multiplist strategy for strengthening nonequivalent control group designs. Eval Rev. 1987;11:691-714.

41. Moher M, Schulz KF, Altman D, the CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA. 2001;285:1987-1991.

42. Des Jarlais DC, Lyles C, Crepaz N, the TREND Group. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: the TREND statement. Am J Public Health. 2004;94:361-366.

Adverse Event Detection in Drug Development: Recommendations and Obligations Beyond Phase 3

Premarketing studies of drugs, although large enough to demonstrate efficacy and detect common adverse events, cannot reliably detect an increased incidence of rare adverse events or events with significant latency. For most drugs, only about 500 to 3000 participants are studied, for relatively short durations, before a drug is marketed. Systems for assessment of postmarketing adverse events include spontaneous reports, computerized claims or medical record databases, and formal postmarketing studies. We briefly review the strengths and limitations of each. Postmarketing surveillance is essential for developing a full understanding of the balance between benefits and adverse effects. More work is needed in analysis of data from spontaneous reports of adverse effects and automated databases, design of ad hoc studies, and design of economically feasible large randomized studies. (Am J Public Health. 2008;98:1366-1371. doi:10.2105/AJPH.2007.124537)

Jesse A. Berlin, ScD, Susan C. Glasser, PhD, and Susan S. Ellenberg, PhD

REPORTS OF DEVASTATING adverse events suffered by patients create public doubt about whether drugs are safe. Developing "safe" drugs presents a high hurdle, because every drug carries potential for harm ("risk"). Drug safety cannot be considered an absolute; it can only be assessed relative to the drug's benefits. At the time of marketing, however, the amount of information on benefits and risks, especially long term, is relatively small, and often based on highly selected populations with respect to age, comorbidities, use of concomitant medications, and other factors.

We discuss drug development and assessment of adverse events and offer recommendations for continued evaluation of benefits and harms after a medicinal product becomes marketed.
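The limited reach of premarketing sample sizes can be illustrated with a short calculation. The following Python sketch is only an illustration under stated assumptions, not a method from this article: it assumes participants are independent and each has the same true probability of experiencing the event, and the incidence values (1 in 100, 1 in 1000, 1 in 10 000) and function names are hypothetical choices made for the example. Under those assumptions, the probability of observing at least 1 event among n participants is 1 - (1 - p)^n.

```python
import math


def prob_at_least_one_event(n_participants: int, incidence: float) -> float:
    """Probability of observing >= 1 event among n independent participants.

    Assumes every participant has the same true event probability
    (`incidence`), which is a simplification made for illustration.
    """
    if n_participants < 0 or not 0.0 <= incidence <= 1.0:
        raise ValueError("need n_participants >= 0 and 0 <= incidence <= 1")
    return 1.0 - (1.0 - incidence) ** n_participants


def participants_needed(incidence: float, target_prob: float = 0.95) -> int:
    """Smallest n for which P(at least 1 event) reaches target_prob."""
    if not 0.0 < incidence < 1.0 or not 0.0 < target_prob < 1.0:
        raise ValueError("incidence and target_prob must lie strictly in (0, 1)")
    # Solve 1 - (1 - p)^n >= target for n; log1p keeps precision for small p.
    return math.ceil(math.log(1.0 - target_prob) / math.log1p(-incidence))


if __name__ == "__main__":
    # 500 and 3000 are the premarketing sample sizes quoted in the text;
    # the incidence values are hypothetical.
    for n in (500, 3000):
        for incidence in (1e-2, 1e-3, 1e-4):
            p = prob_at_least_one_event(n, incidence)
            print(f"n={n:5d}  incidence={incidence:.4f}  P(>=1 event)={p:.3f}")
    print("n needed for a 95% chance at 1 in 10 000:", participants_needed(1e-4))
```

Seeing a single event is a much weaker requirement than showing that its incidence is increased relative to a comparator, so these figures are optimistic about what a premarketing program can establish.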

DRUG DEVELOPMENT PROCESS

The drug development process, from discovery to market, is long and costly. Rigorous processes are in place during clinical trials that protect the safety of study participants and also ensure that collection of adverse event data is complete. This completeness, coupled with the randomized design, also helps develop an understanding of the benefits and side effects of a new medicine by strengthening the validity of the comparisons between the new drug and the comparator, which could be a placebo or an active therapy for the condition under study.

Preclinical Testing

Prior to being studied in humans, a drug candidate undergoes an extensive series of laboratory and animal tests to study possible therapeutic and adverse effects. Preclinical studies are also used to characterize the pharmacokinetics and pharmacodynamics of the drug, including absorption, distribution, metabolism, excretion, and persistence of pharmacological effects.

A preclinical evaluation of safety includes in vitro and in vivo studies in animals to search for unintended pharmacological and toxic effects at the whole-animal level and on specific organs and tissues. In addition, carcinogenicity and mutagenicity studies are conducted, along with specific tests of effects on cardiac rhythms. If results suggest the product can be used safely and may produce the desired beneficial effects, the stage is set for testing in humans. There is generally a low threshold for rejecting drugs for safety reasons; the assumption is that unfavorable preclinical results are predictive of human safety problems (although the validity of this assumption may be questionable).

Most drug candidates, whether for safety concerns or insufficient potential for efficacy, will never complete the development process; only 1 of every 5000 to 10 000 compounds that enter preclinical testing will become approved for marketing.