
Quality & Quantity (2007) 41:233–249
DOI 10.1007/s11135-006-9000-3
© Springer 2006

Validity and Qualitative Research: An Oxymoron?

ANTHONY J. ONWUEGBUZIE 1,∗ and NANCY L. LEECH 2

1 Department of Educational Measurement and Research, University of South Florida, College of Education, Tampa, FL, USA; 2 Division of Educational Psychology, School of Education and Human Development, University of Colorado at Denver and Health Sciences Center, Denver, CO, USA

∗Author for correspondence: Department of Educational Measurement and Research, College of Education, University of South Florida, 4204 East Fowler Street, EDU 162, Tampa, FL 33620-7750, USA.

Abstract. Although the importance of validity has long been accepted among quantitative researchers, this concept has been an issue of contention among qualitative researchers. Thus, the first purpose of the present paper is to introduce the Qualitative Legitimation Model, which attempts to integrate many of the types of validity identified by qualitative researchers. The second purpose of this article is to describe 24 methods for assessing the truth value of qualitative research. Utilizing and documenting such techniques should prevent validity and qualitative research from being seen as an oxymoron.

Key words: qualitative research, legitimation, validity, criteria, standards, rigor, accountability

1. Framework for Establishing Design-Specific Legitimacy in Qualitative Research

Validity in qualitative research has been operationalized in a myriad of ways. To date, no single definition of validity has achieved hegemony in qualitative research. In fact, it appears that all of the conceptualizations of validity are appropriate for at least some qualitative research designs. As such, each of these existing models appears to have merit. This supports Eisenhart and Howe's (1992) contention that validity is a unitary concept with different design-specific conditions.

[Figure 1. Qualitative legitimation model: threats to internal credibility and threats to external credibility at the research design/data collection, data analysis, and data interpretation stages.]

Surmising that there are threats to internal and external validity at the three major stages of the research process (i.e., research design/data collection, data analysis, and data interpretation), Onwuegbuzie (2003a) developed a model to expand Campbell and Stanley's (Campbell, 1957, 1963a, 1963b; Campbell and Stanley, 1963) threats to internal and external validity. However, whereas in any particular quantitative research study the research design/data collection, data analysis, and data interpretation stages typically represent three distinct (linear) phases, this is not the case in qualitative research. Indeed, in interpretive research, these three stages are iterative.

Therefore, any conceptualization of validity in qualitative research should take this iterative feature into account. Figure 1 represents the Qualitative Legitimation Model, which attempts to integrate the types of validity identified above, as well as threats adopted from Onwuegbuzie's (2003a) model. The Qualitative Legitimation Model comprises threats to internal credibility and threats to external credibility. Internal credibility can be defined as the truth value, applicability, consistency, neutrality, dependability, and/or credibility of interpretations and conclusions within the underlying setting or group. Internal credibility corresponds to what Onwuegbuzie (2003a) termed internal replication in quantitative research. On the other hand, external credibility refers to the degree to which the findings of a study can be generalized across different populations of persons, settings, contexts, and times. That is, external credibility pertains to the confirmability and transferability of findings and conclusions. All threats identified in the Qualitative Legitimation Model are classified as threats to internal credibility, external credibility, or both.
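As a reading aid, the classification rule in the last sentence can be expressed as two sets whose intersection holds the threats that count against both kinds of credibility. The minimal Python sketch below hard-codes the threat lists enumerated in Sections 2 and 3 of this paper; the names `INTERNAL`, `EXTERNAL`, and `classify` are illustrative assumptions, not part of the model, and the sketch ignores the stage (design/data collection, analysis, interpretation) at which each threat arises.

```python
# Hypothetical encoding of the threat lists given in Sections 2 and 3.
INTERNAL = {
    "ironic legitimation", "paralogical legitimation",
    "rhizomatic legitimation", "voluptuous legitimation",
    "descriptive validity", "structural corroboration",
    "theoretical validity", "observational bias", "researcher bias",
    "reactivity", "confirmation bias", "illusory correlation",
    "causal error", "effect size",
}
EXTERNAL = {
    "catalytic validity", "communicative validity", "action validity",
    "investigation validity", "interpretive validity",
    "evaluative validity", "consensual validity",
    "population generalizability", "ecological generalizability",
    "temporal generalizability", "researcher bias", "reactivity",
    "order bias", "effect size",
}


def classify(threat: str) -> str:
    """Return 'internal', 'external', or 'both' for a threat in the model."""
    is_internal, is_external = threat in INTERNAL, threat in EXTERNAL
    if is_internal and is_external:
        return "both"
    if is_internal:
        return "internal"
    if is_external:
        return "external"
    raise KeyError(f"{threat!r} is not a threat in the model")


print(sorted(INTERNAL & EXTERNAL))
# ['effect size', 'reactivity', 'researcher bias']
```

Running it shows that researcher bias, reactivity, and effect size are the three threats that the model treats as threats to both internal and external credibility.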

2. Threats to Internal Credibility in Qualitative Research

As illustrated in Figure 1, the following threats to internal credibility are pertinent to qualitative research: ironic legitimation, paralogical legitimation, rhizomatic legitimation, voluptuous (i.e., embodied) legitimation, descriptive validity, structural corroboration, theoretical validity, observational bias, researcher bias, reactivity, confirmation bias, illusory correlation, causal error, and effect size. Each of these is discussed below.

Ironic legitimation. Ironic legitimation rests on the assumption that there are multiple realities of the same phenomenon, such that the truth value of the research depends on its ability to reveal co-existing opposites (Lather, 1993).

Paralogical legitimation. Paralogical legitimation represents that aspect of validity that reveals paradoxes (Lather, 1993).

Rhizomatic legitimation. Rhizomatic legitimation stems from mapping data, and not merely from describing them (Lather, 1993).

Voluptuous legitimation. Voluptuous legitimation, also known as embodied validity or situated validity, is interpretive in nature. This form of legitimation assesses the extent to which the researcher's level of interpretation exceeds her/his knowledge base stemming from the data (Lather, 1993).

Descriptive validity. Descriptive validity refers to the factual accuracy of the account as documented by the researcher (Maxwell, 1992).

Structural corroboration. In structural corroboration, the qualitative researcher utilizes multiple types of data to support or to contradict the interpretation (Eisner, 1991).

Theoretical validity. Theoretical validity represents the degree to which a theoretical explanation developed from research findings fits the data and, thus, is credible, trustworthy, and defensible (Maxwell, 1992).

Observational bias.
Observational bias arises at the research design/data collection stage when the data collectors have obtained an insufficient sampling of behaviors or words from the study participant(s) (Onwuegbuzie, 2003a). Such inadequate sampling of behaviors occurs when either persistent observation or prolonged engagement is lacking (Lincoln and Guba, 1985). Observational bias also can occur at the data analysis stage if an insufficient sample of behaviors or words is analyzed from the underlying data.

Researcher bias. According to Onwuegbuzie (2003a), researcher bias occurs when the researcher has personal biases or a priori assumptions that he/she is unable to bracket. This bias may be subconsciously transferred to the participants in such a way that their behaviors, attitudes, or experiences are affected. In addition to influencing participants unduly, the researcher could affect study procedures (e.g., ask leading questions in an interview) or even contaminate data collection techniques. Researcher bias does not occur only at the data collection stage; it can also prevail at the data analysis and data interpretation phases. Researcher bias is a very common threat to legitimation in constructivist research because the researcher usually serves as the person (i.e., instrument) collecting the data. As noted by Onwuegbuzie (2003a), researcher bias can be either active or passive. Passive sources include personality characteristics or attributes of the researcher (e.g., gender, ethnicity, type of clothing worn), whereas active sources may include mannerisms and statements made by the researcher that provide the participants with information about the researcher's preferences. Another form of researcher bias occurs when the researcher's prior knowledge of the participants unduly influences the participants' behaviors.

Reactivity.
Reactivity refers to a number of facets related to the way in which a study is undertaken and the reactions of the participants involved (Onwuegbuzie, 2003a). That is, reactivity involves changes in persons' responses that result from being cognizant of the fact that they are participating in a research investigation. For example, the mere presence of observers during a study may alter the typical responses of the group, providing rival explanations for the findings, which, in turn, threaten internal credibility at the data collection stage. Onwuegbuzie (2003a) identified the following two major components of reactivity pertinent to qualitative research: (a) the Hawthorne effect and (b) the novelty effect. The Hawthorne effect occurs when individuals interpret their receiving an intervention as being given special consideration. As such, their reaction to this perception makes it difficult to isolate naturally occurring observations from contrived situations. Similarly, the novelty effect refers to artificial responses on the part of study participants merely because a novel stimulus (e.g., a video camera) is introduced into the environment solely for the purpose of collecting data.

Confirmation bias. Confirmation bias is the tendency for interpretations and conclusions based on new data to be overly congruent with a priori hypotheses (Greenwald et al., 1986). As noted by Onwuegbuzie (2003a), confirmation bias, per se, does not necessarily pose a threat to internal credibility. It threatens internal credibility at the data interpretation stage only when there exists at least one plausible rival explanation of the underlying findings that might be demonstrated to be superior if given the opportunity.

Illusory correlation. The illusory correlation represents a tendency to identify a relationship among events, people, and the like when no such relationship actually prevails.
The illusory correlation may arise from a false consensus bias, in which researchers have the false belief that most other individuals share their interpretations of a relationship (Onwuegbuzie, 2003a). Such an illusory correlation poses a serious threat to internal credibility at the data interpretation stage.

Causal error. Like quantitative researchers, qualitative investigators often provide causal explanations and attributions for observed behaviors and attitudes without attempting to verify such interpretations. This can lead to erroneous interpretations of the data.

Effect size. There are many situations in which effect sizes would provide a thicker description of underlying qualitative data (Onwuegbuzie, 2003b). The use of effect sizes actually qualitizes (i.e., converts numerical data into narrative data that can be analyzed qualitatively; Tashakkori and Teddlie, 1998) empirical data by helping data analysts to determine the meaningfulness of behaviors and words, as well as of conclusions that are based on qualitative categorizations (Onwuegbuzie and Teddlie, 2003).

3. Threats to External Credibility in Qualitative Research

As illustrated in Figure 1, the following threats to external credibility are pertinent to qualitative research: catalytic validity, communicative validity, action validity, investigation validity, interpretive validity, evaluative validity, consensual validity, population generalizability, ecological generalizability, temporal generalizability, researcher bias, reactivity, order bias, and effect size. Each of these is discussed below.

Catalytic validity. Catalytic validity is the degree to which a given research study empowers and liberates a research community (Lather, 1986).

Communicative validity. According to Kvale (1995), communicative validity involves testing the validity of knowledge claims in a discourse. In other words, validity is agreed upon by the community of researchers.

Action validity.
With respect to action validity, justification of the validity of the research is based on whether or not it works, that is, whether or not the research findings are used by decision makers and other stakeholders (Kvale, 1995).

Investigation validity. Investigation validity is the quality of craftsmanship, in which validity is the researcher's quality control. Accordingly, validity is a matter not only of the methods used but also of the researcher's personality traits, including her or his ethics (Kvale, 1995).

Interpretive validity. Interpretive validity refers to the extent to which a researcher's interpretation of an account represents an understanding of the perspective of the group under study and the meanings attached to their words and actions (Maxwell, 1992).

Evaluative validity. Evaluative validity refers to the extent to which an evaluation framework, rather than a descriptive, interpretive, or explanatory one, can be applied to the objects of study (Maxwell, 1992).

Consensual validity. Consensual validation stems from the opinion of others, with "an agreement among competent others that the description, interpretation, and evaluation and thematics of an educational situation are right" (Eisner, 1991, p. 112).

Population generalizability/Ecological generalizability/Temporal generalizability. Onwuegbuzie and Daniel (2003) surmised that a common error made by qualitative researchers at the interpretation stage is the tendency to generalize findings rather than to utilize the qualitative data to obtain insights into particular underlying processes and practices that prevail within a specific location (Connolly, 1998).
Indeed, only when relatively large representative samples are utilized should qualitative researchers attempt to generalize findings across different populations (i.e., population generalizability), locations, settings, and contexts (i.e., ecological generalizability), and/or times (i.e., temporal generalizability).

Researcher bias. Researcher bias, as described above, threatens the external credibility of the findings because the particular type of bias of the researcher may be so unique as to make the interpretations of the data ungeneralizable.

Reactivity. Reactivity (i.e., the Hawthorne effect and the novelty effect), as described above, poses a threat to external credibility because it is not clear whether the observed findings would be the same if this threat had not prevailed, thereby threatening the generalizability of the results.

Order bias. Order bias occurs when the order in which the questions in an interview schedule are posed, or the order in which observations are made, makes a difference to the dependability and confirmability of the findings. In such cases, interpretations cannot be confidently generalized to situations outside the study context.

Effect size. As described above, failure to consider the effect size or the meaningfulness of an interpretation poses a threat to external credibility.
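The paper does not give a formula for a qualitative effect size here, so the following is only a sketch under a stated assumption: that the effect size attached to a theme is the proportion of participants in whose coded data the theme appears, one simple count-based option. The data layout (a mapping from hypothetical participant IDs to sets of theme labels) and the toy data are invented for illustration and are not results.

```python
from collections import Counter


def theme_effect_sizes(coded: dict[str, set[str]]) -> dict[str, float]:
    """Proportion of participants whose coded data contain each theme.

    `coded` maps a (hypothetical) participant ID to the set of themes coded
    for that participant; using a set counts each participant once per theme.
    """
    if not coded:
        raise ValueError("at least one participant is required")
    counts = Counter(theme for themes in coded.values() for theme in themes)
    return {theme: count / len(coded) for theme, count in counts.items()}


# Toy data invented for illustration only.
coded = {
    "P1": {"isolation", "time pressure"},
    "P2": {"time pressure"},
    "P3": {"isolation", "time pressure", "peer support"},
    "P4": {"time pressure"},
}
for theme, size in sorted(theme_effect_sizes(coded).items()):
    print(f"{theme}: {size:.2f}")
# isolation: 0.50
# peer support: 0.25
# time pressure: 1.00
```

Reporting such proportions alongside the themes lets a reader see whether an interpretation rests on one participant or on nearly all of them, which bears on how far it can reasonably be generalized.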

4. A Typology of Methods for Assessing or Increasing Legitimation

At this point, it should be reiterated that a qualitative study cannot be assessed for validity (e.g., truth value, credibility, legitimation, dependability, trustworthiness, generalizability). Rather, validity is "relative to purposes and circumstances" (Brinberg and McGrath, 1987, p. 13). Moreover, assessing legitimation does not lead to a dichotomous outcome (i.e., valid vs. invalid) but represents an issue of level or degree. Although no method is guaranteed to yield valid data or trustworthy conclusions (Phillips, 1987), an assessment of the procedures used in qualitative studies is nevertheless imperative for ruling in or ruling out rival interpretations of data. Such strategies help to evaluate legitimation, to increase legitimation, or both. Thus, what follows is a comprehensive typology and description of methods for assessing the truth value of qualitative research. This list of 24 strategies has been compiled from the work of several researchers (e.g., Becker, 1970; Creswell, 1998; Fielding and Fielding, 1986; Guba and Lincoln, 1989; Kidder, 1981; Lincoln and Guba, 1985; Maxwell, 1996; Miles and Huberman, 1984, 1994; Newman and Benz, 1998; Onwuegbuzie, 2003b; Patton, 1990).
The following techniques are described below: prolonged engagement, persistent observation, triangulation, leaving an audit trail, member checking/informant feedback, weighting the evidence, checking for representativeness of sources of data, checking for researcher effects/clarifying researcher bias, making contrasts/comparisons, theoretical sampling, checking the meaning of outliers, using extreme cases, ruling out spurious relations, replicating a finding, referential adequacy, following up surprises, structural relationships, peer debriefing, rich and thick description, the modus operandi approach, assessing rival explanations, negative case analysis, confirmatory data analyses, and effect sizes.

Prolonged engagement. Prolonged engagement involves conducting a study for a sufficient period of time to obtain an adequate representation of the "voice" under study. Prolonged engagement includes understanding the culture, building trust with study participants, and checking for misinformation stemming from anomalies introduced by the researcher or the participants (Glesne and Peshkin, 1992; Lincoln and Guba, 1985).

Persistent observation. The goal of persistent observation is to identify the characteristics, attributes, and traits that are most relevant to the phenomenon under investigation and to focus on them extensively (Lincoln and Guba, 1985). In order to engage in persistent observation, the researcher must be able to separate relevant from irrelevant observations. Whereas prolonged engagement provides scope, persistent observation provides depth.

Triangulation. Triangulation involves the use of multiple and different methods, investigators, sources, and theories to obtain corroborating evidence (Glesne and Peshkin, 1992; Lincoln and Guba, 1985; Merriam, 1988; Miles and Huberman, 1984, 1994; Patton, 1990). Triangulation reduces the possibility of chance associations, as well as of systematic biases prevailing due to a specific method being utilized, thereby allowing greater confidence in any interpretations made (Fielding and Fielding, 1986; Maxwell, 1992). Denzin (1978) first demonstrated how to triangulate. He defined triangulation as "the combination of methodologies in the study of the same phenomenon" (p. 291). Denzin outlined the following four types of triangulation: (a) data triangulation (i.e., use of a variety of sources in a study), (b) investigator triangulation (i.e., use of several different researchers), (c) theory triangulation (i.e., use of multiple perspectives to interpret the results of a study), and (d) methodological triangulation (i.e., use of multiple methods to study a research problem). Similarly, although recognizing that triangulation may not be appropriate for all research objectives, Jick (1979) stated the following advantages of triangulation: (a) it permits researchers to be more certain of their findings; (b) it enhances the development of enterprising ways of collecting data; (c) it can unravel contradictions; (d) it can lead to thicker, richer data; (e) it can lead to the fusion of theories; and (f) by virtue of its extensiveness, it may serve as the litmus test for competing theories. As noted by Newman and Benz (1998), the more sources one examines, the more likely the researcher is to have an adequate representation of the underlying phenomenon.

Leaving an audit trail. Leaving an audit trail involves the researcher maintaining extensive documentation of records and data stemming from the study.
Halpern (1983), in his seminal work, identified the following six classes of raw records: (a) raw data (e.g., videotapes, written notes, survey results); (b) data reduction and analysis products (e.g., write-ups of field notes, summaries, unitized information, quantitative summaries, theoretical notes); (c) data reconstruction and synthesis products (e.g., structure of categories, findings and interpretations, final reports); (d) process notes (i.e., methodological notes, trustworthiness notes, audit trail notes); (e) materials related to intentions and dispositions (e.g., research proposal, personal notes, reflexive journals, expectations); and (f) instrument development information (e.g., pilot forms, preliminary schedules, observation formats, and surveys). Halpern also outlined an algorithm for the audit process itself. This algorithm is divided into the following five stages: (a) pre-entry (characterized by a series of discussions between the auditor and auditee that lead to a decision to continue, continue conditionally, or discontinue the proposed audit); (b) determination of auditability (involving understanding the study, becoming familiar with the audit trail, and assessing the study's auditability); (c) formal agreement (involving a written agreement between the two parties on what is to be accomplished by the audit, establishing the time limit, the roles to be played by the parties, the logistics, the deliverables and their format, and the re-negotiation criteria to apply if the audit trail leads to inconsistencies); (d) determination of trustworthiness (establishing dependability, confirmability, and credibility); and (e) closure (involving feedback, possible re-negotiation, and the writing of the final report).
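Halpern's five audit stages run in a fixed order, with an early exit possible at pre-entry. The Python sketch below encodes that sequence under simplifying assumptions that are not Halpern's: a "continue conditionally" decision is treated like "continue", and a trail is judged auditable only if every one of the six record classes contains at least one document. All names (`Stage`, `Decision`, `run_audit`, the trail mapping) are hypothetical.

```python
from enum import Enum

# Halpern's six classes of raw records, as listed above.
RECORD_CLASSES = (
    "raw data",
    "data reduction and analysis products",
    "data reconstruction and synthesis products",
    "process notes",
    "materials related to intentions and dispositions",
    "instrument development information",
)


class Stage(Enum):
    PRE_ENTRY = 1
    AUDITABILITY = 2
    FORMAL_AGREEMENT = 3
    TRUSTWORTHINESS = 4
    CLOSURE = 5


class Decision(Enum):
    CONTINUE = "continue"
    CONTINUE_CONDITIONALLY = "continue conditionally"
    DISCONTINUE = "discontinue"


def missing_record_classes(trail: dict[str, list[str]]) -> list[str]:
    """Record classes with no documents in a (hypothetical) trail mapping."""
    return [name for name in RECORD_CLASSES if not trail.get(name)]


def run_audit(decision: Decision, trail: dict[str, list[str]]) -> list[Stage]:
    """Return the audit stages reached, in order."""
    reached = [Stage.PRE_ENTRY]
    if decision is Decision.DISCONTINUE:
        return reached  # the proposed audit ends at pre-entry
    # Assumption: "continue conditionally" proceeds like "continue".
    reached.append(Stage.AUDITABILITY)
    if missing_record_classes(trail):
        return reached  # assumption: an incomplete trail is not auditable
    reached += [Stage.FORMAL_AGREEMENT, Stage.TRUSTWORTHINESS, Stage.CLOSURE]
    return reached


complete = {name: ["placeholder.txt"] for name in RECORD_CLASSES}
print([stage.name for stage in run_audit(Decision.CONTINUE, complete)])
# ['PRE_ENTRY', 'AUDITABILITY', 'FORMAL_AGREEMENT', 'TRUSTWORTHINESS', 'CLOSURE']
print([stage.name for stage in run_audit(Decision.CONTINUE, {"raw data": ["notes.txt"]})])
# ['PRE_ENTRY', 'AUDITABILITY']
```

Under these assumptions, a trail missing any record class stops at the auditability stage, reflecting the idea that the formal agreement and the trustworthiness assessment presuppose an auditable study.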

Lincoln and Guba (1985) liken the audit trail to a fiscal audit.

Member checking/informant feedback. Member checking, also known as informant feedback, involves systematically obtaining feedback about one's data, analytic categories, interpretations, and conclusions from the study group (Guba and Lincoln, 1989). Member checking, which occurs continuously, can be both formal and informal (Lincoln and Guba, 1985). In member checking, the participants are afforded the opportunity to play a major role in assessing the credibility of the account (Stake, 1995). According to Maxwell (1996), member checking is the most effective way of eliminating the possibility of misrepresentation and misinterpretation of the "voice." Similarly, Lincoln and Guba (1985) pronounced member checking "the most critical technique for establishing credibility" (p. 314).

Weighting the evidence. Because some data are better than others, researchers should pay particular attention to data quality issues. Stronger data should be given more weight than weaker data. As noted by Miles and Huberman (1994), the reasons why certain data are stronger than others are linked to Maxwell's (1992) descriptive and interpretive validity. According to Miles and Huberman, data typically are stronger in the following situations: (a) when they are collected later or after prolonged engagement and persistent observation; (b) when they are observed or reported firsthand; (c) when the field-worker is trusted; and (d) when they are collected in informal settings.

Checking for representativeness. Representativeness relates to both internal and external generalizability (Maxwell, 1992).
According to Miles and Huberman (1994), inaccurate generalizations prevail when (a) non-representative informants are sampled, which often stems from an overreliance on accessible and elite informants; (b) generalizations are made from non-representative events or activities, which often arise from the researcher's non-continuous presence in the field, as well as from an over-rating of striking events; and (c) inferences are made from non-representative processes, typically stemming from non-representative informants and events, over-reliance on plausible interpretations, holistic biases, and an adequate fit of data into emerging interpretations. As is the case for quantitative research, representativeness can be improved by increasing the number of participants, looking purposively for contrasting participants, stratifying the sample, and obtaining a random sample (Miles and Huberman, 1994).

Checking for researcher effects/clarifying researcher bias. As noted earlier, researcher bias is an extremely serious threat to validity in qualitative research. Miles and Huberman (1994) identified two sources of researcher bias: (a) the effects of the researcher on the participant(s) (i.e., Bias A) and (b) the effects of the participant(s) on the researcher (i.e., Bias B). These biases may become pervasive at any stage of the research process.

According to Miles and Huberman, Bias A occurs when the researcher disrupts or poses a threat to the existing social or institutional relationships.

Bias A also can lead to informants implicitly or explicitly boycotting the researcher, who is viewed as a nuisance, spy, voyeur, or adversary. Further, Bias A can inhibit informants. On the other hand, Bias B can lead to the researcher going native. Bias A can be reduced by (a) prolonged engagement, (b) persistent observation, (c) using unobtrusive measures where possible, (d) making the researcher's intentions clear, (e) co-opting an informant, (f) conducting some of the interviews and focus groups at a neutral site, and (g) being careful not to exacerbate any potential problems. Bias B can be minimized by (a) avoiding elite bias by selecting a heterogeneous sample, (b) avoiding going native by spending time away from the site, (c) including non-typical participants, (d) maintaining a conceptual framework, (e) utilizing informants to provide background and historical information, (f) triangulating data, (g) examining potential informant bias, (h) showing field notes to a colleague, and (i) continually keeping research questions firmly in mind (Miles and Huberman, 1994).

Making contrasts/comparisons. Although the use of control groups is commonly associated with quantitative research, there are occasions when comparisons in qualitative studies are justified (Onwuegbuzie and Leech, 2005). Here, multisite studies can be extremely enlightening. Qualitative findings also can be compared with the extant literature, as well as with the researcher's experience and knowledge base.

Theoretical sampling. As advocated by Newman and Benz (1998), theoretical sampling involves the researcher following where the data lead and not leading the data. When the goal of the qualitative research is to develop theory, the researcher should "attempt to capture the best theory that explains the data" (p. 53). Thus, the researcher should attempt to sample based on the theory.

Checking the meaning of outliers.
As is the case for quantitative research, most findings in qualitative studies have exceptions. Unfortunately, researchers are tempted to ignore these outliers or to attempt to explain them away (Miles and Huberman, 1994). Yet, outliers can provide extremely valuable insights into the underlying phenomena.

A careful examination of the outlying observations, cases, settings, events, or treatments can help to strengthen conclusions not only by testing the generality of the findings but also by minimizing confirmation bias, illusory correlations, and causal error.

Using extreme cases. As noted by Miles and Huberman (1994), extreme cases can be extremely useful in assessing interpretations and conclusions. By identifying extreme cases, the researcher can then verify whether what is absent in them is present or different in other participants, or vice versa.

Ruling out spurious relations. As is the case for quantitative research, researchers should carefully examine whether a relationship between two variables that appears to emerge from the data represents a causal link or whether one or more intervening factors are responsible for the association.

The latter scenario implies the existence of an illusory correlation. In order to minimize the occurrence of illusory correlations, researchers should not rush to judgment in deciding whether two variables are causally related. To this end, researchers should consider using a knowledgeable colleague or other non-stakeholder who is not part of the research team to play the role of "devil's advocate" in searching for possible moderating variables.

Replicating a finding. The greater the extent to which a researcher can generalize the account of a particular situation or population to other individuals, times, settings, or contexts within the group or setting studied (i.e., internal generalizability; Maxwell, 1992), the more confident he/she can be about the underlying finding and, consequently, the greater the evidence of legitimation. Similarly, credibility is enhanced if findings can be generalized beyond the group, setting, time, or context (i.e., external generalizability; Maxwell, 1992).

Referential adequacy. Eisner (1975) is credited with introducing the concept of referential adequacy, which involves archiving recorded materials from the study. These recorded supportive materials provide a form of standard against which later data analyses, interpretations, and conclusions (i.e., the critiques) can be assessed for adequacy (Lincoln and Guba, 1985). Referential materials are not limited to electronically recorded data. Other types of materials could be utilized, including photographs and text.

However, whatever materials are used, it is important that they represent a segment of the raw data that is not analyzed by the researcher for the purpose of data interpretation but is archived for later recall and comparison.

Following up surprises. By its very exploratory nature, interpretivist research lends itself to unexpected findings, some of which may be very surprising to the analysts. Rather than ignoring or dismissing surprising findings, qualitative researchers should follow up these surprises. According to Miles and Huberman (1994), following up surprises has the following three components: (a) reflecting on the surprise to surface the violated theory, (b) considering how to revise the violated theory, and (c) looking for evidence to support the revised theory.

Structural relationships. Newman and Benz (1998) recommend that datasets be compared for consistency. According to this line of reasoning, when attempting to interpret data and generate conclusions, constructivists should obtain support for their insights by comparing and contrasting different datasets. These datasets may stem from different perspectives that can be compared in much the same way that findings can be compared with the extant literature, the researcher's experience, and the knowledge base, as described above.

Peer debriefing. Peer debriefing provides an external evaluation of the research process (Glesne and Peshkin, 1992; Lincoln and Guba, 1985; Maxwell, 1996; Merriam, 1988; Newman and Benz, 1998). Peer debriefing is essentially another form of inter-rater reliability, the major difference being that it is logically rather than empirically based. Lincoln and Guba (1985, p. 308) describe the role of the peer debriefer as the "devil's advocate," a person who keeps the researcher "honest"; who poses difficult questions about the procedures, meanings, interpretations, and conclusions; and who provides the researcher with the opportunity for "catharsis" by being empathetic with the researcher's feelings.

Rich and thick description. An important way of providing credibility of findings is by collecting rich and thick data, that is, data that are detailed and complete enough to maximize the ability to find meaning.

Becker (1970) advocated that such data necessitate verbatim transcripts of interviews, as opposed to selected notes. For observations, detailed, descriptive note-taking about specific, explicit events and behaviors underlies rich, thick data. Becker contended that providing such data minimizes confirmation bias by facilitating the testing of emerging theories, rather than merely supplying supporting data points. Rich, thick description also informs the reader about transferability: with such detailed information, readers can compare the study’s setting with other settings and contexts to determine whether the findings can be transferred “because of shared characteristics” (Erlandson et al., 1993, p. 32).

The Modus Operandi approach. In the modus operandi method, a term coined by Scriven (1974), threats to validity are treated as events rather than as elements to be controlled. Thus, the analyst searches for clues as to whether these threats to validity actually took place (Maxwell, 1996). Peer reviewers can play an important role here in determining which sources of invalidity might have prevailed.

Assessing rival explanations. Because qualitative data analysis is so time-consuming, it is extremely difficult for interpretivists to detach themselves from their initial data interpretations. Indeed, few naturalists conduct rival hypothesis-testing, perhaps also because hypothesis testing is associated with the quantitative paradigm and its empiricist ideology. In testing rival explanations, interpretivists should entertain several rival explanations, and perhaps even collect more data, until one explanation emerges as the most compelling. However, rather than comparing and contrasting every conceivable explanation, the researcher should consider only plausible ones.
Foreclosing too early on rival explanations leads to confirmation bias in general and to illusory correlations and causal error in particular. On the other hand, closing too late on alternative interpretations typically culminates in too weak a case being built for the chosen explanation because of the overwhelming amount of rival hypothesis-testing data collected. Thus, rival-hypothesis data collection should stop when the saturation point is reached, that is, when the rival explanations have been shown to be either flawed or better than the existing explanations.

Negative case analysis. Negative case analysis involves continually modifying the emerging hypothesis using past and future observations until all known data are accounted for by the hypothesis (Kidder, 1981; Newman and Benz, 1998). In other words, negative case analysis is the process of expanding and revising one’s interpretation until all outliers have been explained (Creswell, 1998; Lincoln and Guba, 1985; Maxwell, 1996; Miles and Huberman, 1994). A single negative (i.e., discrepant) case is sufficient to require the researcher to modify the hypothesis (Kidder, 1981). Wolcott (1990) recommends that any cases that do not fit the final hypothesis or model be documented in the final report, allowing readers to evaluate them and draw their own conclusions. As noted by Lincoln and Guba (1985), negative case analysis provides a useful way of making data more credible by minimizing the number of negative cases.

Confirmatory data analyses. As outlined by Onwuegbuzie (2003b), confirmatory thematic analyses can be undertaken, in which replication qualitative studies are conducted to assess the replicability (i.e., external generalizability) of previously emergent themes (i.e., research driven) or to test an extant theory (i.e., theory driven), when appropriate.
Such confirmatory techniques help to provide legitimation to previous qualitative findings, interpretations, and conclusions.

Effect sizes. As stated by Sechrest and Sidani (1995, p. 79), “qualitative researchers regularly use terms like ‘many,’ ‘most,’ ‘frequently,’ ‘several,’ ‘never,’ and so on. These terms are essentially quantitative.” In fact, by obtaining counts, qualitative researchers can quantitize such expressions.
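Returning briefly to negative case analysis, described earlier in this section: the procedure has a loop-like structure (test the working hypothesis against every case, revise it whenever a discrepant case appears, stop when nothing is left unexplained). The Python sketch below expresses only that structure, under stated assumptions. Cases are represented as dictionaries of coded attributes, a hypothesis is a predicate that returns True when it accounts for a case, and the `revise` step is a hypothetical stand-in for the researcher’s interpretive judgment, which no code can supply. The `max_rounds` cap, the mentoring data, and all function names are invented for illustration and are not part of the procedure as described by Kidder (1981) or Lincoln and Guba (1985).

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Case = Dict[str, Any]                # one participant or site, as coded attributes
Hypothesis = Callable[[Case], bool]  # True if the hypothesis accounts for the case


def negative_case_analysis(
    cases: Sequence[Case],
    hypothesis: Hypothesis,
    revise: Callable[[Hypothesis, Case], Hypothesis],
    max_rounds: int = 10,
) -> Tuple[Hypothesis, List[Case]]:
    """Revise `hypothesis` until every case fits or `max_rounds` is exhausted.

    Returns the final hypothesis plus any cases it still fails to explain;
    those remaining cases are meant to be reported, not discarded.
    """
    for _ in range(max_rounds):
        discrepant = [c for c in cases if not hypothesis(c)]
        if not discrepant:
            return hypothesis, []
        # A single discrepant case is enough to trigger a revision.
        hypothesis = revise(hypothesis, discrepant[0])
    # Out of rounds: hand back whatever still does not fit.
    return hypothesis, [c for c in cases if not hypothesis(c)]


# Hypothetical coded cases: did the participant have a mentor, and persist?
cases = [
    {"id": "P1", "mentor": True, "persisted": True},
    {"id": "P2", "mentor": False, "persisted": False},
    {"id": "P3", "mentor": False, "persisted": True},  # the discrepant case
]

# Initial working hypothesis: persistence occurs if and only if a mentor is present.
initial: Hypothesis = lambda c: c["mentor"] == c["persisted"]


def weaken_to_implication(old: Hypothesis, case: Case) -> Hypothesis:
    # Hypothetical revision prompted by P3: keep "mentor implies persistence"
    # but drop the claim that persistence requires a mentor.
    return lambda c: (not c["mentor"]) or c["persisted"]


final, unexplained = negative_case_analysis(cases, initial, weaken_to_implication)
print([c["id"] for c in cases if final(c)], unexplained)
# prints: ['P1', 'P2', 'P3'] []
```

Returning the unexplained cases instead of silently dropping them mirrors Wolcott’s (1990) recommendation that cases which do not fit the final hypothesis be documented in the final report.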

As noted by Miles and Huberman (1994), numbers tend to get ignored in qualitative research, yet both “‘number of times’ and ‘consistency’ judgments are based on counting” (Miles and Huberman, 1994, p. 251). Onwuegbuzie (2003b) noted that the frequency of emergent themes, which he termed a frequency effect size, can be determined by first binarizing the themes. Binarizing themes can be criticized as an oversimplification that does not adequately represent the complexity of the meaning conveyed by the unit. However, as stated by Sechrest and Sidani (1995, p. 79), the person making the statement or action “would have to have shared understanding of all those additional meanings, in which case the binary code would include them all, or else the statement would have to be accompanied by a set of additional descriptors/modifiers that could themselves be coded.” Regardless, as stated by Onwuegbuzie, the goal of binarizing themes is not to replace the description and interpretation of the themes, but to enhance the development of effect size information that would complement these descriptions.
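Binarizing themes and computing a frequency effect size can be sketched as follows. The sketch assumes one straightforward reading of Onwuegbuzie (2003b): each participant is coded 1 for a theme if that theme appears anywhere in the participant’s data and 0 otherwise, and a theme’s frequency effect size is the proportion of participants coded 1. The participant IDs, theme labels, and function names are hypothetical, chosen only to illustrate the arithmetic.

```python
from typing import Dict, List, Set


def binarize(coded: Dict[str, Set[str]], themes: List[str]) -> Dict[str, List[int]]:
    """Participant-by-theme matrix: 1 if the theme appears in that participant's data."""
    return {pid: [1 if t in found else 0 for t in themes] for pid, found in coded.items()}


def frequency_effect_sizes(matrix: Dict[str, List[int]], themes: List[str]) -> Dict[str, float]:
    """Proportion of participants whose data contain each theme."""
    n = len(matrix)
    if n == 0:
        raise ValueError("need at least one participant")
    return {t: sum(row[j] for row in matrix.values()) / n for j, t in enumerate(themes)}


# Hypothetical themes coded from four participants' interview transcripts.
themes = ["workload", "peer support", "anxiety"]
coded = {
    "P1": {"workload", "anxiety"},
    "P2": {"workload"},
    "P3": {"workload", "peer support"},
    "P4": {"anxiety"},
}

matrix = binarize(coded, themes)   # e.g. P1 -> [1, 0, 1]
effects = frequency_effect_sizes(matrix, themes)
for theme, share in effects.items():
    print(f"{theme}: {share:.0%} of participants")
# prints:
# workload: 75% of participants
# peer support: 25% of participants
# anxiety: 50% of participants
```

Reporting that 75% of participants raised workload is more informative than saying that “most” did, which is the kind of quantitizing Sechrest and Sidani (1995) describe; the proportions are meant to accompany, not replace, the description and interpretation of each theme.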

5. Summary and Conclusions

Although the importance of validity has long been accepted among quantitative researchers, this concept has been an issue of contention among qualitative researchers. Miles and Huberman (1994, p. 277) note that “qualitative studies take place in a real social world, and can have real consequences in people’s lives; that there is a reasonable view of ‘what happened’ in any particular situation (e.g., including what was believed, interpreted); and that we who render accounts of it can do so well or poorly, and should not consider our work unjudgable.” This is the position that underlies the present paper. In fact, in outlining the Qualitative Legitimation Model, the goal is to facilitate the sharing of standards, as is recommended by a growing number of qualitative theorists (Howe and Eisenhart, 1990; Miles and Huberman, 1994; Williams, 1986). As noted by Maxwell (1992, p. 283), use of legitimation frameworks, such as the Qualitative Legitimation Model, “does not depend on the existence of some absolute truth or reality to which an account can be compared, but only on the fact that there exists ways of assessing accounts that do not depend entirely on features of the account itself, but in some way relate to those things that the account claims to be about.”

Although the Qualitative Legitimation Model is relatively comprehensive, it is by no means exhaustive. Indeed, interpretivists are encouraged to find ways to improve upon this framework. Moreover, in any particular qualitative study, not all of the threats contained in the model will be pertinent. Unlike in quantitative research, where the goal is to minimize all sources of invalidity, different validity components of the Qualitative Legitimation Model will be relevant in different qualitative studies. In this sense, the Qualitative Legitimation Model is extremely flexible.

Indeed, as other threats to legitimation in qualitative research are conceptualized, these can be added to the Qualitative Legitimation Model. In any case, it is hoped that this paper makes it clear that in every qualitative inquiry, findings, interpretations, and conclusions should be assessed for truth value, applicability, consistency, neutrality, dependability, credibility, confirmability, transferability, generalizability, or the like. Further, legitimation should not only be undertaken but also documented and delineated in the final research report, so that qualitative research can be made public rather than retaining the private status that it tends to have (Constas, 1992).

Future editions of the American Psychological Association Publication Manual (APA, 2001) can play an important role here by providing strong encouragement for interpretivists to document how they obtained their data, their interpretations, and their conclusions. Journal reviewers and editors of qualitative research journals also can make an impact here by setting minimum standards of rigor, communication, and ways of working toward consensus (Lincoln, 1995). Utilizing and documenting legitimation techniques should prevent validity and qualitative research from being seen as an oxymoron, especially by beginning qualitative researchers.

References
American Psychological Association (2001). Publication Manual of the American Psychological Association, 5th edn. Washington, DC: Author.
Becker, H. S. (1970). Sociological Work: Method and Substance. New Brunswick, NJ: Transaction Books.
Brinberg, D. & McGrath, J. E. (1987). Validity and the Research Process. Newbury Park, CA: Sage.
Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin 54: 297–312.
Campbell, D. T. (1963a). From description to experimentation: Interpreting trends as quasi-experiments. In: C. W. Harris (ed.), Problems in Measuring Change. Madison, WI: University of Wisconsin Press, pp. 212–242.
Campbell, D. T. (1963b). Social attitudes and other acquired behavioral dispositions. In: S. Koch (ed.), Psychology: A Study of a Science: Investigations of Man as Socius, vol. 6. New York: McGraw-Hill.
Campbell, D. T. & Stanley, J. C. (1963). Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally.
Connolly, P. (1998). ‘Dancing to the wrong tune’: Ethnography generalization and research on racism in schools. In: P. Connolly & B. Troyna (eds.), Researching Racism in Education: Politics, Theory, and Practice. Buckingham, UK: Open University Press.
Constas, M. A. (1992). Qualitative analysis as a public event: The documentation of category development procedures. American Educational Research Journal 29: 253–266.
Creswell, J. W. (1998). Qualitative Inquiry and Research Design: Choosing Among Five Traditions. Thousand Oaks, CA: Sage.
Denzin, N. K. (1978). The Research Act: A Theoretical Introduction to Sociological Methods. New York: Praeger.
Eisenhart, M. A. & Howe, K. R. (1992). Validity in educational research. In: M. D. LeCompte, W. L. Millroy & J. Preissle (eds.), The Handbook of Qualitative Research in Education. Thousand Oaks, CA: Sage, pp. 643–680.
Eisner, E. W. (1975). The Perceptive Eye: Toward the Reformulation of Educational Evaluation. Occasional Papers of the Stanford Evaluation Consortium. Stanford, CA: Stanford University.
Eisner, E. W. (1991). The Enlightened Eye: Qualitative Inquiry and the Enhancement of Educational Practice. New York: Macmillan.
Erlandson, D. A., Harris, E. L., Skipper, B. L. & Allen, S. D. (1993). Doing Naturalistic Inquiry: A Guide to Methods. Newbury Park, CA: Sage.
Fielding, N. & Fielding, J. (1986). Linking Data. Beverly Hills, CA: Sage.
Glesne, C. & Peshkin, A. (1992). Becoming Qualitative Researchers: An Introduction. White Plains, NY: Longman.
Greenwald, A. G., Pratkanis, A. R., Leippe, M. R. & Baumgardner, M. H. (1986). Under what conditions does theory obstruct research progress? Psychological Review 93: 216–229.
Guba, E. G. & Lincoln, Y. S. (1989). Fourth Generation Evaluation. Newbury Park, CA: Sage.
Halpern, E. S. (1983). Auditing Naturalistic Inquiries: The Development and Application of a Model. Unpublished doctoral dissertation, Indiana University.
Howe, K. R. & Eisenhart, M. (1990). Standards for qualitative (and quantitative) research: A prolegomenon. Educational Researcher 19(4): 2–9.
Jick, T. D. (1979). Mixing qualitative and quantitative methods: Triangulation in action. Administrative Science Quarterly 24: 602–611.
Kidder, L. H. (1981). Qualitative research and quasi-experimental frameworks. In: M. B. Brewer & B. E. Collins (eds.), Scientific Inquiry and the Social Sciences. San Francisco, CA: Jossey-Bass.
Kvale, S. (1995). The social construction of validity. Qualitative Inquiry 1: 19–40.
Lather, P. (1986). Issues of validity in openly ideological research: Between a rock and a soft place. Interchange 17: 63–84.
Lather, P. (1993). Fertile obsession: Validity after poststructuralism. Sociological Quarterly 34: 673–693.
Lincoln, Y. S. (1995). Emerging criteria for quality in qualitative and interpretive research. Qualitative Inquiry 1: 275–289.
Lincoln, Y. S. & Guba, E. G. (1985). Naturalistic Inquiry. Beverly Hills, CA: Sage.
Maxwell, J. A. (1992). Understanding and validity in qualitative research. Harvard Educational Review 62: 279–299.
Maxwell, J. A. (1996). Qualitative Research Design. Newbury Park, CA: Sage.
Merriam, S. (1988). Case Study Research in Education: A Qualitative Approach. San Francisco, CA: Jossey-Bass.
Miles, M. B. & Huberman, A. M. (1984). Drawing valid meaning from qualitative data: Toward a shared craft. Educational Researcher 13: 20–30.
Miles, M. B. & Huberman, A. M. (1994). Qualitative Data Analysis: An Expanded Sourcebook, 2nd edn. Thousand Oaks, CA: Sage.
Newman, I. & Benz, C. R. (1998). Qualitative-Quantitative Research Methodology: Exploring the Interactive Continuum. Carbondale, IL: Southern Illinois University Press.
Onwuegbuzie, A. J. (2003a). Expanding the framework of internal and external validity in quantitative research. Research in the Schools 10: 71–90.
Onwuegbuzie, A. J. (2003b). Effect sizes in qualitative research: A prolegomenon. Quality & Quantity: International Journal of Methodology 37(4): 393–409.
Onwuegbuzie, A. J. & Daniel, L. G. (2003, February 12). Typology of analytical and interpretational errors in quantitative and qualitative educational research. Current Issues in Education [On-line], 6(2). Available: http://cie.ed.asu.edu/volume6/number2/.
Onwuegbuzie, A. J. & Leech, N. L. (2005, February). Sampling Designs in Qualitative Research: Making the Sampling Process More Public. Paper presented at the annual meeting of the Southwest Educational Research Association, New Orleans, LA.
Onwuegbuzie, A. J. & Teddlie, C. (2003). A framework for analyzing data in mixed methods research. In: A. Tashakkori & C. Teddlie (eds.), Handbook of Mixed Methods in Social and Behavioral Research. Thousand Oaks, CA: Sage, pp. 351–383.
Patton, M. Q. (1990). Qualitative Evaluation and Research Methods, 2nd edn. Newbury Park, CA: Sage.
Phillips, D. C. (1987). Validity in qualitative research: Why the worry about warrant will not wane. Education and Urban Society 20: 9–24.
Scriven, M. (1974). Maximizing the power of causal investigations: The modus operandi method. In: W. J. Popham (ed.), Evaluation in Education: Current Applications. Berkeley, CA: McCutchan, pp. 68–84.
Sechrest, L. & Sidani, S. (1995). Quantitative and qualitative methods: Is there an alternative? Evaluation and Program Planning 18: 77–87.
Stake, R. (1995). The Art of Case Study Research. Thousand Oaks, CA: Sage.
Tashakkori, A. & Teddlie, C. (1998). Mixed Methodology: Combining Qualitative and Quantitative Approaches. Applied Social Research Methods Series (Vol. 46). Thousand Oaks, CA: Sage.
Williams, D. D. (1986). Naturalistic evaluation: Potential conflicts between evaluation standards and criteria for conducting naturalistic inquiry. Educational Evaluation and Policy Analysis 8: 87–99.
Wolcott, H. (1990). On seeking – and rejecting – validity in qualitative research. In: E. Eisner & A. Peshkin (eds.), Qualitative Inquiry in Education: The Continuing Debate. New York: Teachers College Press, pp. 121–152.