PART 1 (RELIABILITY) - Read the "Test Yourself" section on p. 119 in Ch. 5 of Exploring Research. PLEASE ADD REFERENCE AFTER EACH PART. Discuss your response with your classmates. PART 2 (INTEREST) - Rea

FOUNDATIONS OF PSYCHOLOGY 106


Chapter 3A Selecting a Problem and Reviewing the Research

WHAT YOU’LL LEARN ABOUT IN THIS CHAPTER:
  • How to select a research problem

  • Defining and sorting out idea after idea until one fits your interests

  • The importance of personal experience in selecting a problem

  • The steps in reviewing the literature

  • Different sources of information and how to use them

  • How to use journals, abstracts, and indices

  • The difference between primary and secondary resources

  • Using a synthesis of literature

  • How scholarly journals work

  • Using the Internet to complete your literature review

So here you are, in the early part of a course that focuses on research methods, and now you have to come up with a problem that you are supposed to be interested in! You are probably so anxious about learning the material contained in your professor’s lectures and what is in this volume that you barely have time to think about anything else.

If you stop for a moment and let your mind explore some of the issues in the behavioral and social sciences that have piqued your interest, you will surely find something that you want to know more about. That is what the research process is all about—finding out more about something that is, in part, already known.

Once you select an area of interest, you are only part of the way there. Next comes the statement of this interest in the form of a research question followed by a formal hypothesis. Then it is on to reviewing the literature, a sort of fancy phrase that sounds like you will be very busy! A literature review involves library time (online or in person), note taking, and organizational skills (and of course writing), but it provides a perspective on your question that you cannot get without knowing what other work has been done as well as what new work needs to be done.

But hold on a minute! How is someone supposed to have a broad enough understanding of the field to spew forth well-formed hypotheses before reviewing the literature and becoming familiar with what is out there? As poet John Ciardi wrote, therein “lies the rub.”

The traditional philosophers and historians of science would have us believe that the sequence of events leading up to a review of what has been done before (as revealed in the literature) is as shown in Figure 3A.1a. This sequence of steps is fine in theory, but as you will discover, the actual process does not go exactly in the manner shown in the figure.

The research question and research hypothesis are more an outgrowth of an interaction between the scientist’s original idea and an ongoing, thorough review of the literature (good scientists are always reading), as you can see in Figure 3A.1b. This means that once you formulate a hypothesis, it is not carved in stone but can be altered to fit what the review of the literature may reflect, as well as any change in ideas you may have. Remember, our work “stands on the shoulders of giants.”

For example, you might be interested in how working adults manage their time when they are enrolled in graduate programs. That’s the kernel of the idea you want to investigate. A research question might ask what the effects of enrollment in graduate school and full-time work are on personal relationships and personal growth. For a hypothesis, you might predict that those adults enrolled in school and who work full time and who participate in a time management support group have more meaningful personal relationships than those who do not.

Figure 3A.1a From idea to literature review, with the research hypothesis on the way.

Figure 3A.1b From idea and literature review to research hypothesis.

Use the results of previous studies to fine-tune your research ideas and hypothesis.

You might consider the hypothesis to be finished at this point, but in reality your ongoing review of the literature and your changing ideas about the relationship between the variables will influence the direction your research will take. For example, suppose the findings of a similar previous study prompt you to add an interesting dimension (such as whether the employer subsidizes the cost of tuition) to your study, because the addition is consistent with the intent of your study. You should not have to restrict your creative thinking or your efforts to help you understand the effects of these factors just because you have already formulated a hypothesis and completed a literature review. Indeed, the reason for completing the review is to see what new directions your work might take. The literature review and the idea play off one another to help you form a relevant, conceptually sound research question and research hypothesis.

In sum, you will almost always find that your first shot at a hypothesis might need revision, given the content of the literature that you review. Remember, it is your idea that you will pursue. The way in which you execute it as a research study will be determined by the way in which you state the research question and the way in which you test the research hypothesis. It is doubtful that a review of the relevant literature would not shed some light on this matter.

This chapter begins with some pointers on selecting a problem worth studying, and then the focus moves to a description of the tools and the steps involved in preparing a review of the literature.

Selecting a Problem

People go to undergraduate and graduate school for a variety of reasons, including preparing for a career, the potential financial advantages of higher education, and even expanding their personal horizons and experiencing the sheer joy of learning (what a radical thought!). Many of you are in this specific course for one or more of these reasons.

Select a problem which genuinely interests you.

The great commonality between your course work and activities is your exposure to a wealth of information that you would not otherwise experience. That is the primary purpose of taking the time to select a research problem that makes sense to you and that interests you, while at the same time making a contribution to your specific discipline. The selection of the area in which to work is extremely important for two reasons. First, research takes a great deal of time and energy, and you want to be sure that the area you select interests you. You will work so hard throughout this project that continuing to work on it, even if it’s the most interesting project, may at times become overwhelming. Just think of what it would be like if you were not interested in the topic!

Second, the area you select is only the first step in the research process. If this goes well, the remaining steps, which are neither more nor less important, also have a good chance of going well.

Just as there are many different ways to go about selecting a research problem, there are also some potential hazards. To start you off on the right foot, the following briefly reviews some of these almost fatal errors.

It is easy to do, but falling in love with your idea can be fatal. This happens when you become so infatuated with an idea and the project and you invest so much energy in it that you cannot bear to change anything about it. Right away someone is going to say, “What’s wrong with being enthusiastic about your project?” My response is a strong, “Nothing at all.” As does your professor, most researchers encourage and look for enthusiasm in students (and scientists) as an important and essential quality. But enthusiasm is not incompatible with being objective and dispassionate about the actual research process (not the content). Sometimes—and this is especially true for beginning research students—researchers see their question as one of such magnitude and importance that they fail to listen to those around them, including their adviser, who is trying to help them formulate their problem in such a way as to make it more precise and, in the long run, easier to address. Be committed to your ideas and enthusiastic about your topic but not so much that it clouds your judgment as to the practical and correct way to do things.

Next, sticking with the first idea that comes to mind isn’t always wise. Every time the 1930s cartoon character Betty Boop had a problem, her inventor grandfather would sit on his stool, cross his legs (taking a Rodin-like pose), and think about a solution. Like a bolt from the blue, the light bulb above his head would go on, and Grampy would exclaim, “I’ve got it!,” but the idea was never exactly right. Another flash would occur, but once again the idea was not perfect. Invariably, it was the third time the light went on that he struck gold.

Do you like your first idea for a research study? Great, but don’t run out and place an advertisement for research participants in the newspaper quite yet. Wait a few days and think about it, and by no means should you stop talking to other students and your adviser during this thinking stage. Second and third ideas are usually much more refined, easier to research, and more manageable than first ones. As you work, rewrite and rethink your work . . . constantly.

Do you want to guarantee an unsuccessful project that excites no one (except perhaps yourself)? Doing something trivial by selecting a problem that has no conceptual basis or apparent importance in the field can lead to a frustrating experience and one that provides no closure. Beginning students who make this mistake sometimes over-intellectualize the importance of their research plans and don’t take the time to ask themselves, “Where does this study fit in with all that has been done before?” Any scientific endeavor has as its highest goal the contribution of information that will help us better to understand the world in general and the specific topic being studied in particular. If you find out what has been done by reading previous studies and use that information as a foundation, then you will surely come up with a research problem of significance and value.

Ah, then there are researchers who bite off more than they can chew. Sound silly? Not to the thousands of advisers who sit day after day in their offices trying to convince well-intentioned beginning students that their ideas are interesting but that (for example) it may be a bit ambitious to ask every third adult in New York City about their attitudes toward increasing taxes to pay for education. Grand schemes are fine, but unless you can reduce a question to a manageable size, you might as well forget about starting your research. If these giant studies by first-timers ever do get done (most of the time they don’t in their original form), the experiences are usually more negative than positive. Sometimes these students end up as ABDs (all but dissertation). Although you may not be seeking a doctorate right now, the lesson is still a good one. Give yourself a break from the beginning—choose a research question that is doable.

Finally, if you do something that has already been done, you could be wasting your time. There is a fine line between what has been done and what is important to do next based on previous work. Part of your job is to learn how to build and elaborate on the results of previous research without duplicating previous efforts. You might remember from the beginning of this chapter that I stressed how replication is an important component of the scientific process and good research. Your adviser can clearly guide you as to what is redundant (doing the exact same thing over without any sound rationale) and what is an important contribution (doing the same thing over but exploring an aspect of the previous research or even asking the same question, while eliminating possibly confounding sources of variance present in the first study).

TEST YOURSELF

Perhaps one of the most interesting dimensions of being a scientist is how the questions scientists ask are modified as they review the literature and learn more about the topic they are interested in. It’s a constant give and take; hence the importance of being well informed. Ask your adviser or some other faculty member how they keep themselves informed in their own field of study.

Defining Your Interests

It is always easy for accomplished researchers to come up with additional ideas for research, but that is what they are paid and trained to do (in part, anyway). Besides, experienced researchers can put all that experience to work for themselves, and one thing (a study) usually leads to another (another study).

Never disregard personal experience as an important source of ideas.

But what about the beginning student such as yourself? Where do you get your ideas for research? Even if you have a burning desire to be an experimental psychologist, a teacher, a counselor, or a clinical social worker, where do you begin to find hints about ideas that you might want to pursue?

In some relatively rare cases, students know from the beginning what they want to select as a research area and what research questions they want to ask. Most students, however, experience more anxiety and doubt than confidence. Before you begin the all-important literature review, first take a look at the following suggestions for where you might find interesting questions that are well worth considering as research topics.

First, personal experiences and firsthand knowledge more often than not can be the catalyst for starting research. For example, perhaps you worked at a summer camp with disabled children and are interested in knowing more about the most effective way to teach these children. Or, through your own personal reading, you have become curious about the aging process and how it affects the learning process. A further example: At least three of my colleagues are special educators because they have siblings who were not offered the special services they needed as children to reach their potential. Your own experiences shape the type of person you are. It would be a shame to ignore your past when considering the general area and content of a research question, even if you cannot see an immediate link between these experiences and possible research activities. Keep reading and you will find ways to help you create that link.

You may want to take complete responsibility for coming up with a research question. On the other hand, there is absolutely nothing wrong with consulting your adviser or some other faculty member who is working on some interesting topic and asking, “What’s next?” Using ideas from your mentor or instructor will probably make you very current with whatever is happening in your field. Doing so also will help to establish and nurture the important relationship between you and your adviser (or some other faculty member), which is necessary for an enjoyable and successful experience. These are the people doing the research, and it would be surprising not to find that they have more ideas than time to devote to them and would welcome a bright, energetic student (like you) who wants to help extend their research activities.

Next, you might look for a research question that reflects the next step in the research process. Perhaps A, B, and C have already been done, and D is next in line. For example, your special interest might be understanding the lifestyle factors that contribute to heart disease, and you already know that factors such as personality type (for example, Type A and Type B) and health habits (for example, social drinking) have been well studied and their effects well documented. The next logical step might be to look at factors such as work habits (including occupation and attitude) or some component of family life (such as quality of relationships). As with research activities in almost all disciplines and within almost all topics, there is always that next logical step that needs to be taken.

Last, but never least, is that you may have to come up with a research question because of this class. Now that is not all that bad either, if you look at it this way: People who come up with ideas on their own are all set and need not worry about coming up with an idea by the deadline. Those people who have trouble formulating ideas need a deadline; otherwise, they would not get anything done. So although there are loftier reasons for coming up with research questions, sometimes it is just required. Even so, you need to work very hard at selecting a topic that you can formulate as a research question so that your interest is held throughout the duration of the activity.

TEST YOURSELF

You’d be surprised how many important scientific breakthroughs were the result of informal talk (aka “bull”) sessions between people interested in the same or similar topics. Just sitting around and talking about ideas is one of the great pleasures when it comes to learning and scientific discovery. Be a bit creative and list five ideas you have or questions you find particularly interesting about any topic. Don’t worry at this point how you would answer the question but take a few intellectual risks and see what you come up with.

Ideas, Ideas, Ideas (and What to Do with Them)

Even if you are sure of what your interest might be, sometimes it is still difficult to come up with a specific idea for a research project. For better or worse, you are really the only one who can do this for yourself, but the following is a list of possible research topics, one of which might strike a chord. For each of these topics, there is a wealth of associated literature. If one topic piques your interest, go to that body of literature (described in the second part of this chapter) and start reading.

aggression

AIDS

autism spectrum disorder

bilingual education

biofeedback

biology of memory

birth control

body image

central nervous system

child care

children of war

circadian rhythms

classical conditioning

cognitive development

color vision

competition

compliance

computer applications

conflict

cooperative learning

creativity

delusions

depression

déjà vu

development of drawing

diets

divorce

dreams

drug abuse

early intervention

egocentrism

elder care

endocrine system

epilepsy

ethics

exercise

fat

fetal alcohol syndrome

fluid intelligence

gender differences

Head Start

homeschooling

identity

imagery

intelligence

language development

learning disabilities

mediation

memory

menarche

mental sets

middle adulthood

motivation

narcolepsy

neural development

nightmares

nutrition

obesity

optimism

pain

parenting

perception

prejudice

public policy

racial integration

reinforcement

relaxation

REM sleep

self-esteem

violence in schools

From Idea to Research Question to Hypothesis

Once you have determined what your specific interest might be, you should move as quickly as possible to formulate a research question that you want to investigate and begin your review of literature.

Research ideas lead the way to hypotheses.

There is a significant difference between your expressing an interest in a particular idea and the statement of a research question. Ideas are full of those products of luxurious thinking: beliefs, conceptions, suppositions, assumptions, what if’s, guesses, and more. Research questions are the articulation, best done in writing, of those ideas that at the least imply a relationship between variables. Why is it best done in writing? Because it is too easy to “get away” with spoken words. It is only when you have to write things down and live with them (spoken words seem to vanish mysteriously) that you face up to what has been said, make a commitment, and work to make sense out of the statement.

Unlike a hypothesis, a research question is not a declarative statement but rather is a clearly stated expression of interest and intent. In the pay-me-now or pay-me-later tradition, the more easily understood and clearer the research question, the easier your statement of a hypothesis and review of the literature will be. Why? From the beginning, a clear idea of what you want to do allows you to make much more efficient use of your time when it comes to searching for references and doing other literature review activities.

Finally, it is time to formulate a hypothesis or a set of hypotheses that reflects the research question. Remember from Chapter 2 the set of five criteria that applies to the statement of any hypothesis? To refresh your memory, here they are again. A well-written hypothesis

  1. is stated in declarative form

  2. posits a relationship between variables

  3. reflects a theory or body of literature upon which it is based

  4. is brief and to the point

  5. is testable

When you derive your hypothesis from the research question, you should look to these criteria as a test of whether what you are saying is easily communicated to others and easily understood. Remember, the sources for ideas can be anything from a passage that you read in a novel last night to your own unique and creative thoughts. When you get to the research question stage, however, you need to be more scientific and clearly state what your interest is and what variables you will consider.

Table 3A.1 Research ideas and questions and the hypotheses that reflect them

Research Interest or Ideas | Research Problem or Questions | Research Hypothesis

Open Classroom and Academic Success | What is the effect of open versus traditional classrooms on reading level? | Children who are taught reading in open classroom settings will read at a higher grade level than children who are taught reading in a traditional setting.

Television and Consumer Behavior | How does watching television commercials affect the buying behavior of adolescents? | Adolescent boys buy more of the products advertised on television than do adolescent girls.

Effectiveness of Checklists in Preventing Hospital Infections | Does the use of checklists when preparing patients for surgery help reduce the level of infection in the hospital? | Those hospitals that regularly use checklists in patient preparation for surgery will have a lower rate of infection per 1,000 patients than those hospitals that do not.

Food Preference and Organic Foods | Do consumers prefer food that is organic? | There will be a difference in preference level as measured by the I ♥ Food scale between those consumers who are offered organic food and those who are offered non-organic food.

Use of Energy by Homeowners | Will a homeowner’s energy usage change as a function of his or her knowledge of his or her neighbor’s usage? | Those people who know how much energy their neighbors use on a monthly basis will use less energy.

Adult Care | How have many adults adjusted to the responsibility of caring for their aged parents? | The number of children who are caring for their parents in the child’s own home has increased over the past 10 years.

Table 3A.1 lists six research interests, the research questions that were generated from those ideas, and the final hypotheses. These hypotheses are only final in the sense that they more or less fit the five criteria for a well-written hypothesis. Your literature review and more detailed discussion may mean that variables must be further defined and perhaps even that new ones will need to be introduced. A good hypothesis tells what you are going to do, not how you will do it.

TEST YOURSELF

As Pasteur said, chance does favor the prepared mind and you will never know where the best information will come from. So, even if some class seems to contain material unrelated to your specialty or your interests, you never know what insight you might gain from reading widely and discussing ideas with your fellow students. What five things might you read (that you have not) that are related to your interests?

Reviewing the Literature

Here it comes again. Today’s research is built on a foundation of the hard work and dedication of past researchers and their productive efforts. Where does one find the actual results of these efforts? Look to scholarly journals and books and other resources, which are located in the library and online.

The review of literature provides a framework for the research proposal.

Although all stages in the research process are important, a logical and systematic review of the literature often sets the stage for the completion of a successful research proposal and a successful study. Remember one of the fatal mistakes mentioned at the beginning of the chapter about selecting a research question that has been done before? Or one that is trivial? You find out about all these things and more when you see what has already been done and how it has been done. A complete review provides a framework within which you can answer the important question(s) that you pose. A review takes you chronologically through the development of ideas, shows you how some ideas were left by the wayside because of lack of support, and tells you how some were confirmed as being truths. An extensive and complete review of the literature gives you that important perspective to see what has been done and where you are going—crucial to a well-written, well-documented, well-planned report.

So get out your yellow (or recyclable white) writing pads, index cards, pens or pencils, laptop computer, or iPad and let’s get started. Also, don’t forget your school ID card so you can check out books at the campus library.

The literature review process consists of the steps listed in Figure 3A.2. You begin with as clear an idea as possible about what you want to do, in the form either of a clear and general statement about the variables you want to study or of a research hypothesis. You should end with a well-written, concise document that details the rationale for why you chose the topic you did, how it fits into what has been done before, what needs to be done in the future, and what its relative importance to the discipline is.

There are basically three types of sources that you will consult throughout your review of the literature (see Table 3A.2). The first are general sources, which provide clues about the location of references of a general nature on a topic. Such sources certainly have their limitations (which we will get to in a moment), but they can be a real asset because they provide a general overview of, and introduction to, a topic.

Figure 3A.2 The steps in reviewing the literature. It is a formidable task, but when broken down step by step, it is well within your reach.

Table 3A.2 What different sources can do for you in your search for relevant material about an interesting research question

Information Source | What It Does | Examples

General Sources | Provides an overview of a topic and provides leads to where more information can be found | Daily newspaper, news weeklies, popular periodicals and magazines, trade books, Reader’s Guide to Periodical Literature, New York Times Index

Secondary Sources | Provides a level of information “once removed” from the original work | Books on specific subjects and reviews of research

Primary Sources | The original reports of the original work or experience | Journals, abstracts and scholarly books, Educational Resources Information Center (ERIC), movies

For example, let’s say you are interested in the general area of sports psychology but have absolutely no idea where to turn to find more information. You could start with a recent article that appeared in the New York Times (a general source) and find the name of the foremost sports psychologist and then go to more detailed secondary or primary sources to find out more about that person’s work.

The second source type, secondary sources, are “once removed” from the actual research. These include review papers, anthologies of readings, syntheses of other work in the area, textbooks, and encyclopedias.

General, secondary, and primary resources are all important, but very different, parts of the literature review.

Finally, the most important sources are primary sources. These are accounts of the actual research that has been done. They appear as journal articles or as other original works including abstracts. Table 3A.2 summarizes the functions of general, secondary, and primary resources and provides some examples. These three different types of sources are also covered in Chapter 9 in a discussion of historical methods of doing research.

Before you get started, let me share my own particular bias. There is no substitute for using every resource that your library has to offer, and that means spending lots of time turning the pages of books and journals and reading their contents. In many cases, however, there’s no substitute for exploring and using electronic resources such as online databases. You’ll learn about both printed and electronic resources here, but you should remember that you won’t find everything you need online (and much of it is not verifiable), yet online is where the most recent material appears. There is even material now being posted online that will not show up in the library—a new and very interesting development owing to the appearance of online (only) journals and e-books. However, at least for now, begin developing your library as well as your online skills. The online world of literature may someday be the only world of literature, but that surely will not be the case this semester.

One last note before we get started. Your university has an absolute ton of online resources available to you and probably more than you can imagine. How do you find out what might be available? Well, you can access your library online and find out, or follow these steps:

  1. Go to any of the libraries on your campus.

  2. Ask where the reference librarians sit.

  3. Ask one for a short tour of what’s online (or enroll in one of the many classes that most libraries offer at the beginning of the semester specifically to address these skills).

One of the best kept secrets on any college campus is how smart and resourceful reference librarians are. Reference librarians are the original search engines. Get to know them (individually)—it will serve you very well.

Using General Sources

General sources of information provide two things: (1) a general introduction to areas in which you might be interested and (2) some clues as to where you should go for the more valuable or useful (in a scientific sense, anyway) information about your topic. They also provide great browsing material.

Any of the references discussed below, especially the indices of national newspapers and so on, can offer you 5, 10, or 50 articles in a specific area. In these articles, you will often find a nice introduction to the subject area and a mention of some of the people doing the research and where they are located. From there, you can look through other reference materials to find out what other work that person has done or even how to contact that person directly.

There are loads of general sources in your college or university library, in your local public library, and online. The following is a brief description of just a few of the most frequently used sources and a listing of several others you might want to consult. Remember to use general sources only to orient yourself to what is out there and to familiarize yourself with the topic.

All of what follows can be accessed online, but the URL (or the Uniform Resource Locator) will differ since you may be accessing it through your university or college.

Readers’ Guide, Full Text Mega Edition is by far the most comprehensive available guide to general literature. Organized by topic, it is published monthly, covering hundreds of journals (such as the New England Journal of Medicine) and periodicals or magazines (such as Scientific American). Because the topics are listed alphabetically, you are sure to find reading sources on a selected topic easily and quickly.

New to the Readers’ Guide family is the Readers’ Guide Retrospective, which allows access to more than 100 years of coverage from 375 U.S. magazines with indexing of leading magazines back as far as 1890 and citations to more than 3,000,000 articles. If you can’t find something about your interests or related topics here, it’s time to reassess the topic you want to focus on.

Another very valuable source of information is the Facts on File Online Databases with content first published in New York in 1940. Facts on File presents a collection of databases that include tens of thousands of articles and other resources (such as video and audio files) in a multitude of areas. The following list shows you what just some of these databases are and a brief description of each:

  • • U.S. Government Online presents in-depth information on the structure and function of the U.S. government.

  • • American History Online covers more than 500 years of political, military, social, and cultural history.

  • • African-American History Online provides cross-referenced entries, covering African American history.

  • • Curriculum Video on Demand provides a video subscription to more than 5,000 educational programs.

  • • Science Online contains information on a broad range of scientific disciplines.

  • • Ferguson’s Career Guidance Center provides profiles of more than 3,300 jobs and 94 industries.

  • • Bloom’s Literary Reference Online contains information on thousands of authors and their works, including an archive of 38,000 characters.

  • • And, the grandparent of them all, Facts On File World News Digest, which is the standard resource for information on U.S. and world events.

The New York Times Index lists by subject all the articles published in the Times since 1851. Once you find reference to an article that might be of interest, you then go to the stacks and select a copy of the actual issue or view it on microfilm. The originals are seldom available because they are printed on thin paper which was designed to hold up only for the few days that a newspaper might be read.

Instead, contents are recorded on microfilm or some other medium and are available through your library. Many libraries now have microfilm readers that allow you to copy directly from the microfilm image and make a print or hardcopy of what you are viewing. The full text of many newspapers is also now available electronically (discussed later in this chapter). And, although the index is not available online, you can search through the archives of the New York Times online at http://www.nytimes.com. As of this writing most articles are free to access, but users are likely to be charged in the future (unless they subscribe to the print edition individually); access will probably remain free through your local library or your institution’s libraries.

Nobody should take what is printed as the absolute truth, but weekly news magazines such as Time (http://www.time.com/time/), Newsweek (http://www.msnbc.msn.com/id/3032542/site/newsweek/), and U.S. News and World Report (http://www.usnews.com/) offer general information and keep you well informed about other related events as well. You may not even know that you have an interest in a particular topic (such as ethical questions in research), but a story on that topic might be in a current issue, catch your eye, and before you know it you will be using that information to seek out other sources.

There are two other very comprehensive electronic general source databases: Lexis/Nexis Academic (there are other versions as well) and the Expanded Academic ASAP, both of which are probably available online through your library.

Lexis/Nexis Academic is the premier database. It is absolutely huge in its coverage and contains information on current events, sports, business, economics, law, taxes, and many other areas. It offers full text of selected newspaper articles. Figure 3A.3 shows you the results of a search on the general term “school finance.” You can print this information, e-mail it (to yourself of course if you are in the library and have no other way to record it), and sort in various ways (such as by date).

Figure 3A.3 The results of a simple LexisNexis search.

Copyright © 2007. Reprinted with the permission of LexisNexis, a division of Reed Elsevier Inc. All rights reserved. LexisNexis and the Knowledge Burst logo are registered trademarks of Reed Elsevier Properties Inc. used with the permission of LexisNexis.

The Expanded Academic ASAP is a multidisciplinary database for research, which provides information on disciplines such as the humanities, communication studies, social science, the arts, science, technology and many more disciplines. It covers from 1980 to the present and contains well over 18 million articles.

As the electronic world of resources and reference tools charges along, Google has shown its value in at least two different ways.

First there is Google Scholar, which provides a tool to broadly search for scholarly literature. You can search across disciplines and sources to find articles and books (and other types of publications such as abstracts) and it is the ideal way to locate works by a particular scholar. For example, if you are interested in learning more about what Ron Haskins, a noted expert on policy and families, has done, go to Google Scholar and search; you’ll find works completed by Professor Haskins as well as works in which he is referenced. What a terrific help!

Also, there is Google Books, where Google has undertaken the process of digitizing and making available for no charge millions of books in libraries and other institutions around the world. In Google Books, you can find everything from a limited preview of a book you need for class to the full text of other books that may, or may not, be in the public domain.

Google Books is an absolutely invaluable tool for any researcher, but its use is not without controversy. After all, an author’s work is appearing with no charge to the user and no benefit to the author (such as a royalty payment). The years to come will sort out how tools such as Google Books can be used and still be fair to the author as well as to the researchers.

Then, there is the wealth of information you can dig out of everyday sources such as your local newspaper, company newsletters, and other publications. Thousands of newspapers can be accessed through http://www.newspapers.com, and newspapers often carry the same Associated Press articles as major papers such as the New York Times and the Washington Post. And, please, do not forget the U.S. Government Printing Office (GPO), which regularly publishes thousands of documents on everything from baseball to bees, and the majority of these documents are free—your tax dollars at work. Do you want to know more about the GPO? Visit http://www.gpo.gov for a catalog of what is available.

Finally, there’s the hugely popular and successful Wikipedia (at http://www.wikipedia.org/), an encyclopedia that is almost solely based on the contributions of folks like you and me. At this writing, Wikipedia contains over 3,000,000 articles on absolutely everything you can think of. This may be the perfect online place to start your investigations.

Trustworthy? To a great extent, yes. Wikipedia is monitored by content experts, and a recent study found that the venerable Encyclopedia Britannica had more factual errors than did Wikipedia. And, of course, the great thing about any wiki (and it is a general term for anything built on the contributions of many people and open for editing by anyone as well) is that the facts, if incorrect initially, will surely be changed or modified.

The Wikipedia site also contains other wikis, including Wiktionary (a dictionary), Wikinews, Wikiquotes, and more. Just exploring the encyclopedia and these ancillaries is fun.

Finally, one especially useful source that you should not overlook is The Statistical Abstract of the United States, published yearly by the U.S. Department of Commerce (http://www.census.gov/statab/www). This national database about the United States includes valuable, easily accessible information on demographics and much more.

Using Secondary Sources

Secondary sources are those that you seek out if you are looking for a scholarly summary of the research that has been done in a particular area or if you are looking for further sources of references.

Major syntheses of information such as reviews can be a terrific foundation for your review.

Reviews and Syntheses of Literature

These are the BIG books you often find in the reference section of the library (not in the stacks). Because so many people want to use them, they must always be available. The following is a summary of some of the most useful. More and more of these books are being published as encyclopedias.

A general secondary source of literature reviews is the Annual Reviews (published since 1930 by Annual Reviews in about 40 disciplines), containing about 20 chapters and focusing on a wide variety of topics such as medicine, anthropology, neuroscience, biomedical engineering, political science, psychology, public health, and sociology. Just think of it—you can go through the past 10 years of these volumes and be very up-to-date on a wide range of general topics. If you happen to find one chapter on exactly what you want to do, you are way ahead of the game. You can find out more about these volumes and see sample tables of contents at http://www.annualreviews.org/.

Another annual review that is well worth considering is the National Society for the Study of Education (or NSSE) Yearbooks (also available at http://nsse-chicago.org). Each year since 1900, this society has published a two-volume annual that focuses on a particular topic, such as adolescence, microcomputers in the classroom, gifted and talented children, and classroom management. The area of focus is usually some contemporary topic, and if you are interested in what is being covered, the information can be invaluable to you. The 2009 yearbook has as its focus “localism.”

The Condition of Education in Brief 2007 (available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007066) contains a summary of 20 of the 48 indicators in the Condition of Education 2007, including public and private enrollment in elementary/secondary education, the racial/ethnic distribution of public school students, students’ gains in reading and mathematics achievement through third grade, trends in student achievement from the National Assessment of Education Progress in reading and mathematics, international comparisons of mathematics literacy, annual earnings of young adults by education and race/ethnicity, status dropout rates, immediate transition to college, availability of advanced courses in high school, inclusion of students with disabilities in regular classrooms, school violence and safety, faculty salary and total compensation, early development of children, expenditures per student in elementary and secondary education, and public effort to fund postsecondary education. The files are available for downloading.

If you are interested in child development, seek out the Handbook of Child Psychology (Wiley 2006), which is often used as the starting point (for ideas) by developmental and child psychology students, early childhood education students, medical and nursing students, and many others. The four individual volumes are

  • • Volume 1: Theoretical Models of Human Development

  • • Volume 2: Cognition, Perception and Language

  • • Volume 3: Social, Emotional and Personality Development

  • • Volume 4: Child Psychology in Practice

Finally, there’s the eight-volume Encyclopedia of Psychology from Oxford University Press (2000), which includes 1,500 articles on every aspect of psychology.

Also, do not forget the large number of scholarly books that sometimes have multiple authors and are edited by one individual or that are written entirely by one person (which, in the latter case, is sometimes considered a primary resource, depending on its content). Use the good old card catalog (or your library’s computerized search system) to find the title or author you need.

Using Primary Sources

Primary sources are the meat and potatoes of the literature review. Although you will get some good ideas and a good deal of information from reading the secondary sources, you have to go to the real thing to get the specific information to support your points and make them stick.

In fact, your best bet is to include mostly primary sources in your literature review, with some secondary sources to help make your case. Do not even think about including general sources. It is not that the information in Redbook or the New Jersey Star Ledger is not useful or valuable. That information is secondhand, however, and you do not want to build an argument based on someone else’s interpretation of an idea or concept.

Get to know your library and where you can find journals related to your field of study. Most libraries offer tours on a regular basis.

Journals

Journals? You want journals? Table 3A.3 lists journals arranged by category. This should be enough for you to answer your professor when he asks, “Who can tell me some of the important journals in your own field?” This list is only a small selection of what is available. The print version of Ulrich’s Periodicals Directory (first published in 1932) lists information on thousands of periodicals, including journals, consumer magazines, and trade publications (at http://www.ulrichsweb.com/ and at your library as well).

Table 3A.3 A sample of the thousands of journals being published in all different fields

Psychology

Adolescence

American Journal of Family Therapy

American Journal of Orthopsychiatry

American Psychologist

Behavioral Disorders

Child Development

Child Study Journal

Developmental Psychology

Contemporary Educational Psychology

Educational and Psychological Measurement

Journal of Abnormal Child Psychology

Journal of Applied Behavioral Analysis

Journal of Autism and Development Disorders

Journal of Child Psychology and Psychiatry and Allied Disciplines

Journal of Consulting and Clinical Psychology

Journal of Counseling Psychology

Journal of Educational Psychology

Journal of Experimental Child Psychology

Journal of Experimental Psychology, Human Perception and Performance

Journal of Experimental Psychology, Learning, Memory, and Cognition

Journal of Genetic Psychology

Journal of Humanistic Psychology

Journal of Personality and Social Psychology

Journal of Psychology

Journal of Research in Personality

Journal of School Psychology

Perceptual and Motor Skills

Psychological Bulletin

Psychological Review

Psychology in the Schools

Psychology of Women Quarterly

Small Group Behavior

Transactional Analysis Journal

Special Education and Exceptional Children

Academic Therapy

American Annals of the Deaf

American Journal of Mental Deficiency

Behavioral Disorders

Education and Training of the Mentally Retarded

Journal of Learning Disabilities

Journal of Mental Deficiency Research

Journal of Special Education

Journal of Special Education Technology

Journal of Speech and Hearing Disorders

Education of the Visually Handicapped

Exceptional Children

Exceptional Education Quarterly

Exceptional Parent

Gifted Child Quarterly

Hearing and Speech Action

International Journal for the Education of the Blind

Journal for the Education of the Gifted

Journal of The Association for the Severely Handicapped

Journal of Speech and Hearing Research

Journal of Visual Impairment and Blindness

Learning Disability Quarterly

Mental Retardation

Sightsaving Review

Teaching Exceptional Children

Teacher Education and Special Education

Teacher of the Blind

Topics in Early Childhood Special Education

Volta Review

Health and Physical Education

Journal of Health Education

Journal of Alcohol and Drug Education

Journal of Leisure Research

Journal of Motor Learning

Journal of Nutrition Education

Journal of Outdoor Education

Journal of Physical Education, Recreation and Dance

Journal of School Health

Journal of Sport Health

Physical Educator

Research Quarterly of the American Alliance for Health, Physical Education, Recreation and Dance

School Health Review

Library Research

Lifelong Learning: The Adult Years

Mathematics and Computer Education

Mathematics Teacher

Modern Language Journal

Music Education Journal

National Education Association Research Bulletin

Studies in Art Education

Studies in Educational Evaluation

Teachers College Record

Theory and Research in School Education

National Elementary Principal

Negro Education Review

Peabody Journal of Education

Phi Delta Kappan

Physical Educator

Research Quarterly of the American Alliance for Health, Physical Education, Recreation and Dance

Review of Educational Research

School Health Review

School Library Media Quarterly

School Psychology Review

School Science Review

Science and Children

Science Education

Science Teacher

Secondary School Theatre Journal

Social Education

Theory Into Practice

Today’s Education

Voc Ed

Young Children

Sociology and Anthropology

American Anthropologist

American Behavioral Scientist

American Journal of Sociology

American Sociological Review

Anthropology and Education Quarterly

Child Welfare

Family Relations

Group and Organization Studies

Human Organization

Human Services in the Rural Environment

Journal of Correctional Education

Journal of Marriage and the Family

Rural Sociology

Sex Roles: A Journal of Research

Social Work

Sociology and Social Research

Sociology of Education

Urban Anthropology

Urban Education

Urban Review

Youth and Society

Analytical Research

Administration and Society

American Historical Review

American Political Science Review

Annals of the American Academy of Political and Social Science

Comparative Education Review

Daedalus

Economics of Education Review

Education and Urban Society

Education Forum

Educational Studies

Educational Theory

Harvard Civil Rights-Civil Liberties Law Review

Nursing

AACN Advanced Critical Care

Advanced Emergency Nursing Journal

Advances in Neonatal Care

Advances in Nursing Science

Advances in Skin & Wound Care: The Journal for Prevention and Healing

AJN, American Journal of Nursing

Alzheimer’s Care Today

Cancer Nursing

CIN: Computers, Informatics, Nursing

Critical Care Nursing Quarterly

Dimensions of Critical Care Nursing

Family & Community Health

Gastroenterology Nursing

Health Care Food & Nutrition Focus

Health Care Management Review

The Health Care Manager

Holistic Nursing Practice

Home Healthcare Nurse

Infants & Young Children

Journal for Nurses in Staff Development

Journal of Ambulatory Care Management

Journal of Cardiovascular Nursing

Journal of Christian Nursing

Journal of Head Trauma Rehabilitation

Journal of Hospice and Palliative Nursing

Journal of Infusion Nursing

Journal of Neuroscience Nursing

Journal of Nursing Care Quality

The Journal of Nursing Research

Journal of Perinatal and Neonatal Nursing

Journal of Public Health Management & Practice

Journal of the Dermatology Nurses’ Association

Journal of Trauma Nursing

MCN, The American Journal of Maternal/Child Nursing

Men in Nursing

Nurse Educator

Nursing 2010

Nursing 2010 Critical Care

Nursing Administration Quarterly

Nursing Made Incredibly Easy!

Nursing Management

Nursing Research

Nutrition Today

Oncology Times

OR Nurse 2010

Orthopaedic Nursing

Outcomes Management

Plastic Surgical Nursing

Professional Case Management

Quality Management in Health Care

Journals are by far the most important and valuable primary sources of information about a topic because they represent the most direct link among the researcher, the work of other researchers, and your own interests.

What actually is a journal, and how do papers or manuscripts appear? A journal is a collection (most often) of research articles published in a particular discipline. For example, the American Educational Research Association (AERA) publishes more than six journals, all of which deal with the general area of research in education. The American Psychological Association (APA) publishes many journals including the Journal of Experimental Psychology and the Journal of Counseling Psychology. The Society for Research in Child Development (SRCD) publishes Child Development and Child Development Monographs, among others. Membership in these professional groups entitles you to a subscription to the journals as part of the package, or you can subscribe separately.

Most often, these professional organizations do not do the actual publishing themselves; they handle only the editorial work, in which manuscripts are reviewed and considered for publication. For example, Child Development, sponsored by the SRCD, is published by Wiley-Blackwell.

How do most respectable journals work? First, a researcher writes an article within the province of the particular journal to which it is being submitted. The manuscript is prepared according to a specific format (such as the one shown in Chapter 14), and then usually three copies are submitted to the journal editor. Guidelines for preparing manuscripts are usually found on the front or back covers of most journals in the social and behavioral sciences. Often the journal requires that the author follow guidelines stated in the sixth edition of the American Psychological Association Publication Manual (2009).

The peer review process of reviewing journal submissions ensures that experts review and comment on a research manuscript before it is published.

Second, once the article has been received by the editor, who is an acknowledged expert in that particular field, the article is sent to at least three reviewers who are also experts in the field. Note that today, almost all of the submission and review process occurs online. These reviewers participate in a process known as peer review, in which the reviewers do not know the identity of the author (or authors) of the paper. The author’s name appears only on a cover sheet, which is removed by the editor. A social security number, or some other coded number, is used for identification on the rest of the manuscript. This makes the process quite fair (what is called “blind”)—the reviewer’s chance of knowing the identity of the author is greatly reduced, if not eliminated. The possibility that personalities might get in the way of what can be a highly competitive goal—publishing in the best journals—is thus minimized. Each reviewer makes a recommendation regarding suitability for publication. The options from which the reviewers select can include

  • • Accept outright, meaning that the article is outstanding and can be accepted for publication as is

  • • Accept with revisions, meaning that some changes need to be made by the author(s) before it is accepted (and is of course reviewed again)

  • • Reject with suggestions for revisions, meaning that the article is not acceptable as is, but after changes are made the author(s) should be invited to resubmit it

  • • Reject outright, meaning that the article is completely unacceptable and is not welcome for resubmission

Finally, when a consensus is reached by the reviewers, the editor of the journal conveys that decision to the author(s). If a consensus cannot be reached, the editor makes a decision or sends the article to another reviewer for additional comments. Editors work very hard to ensure that the review process and the journal publication process are fair.
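The decision step described above can be sketched in a few lines of Python. This is only an illustration, not how any particular journal works: the names `Recommendation` and `editorial_decision` are hypothetical, and the sketch assumes that "consensus" means every reviewer chose the same option.

```python
from enum import Enum


class Recommendation(Enum):
    # The four options reviewers choose from, as listed in the text.
    ACCEPT = "accept outright"
    ACCEPT_WITH_REVISIONS = "accept with revisions"
    REJECT_WITH_SUGGESTIONS = "reject with suggestions for revisions"
    REJECT = "reject outright"


def editorial_decision(recommendations):
    """Return the shared recommendation, or None when reviewers disagree.

    Assumption: consensus means all reviewers picked the same option.
    None signals that the editor must decide or consult another reviewer.
    """
    if not recommendations:
        raise ValueError("at least one recommendation is required")
    if len(set(recommendations)) == 1:
        return recommendations[0]
    return None


reviews = [
    Recommendation.ACCEPT_WITH_REVISIONS,
    Recommendation.ACCEPT_WITH_REVISIONS,
    Recommendation.ACCEPT_WITH_REVISIONS,
]
decision = editorial_decision(reviews)
print(decision.value if decision else "no consensus: editor decides or adds a reviewer")
```

In practice an editor also weighs the reviewers' written comments, so a real decision is rarely this mechanical.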

By the way, you might be interested to know that the average rejection rate for the top journals is about 80%. Yes, 80% of the articles submitted never get in, but those rejected by the top journals usually find their way into other journals. Just because these articles are not accepted by the journals with the highest rejection rate does not mean they cannot make a significant contribution to the field. In fact, several studies have shown that there is little consistency among reviewers, and what one might rank high, another might rank quite low. However, in general, it’s safe to say that the better scientific reports are published by the better journals.

One more note about primary sources in general. If you know of a journal or a book that you might need and your library does not have it (and it is not available online), do not despair. First, check other libraries within driving distance or check with some of the professors in your department. They might have it available for loan. If all else fails, use the interlibrary loan system, with which your reference librarian will be glad to help you. This service helps you locate and physically secure the reference materials you want for a limited amount of time from another library. The system usually works quickly and is efficient.

Abstracts

If journals are the workhorses of the literature review, then collections of abstracts cannot be very far behind with regard to their convenience and usefulness. An abstract is a one- (or at most two-) paragraph summary of a journal article which contains all the information readers should need to decide whether to read the entire journal article.

Abstracts help you save the time it would take to locate potentially important sources of information.

By perusing collections of abstracts, researchers can save a significant amount of time compared with leafing through the journals from which these abstracts are drawn. Most abstract collections also include subject and author indexes to help readers find what they are looking for, and abstracts of articles routinely appear in more than one abstract resource.

For example, the abstract of a study on the benefits of long-distance learning published in the International Journal of Simulation and Process Modelling might appear in PsycINFO.

The following is a brief description of some abstract collections you might find useful.

One well-known collection of abstracts is PsycINFO (at http://www.apa.org/pubs/databases/psycinfo/index.aspx). PsycINFO (for members of APA) and PsycINFO Direct (for nonmembers) provide an electronic database that contains abstracts and summaries of psychological literature from the 1800s to the present. Some facts about PsycINFO: It contains articles and abstracts from more than 2,500 journals, is updated weekly, offers chapters from scholarly books, contains material from 49 different countries, covers dissertations, and much more. No doubt—on your research travels, it is a great resource.

There is a vast amount of information in PsycINFO, and the online nature enables you to search electronically. Figure 3A.4 shows you a sample PsycINFO screen for a journal article. Screens for books and chapters and dissertations look quite similar.

One other way to use PsycINFO is to look up the key word “bibliography.” Under this heading, you will find a list of bibliographies that have already been published. You might be lucky and find one that focuses on your area of interest.

One index that is especially useful is Educational Resources Information Center, or ERIC. ERIC (http://www.eric.ed.gov/) is a nationwide information network that acquires, catalogs, summarizes, and provides access to education information from all sources. It currently contains more than 1.3 million education-related documents and adds about 30,000 per year. The database and ERIC document collections are housed in about 3,000 locations worldwide, including most major public and university library systems.

Figure 3A.4 The results of a PsycINFO search. Screenshot is reprinted with permission of the American Psychological Association, publisher of the PsycINFO database. All rights reserved.

ERIC produces a variety of publications and provides extensive user assistance with several different ways to search the database. As with PsycINFO, the ERIC system works with a set of descriptive terms found in a thesaurus, the Thesaurus of ERIC Descriptors (see Figure 3A.5), which should be your first stop. Once you find the search words or descriptors, you can use the subject index (published monthly) until you find the number of a reference focusing on what you want. Finally, you are off to the actual description of the reference, as you see in Figure 3A.6. Most of the time, these ERIC documents are in PDF (portable document format) and you can access the entire document. Other times, although rare, you may have to order directly from the ERIC clearinghouse. If your library has a government documents department, it might already have the document on hand. Also, you might be able to contact the original author as listed in the résumé.
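The idea of searching by thesaurus descriptors can be illustrated with a small Python sketch. This is not the ERIC interface or its data: the record IDs, titles, and descriptor tags below are invented for illustration, and `search_by_descriptors` is a hypothetical helper that simply returns the records tagged with every descriptor you ask for.

```python
# Hypothetical records: IDs, titles, and descriptors are invented for illustration.
records = [
    {"id": "DOC-001", "title": "Funding gaps in rural districts",
     "descriptors": {"Educational Finance", "Rural Schools"}},
    {"id": "DOC-002", "title": "How state aid formulas shape budgets",
     "descriptors": {"Educational Finance", "State Aid"}},
    {"id": "DOC-003", "title": "Managing small-group discussion",
     "descriptors": {"Classroom Techniques"}},
]


def search_by_descriptors(records, wanted):
    """Return the records tagged with every descriptor in `wanted`."""
    wanted = set(wanted)
    # A record matches when the wanted descriptors are a subset of its tags.
    return [record for record in records if wanted <= record["descriptors"]]


for hit in search_by_descriptors(records, ["Educational Finance"]):
    print(hit["id"], hit["title"])
```

Adding a second descriptor narrows the results, which is why settling on the right descriptors first saves time later in the search.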

ERIC has been in business since 1966 and has regional clearinghouses that archive, abstract, and disseminate educational articles and documents. Education is broadly defined, so many disciplines in the social and behavioral sciences are covered quite adequately.

Do you think that this is enough to get started? PsycINFO and the ERIC sets of abstracts are major resources, but there are others that are a bit more specialized and also very useful.

Figure 3A.5 The set of ERIC terms in the thesaurus you start with when conducting an ERIC search.

Figure 3A.6 Once you have identified areas through the ERIC thesaurus, it’s time to turn to key words that produce ERIC entries like these.

Titles of other abstracts, such as Sociological Abstracts, Exceptional Child Education Resources, Research Related to Children, and Dissertation Abstracts, reveal the wide variety of available reference material.

Finally, there’s ProQuest Dissertations and Theses (which replaces Dissertation Abstracts at many libraries) at http://www.proquest.com/en-US/catalogs/databases/detail/pqdt.shtml, which contains over 2.7 million dissertations and theses dating from 1861, with full text online from 1997. More than 75,000 new entries are added every year, and the database contains the abstracts of over 2,000,000 dissertations from 1861 to the present in the following areas:

  • • Agriculture

  • • Astronomy

  • • Biological and Environmental Sciences

  • • Business and Economics

  • • Chemistry

  • • Education

  • • Engineering

  • • Fine Arts and Music

  • • Geography and Regional Planning

  • • Geology

  • • Health Sciences

  • • History and Political Science

  • • Language and Literature

  • • Library and Information Science

  • • Mathematics and Statistics

  • • Philosophy and Religion

  • • Physics

  • • Psychology and Sociology

Indices

Journals and abstracts provide the substance of an article, a conference presentation, or a report. If you want a quick overview of where things might be located, turn to an index, which is an alphabetical listing of entries by topic, author, or both.

The widely used and popular Social Sciences Citation Index (SSCI) and Science Citation Index (SCI) work in an interesting and creative way. SSCI (at http://thomson-reuters.com/products_services/science/science_products/a-z/social_sciences_citation_index) provides access to bibliographic information, author abstracts, and citations from more than 2,400 journals in more than 50 disciplines. SCI (now part of the Web of Science at http://thomsonreuters.com/products_services/science/science_products/a-z/science_citation_index) provides researchers access to over 3,700 scientific and technical journals across 100 disciplines.

Indices help you locate the sources of important information.

Let’s say you read an article that you find to be very relevant to your research proposal and want to know what else the author has done. You might want to search by subject through abstracts, as we have talked about, but you might also want to find other articles by the same author or on the same general topic. Tools like SSCI and SCI allow you to focus on your specific topic and access as much of the available information as possible. For example, do you want to find out who has mentioned the classic article “Mental and Physical Traits of a Thousand Gifted Children,” written by Louis Terman and published in 1925? Look up Terman, L., in SSCI year by year, and you will find more references than you may know what to do with.
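To make the idea of a citation index concrete, here is a minimal sketch in Python. It assumes you already have each article's reference list; the article names below are hypothetical placeholders (only the Terman entry comes from the text), and this is not a description of how SSCI itself is built. The sketch simply turns "who does this article cite?" into "who cites this work?"

```python
from collections import defaultdict

# Hypothetical data: each citing article maps to the works in its reference list.
reference_lists = {
    "Hypothetical Article A (1960)": ["Terman, L. (1925)", "Placeholder Work X (1950)"],
    "Hypothetical Article B (1985)": ["Terman, L. (1925)"],
    "Hypothetical Article C (2001)": ["Placeholder Work X (1950)"],
}


def build_citation_index(reference_lists):
    """Invert reference lists: cited work -> sorted list of articles citing it."""
    index = defaultdict(list)
    for citing, cited_works in reference_lists.items():
        for cited in cited_works:
            index[cited].append(citing)
    return {cited: sorted(citing) for cited, citing in index.items()}


citation_index = build_citation_index(reference_lists)
# Look up every (hypothetical) article that mentions the classic work.
print(citation_index["Terman, L. (1925)"])
```

Looking up one cited work in the inverted index is the electronic equivalent of looking up Terman, L., year by year.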

Finally, you can consult the Bibliographic Index Plus online (at http://www.hwwilson.com/Databases/biblio.htm), a compilation of bibliographies that results from a search of more than 530,000 bibliographies. Just think of the time you can save if you locate a relatively recent bibliography on what interests you.

TEST YOURSELF

What is the best use to which you can put a general, a secondary, and a primary source? Name one of each that you might use to better understand the most important questions in your own field of study.

Reading and Evaluating Research

Almost any research activity that you participate in involves reading research articles that appear in journals and textbooks. In fact, one of the most common faults of beginning researchers is not being sufficiently familiar with the wealth of research reports available in their specific area of interest. It is indeed rare to find a research topic about which nothing (or nothing related) has been done. You may not be able to find something that addresses the exact topic you wish to pursue (such as changes in adolescent behavior in Australian children who live in the outback), but there is plenty of information on adolescent behavior and plenty on children who live in Australia. Part of your job as a good scientist is to make the argument why these factors might be important to study.

You can do that by reading and evaluating research that has been done in various disciplines on the same topic.

Research articles and reports must always be carefully evaluated and the results never taken at face value.

What Does a Research Article Look Like?

The only way to gain expertise in understanding the results of research studies is to read and practice understanding what they mean. Begin with one of the journals in your own area. If you don’t know of any, do one of two things:

  • • Visit your adviser or some faculty member in the area in which you are interested and ask the question, “What are the best research journals in my area?”

  • • Visit the library and look through the index of periodicals or search online some of the resources we just identified. You are bound to find hundreds of journals, most online.

For example, for those of you interested in education and psychology and related areas, the following is a sample of 10 research journals that would be a great place for you to start:

  • • American Educational Research Journal

  • • American Psychologist

  • • Educational Researcher

  • • Educational and Psychological Measurement

  • • Harvard Educational Review

  • • Journal of Educational Research

  • • Journal of Educational Psychology

  • • Journal of Educational Measurement

  • • Phi Delta Kappan

  • • Review of Educational Research

Here are 10 more that focus primarily on psychology:

  • • Child Development

  • • Cognition

  • • Human Development

  • • Journal of Applied Developmental Psychology

  • • Journal of Experimental Psychology

  • • Journal of Personality and Social Psychology

  • • Journal of School Psychology

  • • Perceptual and Motor Skills

  • • Psychological Bulletin

  • • Sex Roles

And, don’t forget our previous discussion of Ulrich’s periodical guide (over 300,000 entries).

Criteria for Judging a Research Study

Judging anyone else’s work is never an easy task. A good place to start might be the following checklist, which is organized to help you focus on the most important characteristics of any journal article. These eight areas can give you a good start in better understanding the general format of such a report and how well the author(s) communicated to you what was done, why it was done, how it was done, and what it all means.

Research articles take all kinds of shapes and forms, but their primary purpose is to inform and educate the reader.

  • 1. Review of Previous Research

    • • How closely is the literature cited in the study related to previous literature?

    • • Is the review recent?

    • • Are there any seminal or outstanding references you know of that were left out?

  • 2. Problem and Purpose

    • • Can you understand the statement of the problem?

    • • Is the purpose of the study clearly stated?

    • • Does the purpose seem to be tied to the literature that is reviewed?

    • • Is the objective of the study clearly stated?

    • • Is there a conceptual rationale in which the hypotheses are grounded?

    • • Is there a rationale for why the study is an important one to do?

  • 3. Hypothesis

    • • Are the research hypotheses clearly and explicitly stated?

    • • Do the hypotheses state a clear association between variables?

    • • Are the hypotheses grounded in theory or in a review and presentation of relevant literature?

    • • Can the hypotheses be tested?

  • 4. Method

    • • Are both the independent and dependent variables clearly defined?

    • • Are the definitions and descriptions of the variables complete?

    • • Is it clear how the study was conducted?

  • 5. Sample

    • • Was the sample selected in such a way that you think it is representative of the population?

    • • Is it clear where the sample came from and how it was selected?

    • • How similar are the participants in the study to those who have been used in similar studies?

  • 6. Results and Discussion

    • • Does the author relate the results to the review of literature?

    • • Are the results related to the hypothesis? Is the discussion of the results consistent with the actual results?

    • • Does the discussion provide closure to the initial hypothesis presented by the author?

  • 7. References

    • • Is the list of references current?

    • • Are they consistent in their format? Are the references complete?

    • • Does the list of references reflect some of the most important reference sources in the field?

  • 8. General Comments About the Report

    • • Is the report clearly written and understandable?

    • • Is the language biased?

    • • What are the strengths and weaknesses of the research?

    • • What are the primary implications of the research?

    • • What would you do to improve the research?

    • • Does the submitted manuscript conform to the editor’s or publisher’s specifications?

Using Electronic Tools in Your Research Activities

Imagine this if you will: You are in your apartment and it is late at night. You find that you need one more citation on the development of charter schools to complete your literature review. You are tired. It is snowing. The library is about to close, and it might not have what you need anyway.

Zoom, you’re on the Internet and you’re on the way. Log on to your library and access one of their many databases to search for the information you need. In 20 seconds you have the reference to read or print. Is this for real? You bet, and since the printing of the last edition of Exploring Research (some staggering 3 years ago), online tools and databases are even more dominant forces in preparing, conducting, and disseminating research.

Both the computer as a tool and the library as a storehouse of information play different, but equally important and complementary, roles in the research process.

Whether at home, in your office, or in the confines of the library—and now using wireless technology at the mall or in front of the student union—the use of technology for completing literature searches and reviews is booming, and blooming with new databases to search becoming available each day.

In a moment we’ll start our explanation of some of this, but first a few words of “this can’t be true, but it is.” Many of you who are using this book may have never taken advantage of what your library services have to offer. You may not, for whatever reason, access these from off campus, but what is not understandable is why you are not accessing these resources on campus. All colleges and universities (and, of course, the local public library) provide free access to all these resources for students. The personal computers you can use may be located in the computer center, in the library, in academic buildings, or even in all three and more—but they are surely there for the using. It is likely that a hefty chunk of the fees that you pay each semester goes toward purchasing new equipment and paying for these services, so use them!

And just a few more words about libraries in general. We all know how easy it is to explore a library’s contents online; it’s quick, easy, and usually very reliable. But there is also a huge benefit to physically visiting the library, other than to take the orientation workshop we mentioned earlier in the book. Here’s the thing: What you may find in the library, incidental to what you are looking for, you may never find online. For example, you’re in the stacks exploring articles on charter schools and reading through journal articles organized by volume. Aren’t you delightfully surprised to find that the article before the one you are looking for seems to contain some information that is very relevant to the question you are asking? And you take out a few more volumes, find a nice easy chair, turn off your MP3 player, and find even more: treasures that were unanticipated but nonetheless very valuable. Make a visit, and you’ll be delightfully surprised.

Searching Online

At the University of Kansas, students can walk into Watson Library (one of the main research libraries), sit down at a computer terminal, access ERIC documents, and search through them in seconds for the references of interest. Not bad. They can access a network connection that can lead them to millions of other abstracts and full-length articles from hundreds of databases “leased” by the university each year. And they can, of course, do all this from the comfort of their dorm room, apartment, or home 10 or 1,000 miles away. In fact, if they have any difficulty during their online activity, they can even Ask the Librarian. That’s right: open a new window in the browser and enter a question such as “Does the New York Times still have an index?” or “What is the leading journal on business education?” These reference librarians are not known as the original search engine for no good reason. They know lots, but most importantly, they know where to find the answers, which is the key to a good research foundation.

University, business, and government researchers are turning to online information providers more and more to find the key information they need, whether a specific reference or fact, such as the number of bicycles manufactured by Japan or the number of young adults who live in urban areas.

Your local public library, as well as the university’s library system, has access to the Internet as well as guides to the information available electronically.

The Value of Online Searches

Doing online searches rather than visiting the library boils down to savings in time, greater convenience, and, in some cases, greater thoroughness. You can do a search using one of the online services in a quarter of the time it takes to do it manually.

Another important advantage of online searches, if your search skills are anywhere near competent, is that you are not likely to miss very much. The information providers offer access to tens of thousands of documents, either in their own databases or in others they can access. Dedicated databases (such as the APA’s PsycINFO) have millions of pieces of information. Most colleges and universities now allow access to their libraries from off campus, and an increasing number allow you access to the complete record of the article (as a PDF), not just an abstract.

Finally, and this may be the most attractive advantage, online searches are the way of the future. There is so much information out there that soon it will be close to impossible to search intelligently without the aid of a computer.

If there is any real downside (as we mentioned earlier), it’s that when you use online services, you don’t get a chance to browse among the thousands of books at the library. Since books are organized by area of specialization, you will very often find yourself opening books that you didn’t even know existed and finding things that can be very valuable.

The Great Search Engines

Although there is no central listing of Web sites, there are search engines that can help you find what you are interested in. For example, the most popular search engine, by far, is Google (www.google.com), and more about that soon. Fill in the term you are looking for and click Google Search and you are bound to find material you can use. Better yet, combine words such as “résumé nursing” to find people who have entered that phrase on their résumé. Or type in “www.yahoo.com,” which takes you to an opening page with hundreds of links to topics in every area imaginable.

For example, let’s say you are interested in finding information on homelessness. As you can see in Figure 3A.7, almost 7,000,000 results came up in less than .3 of a second. Amazing. Figure 3A.7 shows the term entered in the search area of Google and the results of that search. We’ll get to an analysis of a Google screen later in this section.

Search engines are tools that help you sift through the thousands of pages of information available on the Internet and identify the specifics of what you need.

Figure 3A.7 A sample Google search.

Google™ is a registered trademark of Google, Inc.

After the search is completed, the results will show several suggested links which you then can click on to find out the contents of the home pages that were found.

Are all search engines created equal? No. And one of the ways in which they are not created equal is what they are best suited for. Table 3A.4 lists a variety of search engines by what they do. The URLs don’t have the ubiquitous http://www at the start of each one, since browsers such as Firefox, Internet Explorer, and Chrome can search and locate without that additional information (and without the extra keystrokes on your part).

You can also consult a search engine that, in itself, searches many different search engines. For example, search engines such as SurfWax (www.surfwax.com) and Mamma (www.mamma.com, billed, of course, as “The Mother of All Search Engines®”) are meta-search engines, that is, search engines that return the results of exploring many search engines all at once. Let’s say that your research involves looking at the history of baseball and you need to review various major league teams. In Figure 3A.8 you can see the results of a SurfWax search for information about the Washington Nationals where almost 70,000,000 pages were identified.
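The basic idea behind a meta-search engine can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the engine names and result lists are made up, and real meta-search engines (including the ones named above) surely use more elaborate ranking. Here, pages returned by more engines are listed first, and duplicates are dropped.

```python
def merge_results(results_by_engine):
    """Combine result lists from several engines, dropping duplicates.

    Pages found by more engines come first; ties keep first-seen order.
    """
    counts = {}
    for urls in results_by_engine.values():
        for url in dict.fromkeys(urls):  # ignore repeats within one engine
            counts[url] = counts.get(url, 0) + 1
    # sorted() is stable, so ties stay in the order pages were first seen
    return sorted(counts, key=lambda url: -counts[url])


# Hypothetical results from two hypothetical engines.
results_by_engine = {
    "engine_one": ["example.org/a", "example.org/b"],
    "engine_two": ["example.org/b", "example.org/c"],
}
merged = merge_results(results_by_engine)
print(merged)
```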

Here are some tips about using a search engine:

  • • Enter the narrowest search terms and then broaden your search from there. Entering “intelligence” will find lots of stuff, most of it irrelevant; however, if you enter “intelligence” and “children” and “school,” the results will be much more manageable and closer to what you want. Remember that the fewer the words you enter, the more general the results will be.

  • • If you use more than one word, join them with the conjunction “AND,” such as bilingual AND education, or use quotes, such as “bilingual education.” This is the default for some search engines but not all.

  • • If a help file or function comes along with the search engine, open it and read it. It will have invaluable information that will save you time and effort.

  • • When you become more accustomed to using a search engine, look for the more advanced searching techniques and use them.

  • • Didn’t you get what you wanted? The simplest solution is to check your typing. Simple typos spell disaster.
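If you find yourself typing the same kinds of searches over and over, the AND and quotation-mark tips above can be captured in a small helper. The following Python sketch is only an illustration; the function name build_query is our own invention, and whether a given search engine honors AND is something to check in its help file.

```python
def build_query(*terms, phrase=None):
    """Join search terms with AND; wrap an exact phrase in double quotes."""
    parts = [t.strip() for t in terms if t.strip()]
    if phrase:
        parts.append(f'"{phrase.strip()}"')
    return " AND ".join(parts)


# Narrow search built from several terms
print(build_query("intelligence", "children", "school"))
# Exact-phrase search
print(build_query(phrase="bilingual education"))
```

Adding a term narrows the search; removing one broadens it.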

The original, and still the best, search engine is your reference librarian who never crashes, is always available, tends to be helpful, and is very knowledgeable.

Table 3A.4 Different types of search engines and what they search for

If you need to do a general, all-purpose search . . .

Name                  URL
Alltheweb             alltheweb.com
AltaVista             altavista.com
AOL                   aol.com
ASK                   ask.com
Cull                  cull.com
Gigablast             gigablast.com
Google                google.com
Lycos                 lycos.com
MSN                   msn.com
Yahoo                 Yahoo.com

If you want to search for blogs about a particular topic . . .

Name                  URL
Blogpulse             Blogpulse.com
Google blog search    blogsearch.google.com
LJSeek                Ljseek.com
Opinmind              Opinmind.com
Technorati            Technorati.com

If you want to search for books . . .

Name                  URL
Your local library!   Easy to find at your school URL
Amazon                Amazon.com
BookFinder            Bookfinder.com
Google Scholar        scholar.google.com
Google Books          Books.google.com
ReadPrint             ReadPrint.com

If you want to search for images . . .

Name                                       URL
Picsearch                                  Picsearch.com
PhotoBucket                                photobucket.com
New York Public Library Digital Gallery    digitalgallery.nypl.org
Classroom Clipart                          classroomclipart.com
stock.xchng                                www.sxc.hu

Courtesy of Google Inc.

  • • Try a synonym for the term or terms you’re looking for. There’s more than one way to eviscerate a feline (get it?).

And if you want to find out even more about search engines, go to websearch.about.com and Wendy Boswell’s all-informative and useful About.com search engine site.

More About Google

Although Google is the most popular search engine, its share of searches continues to grow, and you may use it regularly, it is still worth exploring what it does and how it does it. It regularly catalogues millions of web pages and returns results in very short order. Since it is so popular, here are some specific tips about using this search engine, including some special features you may not know about.

Not just Google, but every search engine has its own special tips and tricks you can learn (at their Web site) to facilitate your searching activities and increase your success rate.

Figure 3A.8 Searching for information about the Washington Nationals Baseball team.

Google™ is a registered trademark of Google Inc.

Google Search Results

Figure 3A.9 shows a search conducted on the term “grade retention.” There’s more to the search results than meets the eye (not only a listing of other Web sites), and here’s a more detailed analysis on what’s in that window and how it might help you.

  • 1. Across the top of the Google search results is a listing of other “tabs” you can click on to find additional information about the topic (Web, Images, Videos, Maps, News, etc.). For example, if you want to find news about the topic on which you searched, click on News. In this case, you can find related news stories that can further your understanding of this topic.

  • 2. To the right of the Google search area (where you enter the terms for which you want to search) are Advanced search and Preferences options. These basically allow you to refine your searches and are easy to learn on your own but surely not necessary as you are learning to use Google and even when you are a fairly competent Google user. As we said earlier, the more refined the words you identify as search terms, the better your results will be.

  • 3. In this example, there are no sponsored links (really advertisements on which Google makes a ton of money), which are usually located on the right-hand side of the page. These advertisements are located away from the results listing so that you very clearly know they are to be treated separately.

  • 4. Below (and to the right of) the Google search term (in this case “grade retention”) is a tally of the results, showing that 1,460,000 “hits” accumulated in .23 second (fast!). Note that if you repeat the same search, you will get a different outcome (probably just slightly different) since things change so fast.

Figure 3A.9 A Google search on the phrase “grade retention” and the results.

  • 5. Right below the results line are the all-important results of the search. Most show the following:

    • a. The title of the page (Grade Retention—The Great Debate). Notice how the words “grade retention” are highlighted since this is one of the original search words.

    • b. Next is a brief abstract of the contents of that page, which should allow you to determine whether it is worth exploring.

    • c. Next is the URL, or the Web address, for this particular page followed by the size of the page, the cache (any stored record of this page), and other pages that are similar to this one. As always, you can click on any underlined link.

Word Order and Repetition

You already know that word order matters (we talked about that earlier), but the repetition of words in the search box matters as well.

For example, you saw in Figure 3A.9 the result of a search on grade retention. However, if we enter the search terms “grade retention retention” (we entered “retention” twice), then the weighting of the search leans more toward retention, less toward grade. Similarly, if we entered the terms “grade grade retention,” the search would be weighted toward the topic of grades. Word repetition is not a science, but it does allow you to prompt Google to provide another set of results on the same topic.

Using the Phonebook

This may be the greatest undocumented, and not generally known, tip and feature about using Google.

A great deal of what we all do as researchers is to find information and locate people. If you find a particularly interesting research article and want to know more about the topic, there’s just nothing wrong with searching for more information about the author of that article and contacting him or her.

For example, let’s say you want to contact this author. The first place to try is his home institution (the University of Kansas, which you can find at www.ku.edu). This should get you what you want. Let’s say, however, that in spite of your efforts, you have no luck.

Using the Google phonebook feature, you can enter the terms phonebook:salkind ks (notice there is no space after the colon and you have to know the state in which the listing is located), and you’ll get the contact information you need. You can reverse the process as well by entering the phone number and seeing the listing. For your information, rphonebook will search only for residential listings and bphonebook only for business listings.

Using Bibliographic Database Programs

Anyone who does research and writes about that research can tell you that one of the most tedious parts of writing a research manuscript is references, references, references—keeping track of them, entering them, and organizing them is just about the least fun anyone can have.

There is a welcome set of tools that can help you do these three things and more. Bibliographic database programs are tools that help you manage your set of references, and the best ones allow you to do things such as

  • • Enter the data for any one reference using a standard form

  • • Change the format to fit the manuscript requirements, such as the American Psychological Association (APA) or the Modern Language Association (MLA)

  • • Search the database of references using key words

  • • Add notes to any one reference which can also be searched

  • • Generate a final list of references for use in the manuscript

Although one of the most tedious, time-consuming parts of creating a research document is tracking and dealing with bibliographic references, there are now several different software programs that can greatly reduce the necessary time and effort.

You can, of course, do all these by using 3″ × 5″ index cards, but entering the references only once and never having to retype them, track them, and organize them—we could go on and on, but we think you get the picture.

A bunch of such bibliographic database programs are available—some of them free and some of them commercially available. Let’s take a look at EndNote (http://www.endnote.com), a commercially available product. All of these tend to offer the same features—you enter information about the reference, and the tool formats it according to the format you specify. They all accomplish this goal in different ways and also offer different bells and whistles, so you should take advantage of the free download and try them out. Other commercial products that work well are ProCite (www.procite.com) and Biblioscape (http://www.biblioscape.com). Be sure that the program works on your operating system because some only work for a Windows- or a Mac-based operating system.

As you can see in Figure 3A.10, EndNote works by having you choose the type of reference (book, journal, web page) and then enter the pertinent information. The information then appears in your “library” (we created one named “term paper”). Once you have finished creating the library of references, EndNote (or another application) generates the bibliography for you with a few clicks, formatted as you want or even using a custom format.

As you can see, each element of the reference (author, date, etc.) is entered in its own space. You complete a separate form for each reference (be it a journal article, a book chapter, or a presentation at a convention) and you select the entry format.
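To see why entering each element in its own space pays off, consider the following minimal Python sketch. It is not how EndNote or any other product works internally; the Reference fields, the function names, and the sample entry are all hypothetical, and the two formats are only rough approximations of APA and MLA style. The point is that one stored record can be reformatted and searched without retyping.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """One journal-article reference, with each element stored separately."""
    author: str
    year: int
    title: str
    journal: str
    volume: int
    pages: str
    notes: str = ""


def format_reference(ref, style="apa"):
    """Return a rough APA-like or MLA-like string (real style rules are far more detailed)."""
    if style == "apa":
        return f"{ref.author} ({ref.year}). {ref.title}. {ref.journal}, {ref.volume}, {ref.pages}."
    if style == "mla":
        return f'{ref.author} "{ref.title}." {ref.journal}, vol. {ref.volume}, {ref.year}, pp. {ref.pages}.'
    raise ValueError(f"unknown style: {style}")


def search(library, keyword):
    """Find references whose title or notes contain the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [r for r in library if kw in r.title.lower() or kw in r.notes.lower()]


# A hypothetical entry, invented purely for illustration.
library = [
    Reference("Doe, J.", 2010, "A placeholder study of charter schools",
              "Journal of Placeholder Studies", 1, "1-10", notes="use in lit review"),
]
print(format_reference(library[0], "apa"))
print(format_reference(library[0], "mla"))
print(len(search(library, "charter")))
```

Because the author, year, and title live in separate fields, switching from one style to another is a matter of changing one argument rather than retyping the list.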

Figure 3A.10 Using EndNote, one of many citation creation tools.

Courtesy of Google Inc.

Looking for Articles Online

This clever design from the Google People fits very well the needs of any researcher, from the most basic to the most advanced.

Researchers are in the business of finding information and using that information to lay the groundwork for their research. One might search specific sites such as the Washington Post, U.S. News & World Report, or the American Psychological Association, and one would surely find material about a particular topic. But Google is an excellent tool for finding information across many different sites since it will look not only for topics that may have appeared on a particular site, but also for topics that appear secondarily to that site. For example, a search on the NYT Web site for articles on day care would result in a bunch of productive hits. But, how about a search for articles on this topic that may have appeared originally in the Times but in other locations as well? Of course, this can be done for newspapers, periodicals, magazines, journals—anywhere material might appear. How to do it?

Here are the search terms for a simple search for articles about day care in the New York Times: “day care” site:www.nytimes.com.

Day care appears in quotes so Google will look for it as a set of terms and not just “day” and then “care.” This search results in 28,100 hits.

Now, if we search for the magic words copyright * The New York Times Company day care, we find 851 hits, which include all the articles on day care from the Times, as well as all the articles used by other publications from the Times (in which they may have cited the Times).

The * in the search terms acts as a wild card so any year of copyright is searched for, and we could get rid of the site: command since the New York Times Company (which is their copyright line) serves the same purpose. Pretty cool.

Finding Tons of Directories and Lists

This is the last Google tip, but another one that could prove invaluable. Much of our job as researchers is to find information, but also collections of information. The command intitle: can serve us quite well.

For example, the search terms intitle:directory day care would return listings of directories containing information about day care. If we changed the search terms to include a wild card, such as intitle:directory * day care, we would then get a much more broadly defined list, since it can include elderly day care, adult day care, Miami day care, and so on, and the number of returns is much, much higher than for the simple direct search we first showed you.
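If you want to save or share searches like these, you can turn the search terms into a link. The Python sketch below uses the standard library to encode the terms safely. It assumes the familiar www.google.com/search?q= address pattern, which is an assumption about Google's public interface rather than something documented here, and operators can change over time.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def google_search_url(query):
    """Build a search link; quotes, colons, and spaces are percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Phrase search limited to one site (no space after the colon)
site_url = google_search_url('"day care" site:www.nytimes.com')
# Directory search with a wild card
directory_url = google_search_url("intitle:directory * day care")

print(site_url)
print(directory_url)

# Decoding the link recovers the original search terms
print(parse_qs(urlparse(site_url).query)["q"][0])
```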

More About Google Than You Can Imagine

Google has a set of help centers located at http://www.google.com/support/ where you should go if you need support about one of their products such as Gmail, Google Docs, or help on searching the Web.

Advanced Google Search Tools

Sure, it’s easy to find the phone number of a researcher who lives in Wyoming, but phonebook is only one of many search operators that Google offers to help you refine your search, and you can find out about all of them at http://www.googleguide.com/advanced_operators.html.

For example, you can use the search operator “define:” to find the definition of a word. So, entering define:mysticism will give you a long list of definitions of the term from various locations around the Web. If you typed in “definition of mysticism”, you would get web pages that define mysticism, but not a central directory of definitions. Another really useful operator is “source:”, which limits a search on a particular topic to the source you identify. For example, if you want to search for information about iPads, but only information that appeared in the New York Times, iPad source:New York Times would provide you with a nice collection of articles that have appeared there. You can even search for the latest weather report (weather:losangeles) and, yes, what time the movies are showing (movie:title, such as movie:Greenberg Lawrence, ks), and it really works!

TEST YOURSELF

It’s really easy, and maybe too easy, to conduct your background research online without regard to that massive building in the middle of campus called the library. Do you think it is adequate to conduct your literature review online? What advantages does this strategy offer? Disadvantages?

Using the Internet: Beyond Searches

Most of you who are reading this text are very savvy when it comes to using the Internet, but there are still some of you who are not. The following material is a refresher for those who can always learn something new and an introduction to those who are unfamiliar with the Internet, how it works, and what it can do for a new researcher.

In the most basic of terms, the Internet is a network of networks. A network is a collection of computers that are connected to one another and can communicate with each other. Imagine all these networks being connected to one another and imagine hundreds of networks and thousands of computers of all different types attached to one another and millions of people using those computers. Now you have some idea how large the Internet is. It is growing geometrically and millions of people connect every day for work, for fun, and of course, to pursue research activities.

Research Activities and the Internet

If you are talking about information in all shapes and sizes, there is not much that you cannot do on the Internet. Here is a brief overview of how the Internet can be used for research purposes:

  • • The Internet is often used for electronic mail or e-mail. You can exchange postal mail with a colleague across the United States or the world, but you can also do the same without ever putting pen to paper. You create a message and send it to your correspondent’s electronic address with documents, images, and more attached. It is fast, easy, and fun. For example, if you would like a reprint of an article you find interesting, you could e-mail the author and ask for a copy and it may very well come back to you electronically. Virtually all faculty, staff, and students at educational institutions have access to e-mail. Also if you want further information about a particular person’s work, you could probably find his or her résumé online.

  • • Thousands of electronic newsgroups, often called Usenet newsgroups, are available on the Internet. These are places where information can be posted and shared among Internet users, with topics that range from space exploration to the authenticity of a Civil War–era land deed. You can “drop in” and contribute to any of these newsgroups. For example, if you are interested in K–12 math curricula, try the k12.ed.math newsgroup. How about pathological behavior? Try the sci.psychology.psychotherapy newsgroup. We will return to them again later for a short demonstration.

  • • And finally, there is the world of social media including Facebook and Twitter and these lend themselves to entirely new ways of being used for research purposes. More about these later in this chapter.

More About E-Mail

Imagine it is 1925 and you are sitting at your desk at college, writing a letter to a friend in England. You stamp the letter, mail it, and three weeks later you receive an answer. You are amazed at how fast the mail is and sit down to answer your friend’s new questions about how much you like college and what you will do after you graduate.

Now imagine it is 2012 and you are writing to a friend in England, only this time you use e-mail. From your home, you compose the message, press the send key, and your friend has it almost instantly. Not only does your friend have it, but you copied it to three other members of the research team, including your primary professor. The reply arrives within 20 minutes, and “attached” to the message is a well-written response to your message and a new paper on the topic of interest.

E-mail works much like conventional mail. You write a message and send it to an address. The big difference is that there is no paper involved. Rather, the messages you send travel from one computer to another in a matter of minutes or hours, rather than in days or weeks, as fast as your voice travels in a telephone conversation.

How should you use e-mail? That is the really big question here. It’s fun for social and family reasons, but it’s also an indispensable part of the research process. Imagine having a question about a particular test you want to use in a research study. E-mail the test’s author. Imagine not being able to find a critical reference. E-mail the author of that reference (and you should know how to find that author by now given the tips we discussed throughout other parts of this chapter). Imagine not being able to understand a point your professor made in class about a particular statistical technique. With permission, e-mail your professor. This stuff really works.

One note about e-mail. It works because there are servers to which the mail is sent and then distributed. Sometimes these servers break down and mail can be delayed, for an hour or, in some cases when perhaps they have been infected with a virus, for days. Our advice is to have two e-mail addresses, one that you access from school and one of the other many free ones that are available such as those from Yahoo! (www.yahoo.com), Hotmail (www.hotmail.com), or GMail (from Google). You can always use these as a backup and receive or send mail from there. In many cases, you can even view your other mail account receipts (such as your school mail) within your secondary account.

A huge advantage of Web-based mail is that you can access your mail from any computer in the galaxy. It is always available as long as you have an Internet connection. In addition, as Web-based mailing programs become more sophisticated, they offer features that even fancy commercial mailers such as Outlook might not have, such as being able to (easily) enter a vacation message when you are away from your mail client and want people to automatically be notified. Or, you can send mail through GMail and make it appear as if it is being sent through any other account. Very handy. Many researchers create such new mail accounts for each research or writing project so they can segregate their mail and track it more effectively.

Another note: A host of roadblocks have been introduced along with the millions of e-mails that appear every day in mailboxes around the world in the form of spam, adware, viruses, and other nefarious mechanisms for unscrupulous people to gain access to your privacy. No matter how you do it, take advantage of some of the relatively inexpensive commercial products and install them on your home computer. For the most part, your college or university should be taking care of these concerns at some central location. But for you, it is critical (and almost inexcusable) to have some type of effective and current (and this is really important) way to keep your machine free of viruses and other junk.

And yet another note! In your electronic workings these days, you will see reference to the “cloud.” Cloud-based computing is coming—it’s where data, e-mail, and other information are stored on a remote computer (known as a server), so there is nothing locally available on your desktop. Everything, in other words, is Web based including applications (much like Google Docs is today).

The advantage? Clearly, you can do anything from any connected computer. No more new disk to install when applications change; rather, you would work on a subscription basis and every time a new version of Microsoft Office is released, the changes are right there the next time you open it up. It should be cheaper and more readily available (remember, being connected is everything) and, no more backing up (well, sort of). The cloud system you use stores your data in a safe place.

OK, so what’s the drawback? Although we are told otherwise, oops!—there goes the server and there goes everything you created. While cloud computing enthusiasts speak to the reliability and safety of the system—and it is there—you and I both know that someday it will fail. The lesson? It’s the future, but be sure to use whatever local backup system is available as well.

An Introduction to Usenet (News) Groups

Here’s a topic that interestingly enough many people do not know much about. It’s interesting since newsgroups are such an immense source of information and there are so many from which to select.

Imagine being able to find information on more than 100,000 topics, ranging from stereo systems to jokes (censored and otherwise) to the ethics of law to college football to astronomy. Where would you be able to find a collection of such diverse information that can be easily accessed? You guessed it—the Internet and the various newsgroup sites that ship news each day around the world. The news that fits in one category, such as college football or the ethics of law, forms a newsgroup (also called a group). A newsgroup is simply a collection of information about one topic. Once again, surprisingly, very few students are aware of and use newsgroups.

To help manage the flow of articles, news sites are managed, moderated, administered, and censored by system administrators who work for institutions such as universities and corporations. The newsgroups from which you can select news are those made available by the system administrator and more often than not, the system administrator has to give approval before you are allowed to join and contribute.

What’s in the News?

Newsgroups are named and organized based on a set of rules. The most general of these rules has to do with the name of the group itself. There is a hierarchical structure to a newsgroup name, with the highest level of the hierarchy appearing in the left-most position. For example, the newsgroup name k12.ed.tech means that within k12 (the general name for the kindergarten through twelfth-grade newsgroup), there is a subset named ed (for education) and within that another subset named tech (for technology).
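The naming rule just described can be sketched in a few lines of Python. This is only an illustration of the left-to-right hierarchy, not part of any newsreader; the function names split_newsgroup and describe are made up for this example.

```python
def split_newsgroup(name):
    """Split a dotted newsgroup name into its levels, most general first."""
    return name.split(".")


def describe(name):
    """Return one indented line per level, from the top of the hierarchy down."""
    levels = split_newsgroup(name)
    lines = []
    for depth in range(len(levels)):
        # Each deeper level is a subset of the level to its left.
        lines.append("  " * depth + ".".join(levels[: depth + 1]))
    return lines


for line in describe("k12.ed.tech"):
    print(line)
```

Reading the printed lines from top to bottom moves from the most general group (k12) to the most specific one (k12.ed.tech).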

Table 3A.5 is a sample of some newsgroups: what these groups are named, the general area they cover, and examples of what is in each of these groups. Originally, all newsgroups started with the .net prefix. Then, a renaming of newsgroups occurred in 1986 and there were seven main groups: .comp, .news, .sci, .rec, .soc, .talk, and .misc. Humanities (.hum) was added so that the number of primary newsgroups was finalized (for now) at eight. The prefix .alt represents all other newsgroups that do not have a clear place in any other groups (and sometimes jokingly is meant to represent Anarchists, Lunatics, and Terrorists due to the subversive and anything-goes nature of .alt newsgroups).

Newsgroups can be small or huge discussions of just about any topic.

Table 3A.5 The big newsgroup prefixes

| Newsgroup | General Area | Examples |
| --- | --- | --- |
| Alt | Everything that doesn’t fit anywhere else and certainly lots of stuff out of the ordinary | alt.actors.dustin-hoffman (welcome back to the graduate); alt.amazon.women (xena, the warrior princess and more); alt.anything (guess) |
| Comp | Information about computers, computer science, computer software, and general interest computer topics | comp.ai (danger! will robinson! all about artificial intelligence); comp.compression (a discussion of ways to compress or reduce files); comp.software-eng (so you want to design a new chip?) |
| Hum | Discussion of issues in the humanities | humanities.classics (more about the classic texts); humanities.language (discussion about languages and how they fit into the study of the humanities); humanities.philosophy (all about the great masters and their ideas) |
| Misc | A catchall of topics and ideas | misc.forsale (kind of like a garage sale online); misc.books (discussions about books and writers); misc.invest (how and where to invest your hard-earned money) |
| News | Information about news, newsgroups, and the newsgroup network | news.admin.censorship (all about what should and shouldn’t be on the Net); news.admin.net-abuse.email (don’t like all that junk e-mail? come here for advice); news.announce.conferences (where to go to be seen) |
| Rec | Information about recreation, hobbies, the performing arts, and fun stuff | rec.sport.swimming (make a splash); rec.bicycles.racing (what cool stuff to buy for your bike to go faster); rec.skydiving (take an extra ‘chute) |
| Sci | Information about science, scientific research and discoveries, engineering, and some social science stuff | sci.astro (astronomy); sci.cognitive (so that’s what you’re thinking!); sci.skeptic (ufos do exist!) |
| Soc | Information about the social sciences | soc.couples (people getting along); soc.penpals (why people write to one another); soc.misc (stuff that doesn’t fit anywhere else) |
| Talk | Discussion of current affairs | talk.atheism (about atheism); talk.rumors (rumor central); talk.radio (find out about Air America, Sean Hannity and more) |

To see how a newsgroup works, let’s follow an example of someone who is interested in educational technology. Almost every browser, such as Firefox, Chrome, or Internet Explorer, comes with its own reader built in and ready to go, but most browsers also come with a groups function that is even easier to use, as you can see in Figure 3A.11. These tools allow you to read existing news and to post new messages.

The first thing you need to do when you are ready to access a newsgroup is to subscribe to it. Your e-mail program or browser (such as Internet Explorer) can do this, or in some cases you may need a separate news reader. From the list of newsgroups, you can select the ones to which you want to subscribe. Each time you go to the newsgroup, you will get the updated version of those newsgroups, including all the news that has been added to that group since the last time you opened it.

The next step would be to open the k12.ed.tech newsgroup and examine the contents, as shown in Figure 3A.12 (we used Google as our reader). Within newsgroups, you will see a listing of topics open for discussion, each one started by an individual as a source for more information, a place to meet electronically, discuss issues, and so forth.

If someone wants to participate in a certain newsgroup, he or she can add a new topic at this level, or go into an existing newsgroup and make a contribution.

Figure 3A.11 The opening screen for Google Groups, where you can search for groups, start one of your own, or explore the most popular ones.

Google™ is a registered trademark of Google Inc.

Figure 3A.12 The newsgroup is a wide-open community where everyone is welcome to contribute and learn.


Using Mailing Lists or ListServs

Mailing lists are another really neat way to use the Internet as a source of information. You can sign up (subscribe) for a listserv discussion group, which is an automatic depository for information. If you subscribe, you receive everything that the list receives. A listserv is also known as a mailing list.

For example, if you belong to the K–12 educational technology mailing list, then each time someone sends mail to that list, you will receive it as well. There are more listservs than you can imagine, and it will take some exploration to find out which ones best fit your needs.

To subscribe to a mailing list, you need to send a message to the list’s administrator. As soon as you do that, a constant stream of messages will come your way. Be careful—if a list is very active, you can receive hundreds of messages in any one day. If you go even a day without checking your mail, your electronic mailbox is likely to get so full of messages that you won’t be able to read anything! Imagine your real mailbox outside your apartment or home. When it gets stuffed full, it is very difficult to pull out any one piece because the mail is packed so tightly. You would need a bigger box (more storage space), or you need to empty the box before it gets so full. Such is the case with an Internet mailing list: Either get a larger e-mail box (ask for more storage space from the system administrator) or check your mail more than once a day.

At Catalist (http://www.lsoft.com/catalist.html) you can find a guide to the always updated list of over 500,000 lists(!), all available to you and me, and you can search by the number of subscribers, the country of origin, and, of course, the topic. If you want to spend unending hours at your computer learning about everything from black holes to death rays, this is the place to start.

Using Social Media and Blogs

There’s no end to the imagination of entrepreneurs when it comes to the use of technology to have an impact on our lives, and correspondingly, there is no end to the imagination of researchers to use that technology in their research as well.

What About Facebook and Twitter?

You know about Facebook and Twitter, but could you imagine using these tools in a research setting? There are many others, but we’ll look at just these two. Here’s how to use them.

Facebook is a social networking tool that allows users to form groups, communicate with each other, and even play games. With over 600,000,000 active users, it is an amazing way to get like-minded people together to discuss and participate in research where some common interest is maintained. You would be well advised to begin a Facebook group based on your own research interests and reach out to others who have interests that are similar to yours.

And, of course, Facebook participants can very well become participants in a study as well. Facebook is a magnificent naturally occurring laboratory to study (mostly) young people’s ideas and actions as they exist in virtual and real-time groups. Researchers from Harvard, Indiana, and Carnegie Mellon are all using online subject samples to collect data and test hypotheses.

Twitter is another social networking tool that allows users to create 140-character messages and then allows those messages to be sent out to anyone who is following the author. Sometimes it is small potatoes, with only 10–20 followers. But sometimes, followers number in the hundreds of thousands.

Among other ideas, you can of course follow a researcher in whose work you are interested by simply signing up to follow him or her on Twitter (you need a Twitter account to do any of these things). Then, each time he or she creates one of those 140-character messages, it comes to you and the hundreds or thousands of other people who signed up on his or her list.

Another way to use Twitter is to find out what is being written by searching the vast electronic archives that are available. For example, if you wanted to know what people were saying (or Tweeting) about nursing education, you can use Twitter’s simple search box on the main page and enter the words “nursing education” (using quotes since you would want the search to return results for both terms together, not each one separately). Or, if you want to dig even deeper, go to the advanced search form (look for it under Help or at http://search.twitter.com/advanced) as you see in Figure 3A.13. And, it’s simple enough to find people: just click the Find People button on the main page.
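To see why the quotes matter, here is a minimal Python sketch of the difference between searching for a phrase and searching for each term separately. It does not talk to Twitter at all; the three messages are invented for illustration, and the helper names matches_phrase and matches_all_terms are assumptions of this example.

```python
def matches_phrase(text, phrase):
    """True when the exact phrase appears in the text (case-insensitive)."""
    return phrase.lower() in text.lower()


def matches_all_terms(text, terms):
    """True when every term appears somewhere in the text as a separate word."""
    words = text.lower().split()
    return all(term.lower() in words for term in terms)


# Invented sample messages, not real tweets.
messages = [
    "New study on nursing education outcomes",
    "Nursing shortages are changing education policy",
    "Great seminar on research methods today",
]

phrase_hits = [m for m in messages if matches_phrase(m, "nursing education")]
term_hits = [m for m in messages if matches_all_terms(m, ["nursing", "education"])]

print(len(phrase_hits), "message(s) contain the quoted phrase")
print(len(term_hits), "message(s) contain both words somewhere")
```

The quoted search is narrower: the second message mentions both words but is not about nursing education as such.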

Writing the Literature Review

It is now time to take all the information you have collected using all the tools you have learned about in this chapter and somehow organize it so it begins to make sense. This is your review of literature, and now you actually need to write it (horrors!). Here are some writing hints.

First, read other literature reviews. There is no arguing with success. Ask a student who has already been through this course or your adviser for a successful proposal. Look carefully at the format as well as the content of the literature review. Also, look at some of the sources mentioned earlier in this chapter, especially sources that are reviews of the literature, journal articles, and other review papers.

Figure 3A.13 Using Twitter help.

Second, create a unified theme, or a line of thought, throughout the review. Your review of literature is not supposed to be a novel, but most good literature reviews build from a very general argument to a more specific one and set the stage for the purpose of the research. You should bring the reader “into the fold” and create some interest in where you will be going with this research that other people have not gone.

Third, use a system to organize your materials. Most reviews of the literature will be organized chronologically within topics. For example, if you are studying gender differences in anxiety and verbal ability among adults, you would organize all the references by topic area (anxiety and verbal ability), and then within each of these topics, begin your review with the earliest dated reference. In this way you move from the earliest to the latest and provide some historical perspective.
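If you keep your notes electronically, this topic-then-date ordering is easy to automate. The following Python sketch assumes each reference is stored as a small record with a topic, an author, and a year; the records shown are placeholders invented for this example, not real citations.

```python
# Placeholder records: the authors and years are made up for illustration.
references = [
    {"topic": "verbal ability", "author": "Author C", "year": 1998},
    {"topic": "anxiety", "author": "Author A", "year": 2005},
    {"topic": "anxiety", "author": "Author B", "year": 1987},
    {"topic": "verbal ability", "author": "Author D", "year": 1979},
]

# Sort by topic first, then from the earliest to the latest year within a topic.
ordered = sorted(references, key=lambda ref: (ref["topic"], ref["year"]))

for ref in ordered:
    print(ref["topic"], ref["year"], ref["author"])
```

Sorting on the pair (topic, year) groups the references by topic area and places the earliest work first within each group, which gives the historical perspective described above.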

Fourth, work from an outline even if you are an accomplished and skilled writer. It is a good idea to use this tool to help organize the main thoughts in your proposal before you begin the actual writing process.

Fifth, build bridges between the different areas you review. For example, if you are conducting a cross-cultural study comparing the ways in which East Indian and American parents discipline their children, you might not find a great deal of literature on that specific topic. But there is certainly voluminous literature on child rearing in America and in India and tons of references on discipline. Part of the creative effort in writing a proposal is being able to show where these two come together in an interesting and potentially fruitful way.

Sixth, practice may not always make perfect but it certainly doesn’t hurt. For some reason, most people believe that a person is born with or without a talent for writing. Any successful writer would admit that to be a class-A basketball player or an accomplished violinist, one has to practice. Should it be any different for a writer? Should you have any doubts about this question, ask a serious writer how many hours a day or week he or she practices that craft. More often than not, you will see it is the equivalent of the ballplayer or the musician. In fact, a writer friend of mine gives this advice to people who want to write but don’t have a good idea about the level of involvement it requires: “Just sit down at your typewriter or word processor, and open a vein.” That is how easy it is.

So the last (but really the first) hint is to practice your writing. As you work at it and find out where you need to improve (get feedback from other students and professors), you will indeed see a change for the better.

Summary

There’s a lot to know about selecting a problem and doing the necessary background research, and it just begins when you have some familiarity with your field and some experience using both online and offline resources. Finding a topic and a question that works for you (in every sense of the word) is a real challenge and often an obstacle for beginning students and beginning scientists. Take your time, talk to your colleagues and your faculty, and make it into an exploration looking for the gold that represents a topic that will carry you to a new level of intellectual growth.


Chapter 5 Measurement, Reliability, and Validity WHAT YOU’LL LEARN ABOUT IN THIS CHAPTER:
  • • Why measurement is an important part of the research process

  • • What the process of measurement includes

  • • What the different levels of measurement are and how they are applied

  • • What reliability means

  • • The different types of reliability and how they are used

  • • How to increase the reliability of a test

  • • What validity means

  • • The different types of validity and how they are used

  • • How to increase the validity of a test

  • • The relationship between reliability and validity

The Measurement Process

Even without knowing it, you probably spend a good deal of time making judgments about the things that go on around you. In many cases, these judgments are informal (“I really like the way he presented that material”), but at times they are as formal as possible (“Eighty-five percent of her responses are correct”).

In both these examples, a judgment is being made about a particular outcome. That is what the process of measurement is all about, and its importance in the research process cannot be overestimated. All your hard work and efforts at trying to answer this or that interesting question are for naught if what you are interested in cannot be assessed, measured, gauged, appraised, evaluated, classified, ranked, graded, ordered, sorted, arranged, estimated, rated, surveyed, or weighed (get the idea?).

The classic definition of measurement was offered more than 45 years ago by an experimental psychologist, S. S. Stevens (1951), as the “assignment of numerals to objects or events according to rules.” With all due respect to Professor Stevens, this definition can be broadened such that measurement is the assignment of values to outcomes. Numbers (such as 34.89 and $54,980) are values, but so are outcomes, such as hair color (red or black) and social class (low or high). In fact, any variable, by its very definition, can take on more than one value and can be measured. It is these values that you will want to examine as part of the measurement process.

This chapter introduces you to some of the important concepts in the measurement process, including levels of measurement, a classification system to help assess what is measured, and the two primary qualities that any assessment tool must possess: reliability and validity.

Levels of Measurement

The level of measurement used reflects how an outcome is measured.

Stevens (1951) is owed credit, not only for the definition of measurement on which much of the content of this chapter is based, but also for a method of classifying different outcomes into what he called levels of measurement. A level of measurement is the scale that represents a hierarchy of precision on which a variable might be assessed. For example, the variable “height” can be defined in a variety of ways, with each definition corresponding to a particular level of measurement as shown in Table 5.1.

Table 5.1 Different levels of measurement used when measuring the same variable. The advantage (and maximum precision) occurs when you use the highest level possible

| Level of Measurement | For example . . . | Quality of Level |
| --- | --- | --- |
| Ratio | Rachael is 5 feet 10 inches and Gregory is 5 feet 5 inches | Absolute zero |
| Interval | Rachael is 5 inches taller than Gregory | An inch is an inch is an inch |
| Ordinal | Rachael is taller than Gregory | Greater than |
| Nominal | Rachael is tall and Gregory is short | Different from |

One way to measure height is simply to place people in categories such as A and B, without any reference to their actual size in inches, meters, or feet. Here, the level of measurement is called nominal because people are assigned to groups based on the category to which they belong.

A second strategy would be to place people in groups that are labeled along some dimension, such as Tall and Short. People are still placed in groups, but at least there is some distinction beyond a simple categorical label. In other words, the labels Tall and Short have some meaning in the context they are used, whereas Category A and Category B tell us only that the groups are different, but the nature of the difference is not known. In the second strategy, the level of measurement is called ordinal.

A third strategy is one in which Rachael is found to be 5 inches taller than Gregory. Now we know that there is a difference between the two measurements and we also know the precise extent of that difference (5 inches). Here, the level of measurement is called interval.

Finally, the height of an object or a person could even be measured on a scale that can have a true zero. Although there can be problems in the social and behavioral sciences with this ratio level of measurement, it has its advantages, as you shall read later in this chapter. This level of measurement is called ratio.

Keep in mind three things about this whole idea of level of measurement:

  • 1. In any research project, an outcome variable belongs to one of these four levels of measurement. The key, of course, is how the variable is measured.

  • 2. The qualities of one level of measurement (such as nominal) are also characteristic of the next level up. In other words, variables measured at the ordinal level also contain the qualities of variables measured at the nominal level. Likewise, variables measured at the interval level contain the qualities of variables measured at both the nominal and ordinal levels. For example, if you know that Lew is 60 inches tall and Linda is 54 inches tall (interval or possibly ratio level of measurement), then Lew is taller than Linda (ordinal level of measurement) and Lew and Linda differ in height (nominal level of measurement).

  • 3. The more precise (and higher) the level of measurement, the more accurate the measurement process will be and the closer you will come to measuring the true outcome of interest.
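The second point, that each level carries the qualities of the levels below it, can be illustrated with the Lew and Linda example. The short Python sketch below starts from the two heights in inches and derives the interval, ordinal, and nominal statements from them; the variable names are chosen only for this illustration.

```python
# Heights in inches, taken from the example above.
lew = 60
linda = 54

# Interval (or ratio) level: we know the exact size of the difference.
difference = lew - linda

# Ordinal level: we know only who is taller.
lew_is_taller = lew > linda

# Nominal level: we know only that the two heights are not the same.
heights_differ = lew != linda

print("Lew is", difference, "inches taller than Linda")
print("Lew is taller than Linda:", lew_is_taller)
print("Lew and Linda differ in height:", heights_differ)
```

Notice that the process works in only one direction. Knowing that the two heights differ (nominal) or that Lew is taller (ordinal) is not enough to recover the 6-inch difference.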

What follows is a more detailed discussion of each of these different levels of measurement, with examples and applications. Table 5.2 summarizes these four levels and what you can and cannot say about them.

Table 5.2 Different levels of measurement and some of their qualities

| Level | Qualities | Example | What You Can Say | What You Can’t Say |
| --- | --- | --- | --- | --- |
| Nominal (categories) | Assignment of labels | Gender (male or female); Preference (like or dislike); Voting record (for or against) | Each observation belongs to its own category | An observation represents “more” or “less” than another observation |
| Ordinal (category and order) | Assignment of values along some underlying dimension | Rank in college; Order of finishing a race | One observation is ranked above or below another | The amount that one variable is more or less than another |
| Interval (category, order, and spacing of equal intervals) | Equal distances between points | Number of words spelled correctly; Intelligence test scores; Temperature | One score differs from another on some measure that has equally appearing intervals | The amount of difference is an exact representation of differences on the variable being studied |
| Ratio (category, order, and spacing of equal intervals and a zero point) | Meaningful and nonarbitrary zero | Age; Weight; Time | One value is twice as much as another or no quantity of that variable can exist | Not much! |

Nominal

The nominal (from the Latin word nomen [name]) level of measurement describes variables that are categorical in nature and that differ in quality rather than quantity; that is, the variable you are examining characterizes your observations such that they can be placed into one (and only one) category. These categories can be labeled as you see fit. All nominal levels of measurement are solely qualitative.

Nominal level variables are categorical in nature.

For example, hair color (blond, red, or black) and political affiliation (Republican, Democrat, or Independent) are examples of nominal level variables. Even numbers can be used in the measurement of nominal level variables, although the numbers have no intrinsic value. Assigning males as Group 1 and females as Group 2 and giving all offensive linemen on a football team jerseys with the numbers 40–50 are examples of nominal or categorical measurement. There is no intrinsic meaning to the number, but it is a label that identifies the items being measured.

An example of a study using a nominal level variable is one that examined the merits of two school-based programs which attempted to facilitate the integration of children with severe mental disabilities with children without disabilities (Cole et al., 1987). The nominal or categorical variable here is the type of arrangement in which the children participated: the Special Friend or the Peer Tutor program. They could participate in one program or the other but not both. The researchers examined how interaction between children with disabilities and children without disabilities differed as a function of the type of program in which they participated. Differences in social interaction during the program, during free play, and during a tutorial session were examined.

There are several things to remember about the nominal level of measurement. First, the categories are mutually exclusive. One cannot be in more than one category at the same time. You cannot be categorized as both Jewish and Catholic (even if you do celebrate both Hanukkah and Christmas). Second, if numbers are used as values, they are meaningless beyond simple classification. You simply cannot tell if someone in Category Blue is less or more intelligent than someone in Category Red.
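To see why numbers used as nominal labels carry no quantity, here is a minimal Python sketch. The list of codes is hypothetical (it is not data from the Cole et al. study), and the function names are made up for illustration. Counting how many observations fall into each category is meaningful, and swapping the numeric codes for names changes nothing about the classification.

```python
from collections import Counter

# Hypothetical nominal codes: 1 = Special Friend program, 2 = Peer Tutor program.
# The numbers are labels only; any other pair of distinct labels would work.
program_codes = [1, 2, 2, 1, 2, 1, 1, 2, 2, 2]


def category_counts(codes):
    """Count how many observations fall into each category."""
    return dict(Counter(codes))


def relabel(codes, mapping):
    """Swap each code for a new label; the classification itself is unchanged."""
    return [mapping[code] for code in codes]


counts = category_counts(program_codes)
relabeled = relabel(program_codes, {1: "Special Friend", 2: "Peer Tutor"})
relabeled_counts = category_counts(relabeled)

print(counts)
print(relabeled_counts)
```

The counts are identical before and after relabeling, which is all a nominal variable can tell you. An average of the codes could be computed, but it would change with every arbitrary relabeling and so would mean nothing.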

Ordinal

The ordinal level of measurement describes variables that can be ordered along some type of continuum. Not only can these values be placed in categories, but they can be ordered as well. For this reason, the ordinal level of measurement often refers to variables as rankings of various outcomes, even if only two categories are involved, such as big and little.

Ordinal level variables reflect rankings.

For example, you already saw that Tall and Short are two possible outcomes when height is measured. These are ordinal because they reflect ranking along the continuum of height. Your rank in your high school graduating class was based (probably) on grade point average (GPA). You can be 1st of 300 or 150th of 300. You will notice that you cannot tell anything about the absolute GPA score from that ranking but only the position relative to others. You could be ranked 1st of 300 in one class with a GPA of 3.75 or be ranked 150th of 300 in another class with a GPA of 3.90.

From the variables Tall and Short or 1st and 150th, you cannot tell anything about how tall or how short or how smart a student is because ordinal levels of measurement do not include this information. But you can tell that if Donna is shorter than Joan, and Joan is shorter than Leni, then Donna is also shorter than Leni. So although absolute judgments (such as how much taller Leni is than Donna) cannot be made, relative ones can. You can assign the value “graduate with honors” as well as “honors with distinction” and “highest honors with distinction” to further distinguish among those graduating with honors. This scale is ordinal in nature.
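The loss of absolute information in a ranking can be sketched in a few lines of Python. The two classes, the student names, and the GPAs below are all hypothetical, and the helper function is an illustration only (it does not handle ties).

```python
def rank_positions(scores):
    """Return each person's rank (1 = highest score). Ties are not handled."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Two hypothetical classes: widely spread GPAs versus tightly packed GPAs.
class_a = {"Ana": 3.75, "Ben": 3.10, "Cal": 2.40}
class_b = {"Ana": 3.98, "Ben": 3.97, "Cal": 3.96}

ranks_a = rank_positions(class_a)
ranks_b = rank_positions(class_b)

print(ranks_a)
print(ranks_b)
```

Both classes produce exactly the same ranks, even though the GPAs and the gaps between them are very different. The ordering survives; the amounts do not.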

Interval

The interval level of measurement, from the Latin intervallum (meaning spaces between walls), describes variables that have equal intervals between them (just as did the walls built by Roman soldiers). Interval level variables allow us to determine the difference between points along the same type of continuum that we mentioned in the description of ordinal information.

Interval level variables have equidistant points along some underlying continuum.

For example, the difference between 30° and 40° is the same as the difference between 70° and 80°. There is a 10° difference. Similarly, if you get 20 words correct on a spelling test and someone else gets 10 words correct, you can accurately say that you got 10 more words correct than the other person. In other words, a degree is a degree is a degree, and a correct spelling word is a correct spelling word is a correct spelling word.

A review conducted by A. Wigfield and J. Eccles (1989) of test anxiety in elementary and secondary school students illustrates how a construct such as anxiety can be measured by interval level variables. For example, the Test Anxiety Scale for Children (Sarason, 1959) is a 30-item scale that assesses various aspects of anxiety and yields an overall measure. Items such as

  •  If you are absent from school and miss an assignment, how much do you worry that you will be behind the other students when you come back to school?

provide an accurate measure of the child’s anxiety level in this widely used measure of this fascinating construct.

To contrast interval with ordinal levels of measurement, consider the variable age, where Bill, Harriet, Joshua, Rachael, and Jessica are ranked by age, with Bill older than Harriet and Harriet older than Joshua.

We know that Bill is older than Harriet, but not by how much. He could be 2 years older than Harriet, and Harriet could be 20 years older than Joshua. Interval level variables give us that difference, whereas ordinal scales cannot. Put simply, using an interval scale, we can tell the difference between points along a continuum (and the exact difference between the ages of Bill, Harriet, Joshua, Rachael, and Jessica), but with ordinal scales we cannot.

Although an interval level scale is more precise and conveys more information than a nominal or ordinal level scale, you must be cautious about how you interpret the actual values along the scale. Eighty degrees might be 10° more than 70°, and 40° might be the same distance from 30°, but what a difference those 10° can make. The 10° between 80° and 70° might make water a bit cooler, but in the 10° between 40° and 30° water freezes. Similarly, just because you got 10 more words correct than a classmate does not mean you can spell twice as well (2 times 10) because we have no idea about the difficulty of the words or whether those 20 words sample the entire universe of all spelling words. More important, if you get no words correct, does that mean you have no spelling ability? Of course not. It does mean, however, that on this test, you did not do very well.
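A short Python sketch can make the caution concrete. It assumes the temperatures in the text are in degrees Fahrenheit (the text does not say so, but water freezing between 40° and 30° suggests it). Converting to Celsius keeps equal intervals equal, yet the "twice as much" comparison falls apart.

```python
def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (degrees_f - 32) * 5 / 9


# Equal intervals stay equal after a change of units.
gap_low = fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30)
gap_high = fahrenheit_to_celsius(80) - fahrenheit_to_celsius(70)

# Ratios do not survive the change of units, because the zero is arbitrary.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

print(gap_low, gap_high)
print(ratio_f, ratio_c)
```

The two gaps come out the same in Celsius, just as they did in Fahrenheit. But 80° is "twice" 40° only in Fahrenheit; in Celsius the same two temperatures have a ratio of about 6. That is why differences are meaningful on an interval scale and ratios are not.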

Ratio

The ratio level of measurement, from the Latin ratio (meaning calculation), describes variables that have equal intervals between them but also have an absolute zero. In its simplest terms, this means they are variables for which one possible value is zero, or the actual absence of the variable or trait.

Ratio level variables have a true zero.

For example, a study on techniques to enhance prosocial behavior in the classroom (Solomon et al., 1988) measured prosocial behavior with behavior tallies. The five categories of behavior that were measured over a 5-year period, a long time, were cooperative activities, developmental discipline, activities promoting social understanding, highlighting prosocial values, and helping activities. These researchers spent a great deal of time developing systems that could consistently (or reliably, as we will call it later) measure these types of behaviors. The scales they designed are ratio in nature because they have a true zero point. For example, it is easily conceivable that a child could demonstrate no prosocial behaviors (as defined in the study).

This is indeed an interesting level of measurement. It is by far the most precise. To be able to say that Scott (who is 8 years old) is twice as old as Erin (who is 4 years old) is a very accurate, if not the most accurate, way to talk about differences on a specific variable. Imagine being able to say that the response rate using Method A is one-half that using Method B, rather than just saying that the response rate is “faster” (which is ordinal) or is “faster by 10 seconds” (which is interval).

This is the most interesting scale of the four levels discussed for other reasons as well. First, the zero value is not an arbitrary one. For example, you might think that because temperature (in Celsius units) has a zero point, it is ratio in nature. True, it does have a zero point, but that zero is arbitrary. A temperature of 0°C does not represent the absence of molecules bumping off one another creating heat (the nontechnical definition of temperature, and my apologies to Lord Kelvin). But the Kelvin scale of temperature does have a theoretical absolute zero (about −273°C), where there is no molecular activity, and here is a true zero or an absence of whatever is being measured (molecular activity).
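Here is a minimal Python sketch of what a true zero buys you. The ages come from the Scott and Erin example; the two Celsius temperatures (20° and 10°) are hypothetical values chosen for illustration. On a ratio scale, "twice as much" holds no matter which units you use.

```python
def ratio_in_units(value_a, value_b, units_per_year):
    """Ratio of two ages after expressing both in another unit (e.g., months)."""
    return (value_a * units_per_year) / (value_b * units_per_year)


def celsius_to_kelvin(degrees_c):
    """Convert Celsius to Kelvin (absolute zero is -273.15 degrees Celsius)."""
    return degrees_c + 273.15


# Age has a true zero: Scott (8) is twice as old as Erin (4) in years or months.
ratio_years = ratio_in_units(8, 4, 1)
ratio_months = ratio_in_units(8, 4, 12)

# Celsius has an arbitrary zero: 20 degrees is not "twice as hot" as 10 degrees.
ratio_celsius = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(ratio_years, ratio_months)
print(ratio_celsius, ratio_kelvin)
```

The age ratio is 2 in both units. The temperature ratio looks like 2 in Celsius but is only about 1.04 on the Kelvin scale, where zero really does mean the absence of what is being measured.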

Continuous Versus Discrete Variables

There is one more distinction we need to make before we move on to hypotheses and their importance in the research process.

Variables, as you well know by now, can take many different forms and can differ from each other in many ways. One of these ways can be whether they are continuous, or whether they are categorical (or discrete).

A continuous variable is one that can assume any value along some underlying continuum. For example, height is a continuous variable in that one can measure height as 64.3 inches or 64.31 inches or 67.000324 inches.

A discrete or categorical variable is one with values that can be placed only into categories that have definite boundaries. For example, gender is a discrete variable consisting of the categories of male and female; type of car driven is a discrete variable as well, consisting of such possibilities as Volvo, Chevrolet, or Saturn. As you may have already noticed, discrete variables can take on only values that are mutually exclusive. For example, each participant in your study is either female or male.

What’s important to remember about the continuous–discrete distinction is that it is the “real” occurrence of the variable that determines its type—not the artificial system we might impose. We can say that there are tall and short people, but it is the actual nature of the variable of height, which ranges from 0 (no height) to an infinite height, which counts.
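The point about imposing an artificial system on a continuous variable can be sketched in Python. The first three heights come from the example in the text; the fourth height and the 66-inch cutoff are hypothetical choices made only for illustration.

```python
def classify_height(inches, cutoff=66.0):
    """Impose an artificial two-category system on a continuous variable."""
    return "Tall" if inches >= cutoff else "Short"


# Height itself is continuous: every one of these values is distinct.
heights = [64.3, 64.31, 67.000324, 71.5]
labels = [classify_height(height) for height in heights]

print(labels)
print(len(set(heights)), "distinct heights,", len(set(labels)), "distinct labels")
```

Four different heights collapse into two labels. The labels are something we imposed; the underlying variable is still continuous.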

What Is All the Fuss?

Let’s be practical. In a research study, you want to measure the variable of interest as precisely as possible. There is just no advantage in saying that Group A is weaker than Group B when you can say that Group A averaged 75 sit-ups and Group B averaged 100. More information increases the power and general usefulness of your conclusions.

Sometimes you will be limited by the amount of information that is available. For example, what if you wanted to study the relationship between age in adulthood and strength, and all you know is which group an adult belongs to (strong or not strong), not that person’s strength score? Such limitations are one of the constraints of doing research in the real world: you have to make do with what you have. Those limitations also provide one of the creative sides of research: defining your variables in such a way that the definition maximizes the usefulness of the information.

At what level of measurement do we find most variables in the behavioral and social sciences? Probably nominal or ordinal, with most test scores (such as achievement) yielding interval level data. It is highly questionable, however, whether scores from measures such as intelligence and personality tests provide anything more than ordinal levels of measurement. A child with an IQ of 110 is not 10 points smarter than a child with an IQ of 100 but might have only scored 10 points more. Likewise, Chris might prefer the chocolate chips from package A to the chocolate chips from package B twice as often, but he might not necessarily like them twice as much.

Therein lies an important point: How you choose to measure an outcome defines the outcome’s level of measurement. “Twice as often” is a ratio level variable; how much Chris likes package A chips can be attitudinal and ordinal in nature.

Most researchers take some liberty in treating ordinal variables (such as scores on a personality test) as interval level variables, and that is fine as long as they remember that the intervals may not be (and probably are not) equal. Their interpretation of the data must consider that lack of equivalency.

Also, you should keep in mind that Stevens’ typology of measurement levels has not gone unchallenged. In the 50 years that this methodology has been around, various questions have been raised about the utility of this system and how well it actually reflects the real-world variables that researchers have to assess (Velleman & Wilkinson, 1993).

These criticisms focus primarily on the fact that a variable may not conveniently fit into any one of the four classifications but may be valuable nonetheless. For example, although intelligence may not be ratio level in nature (no one has none), it is certainly beyond interval in its real-life applications. In other words, the taxonomy might be too strict to apply to real-world data. As with so many things in the world of research, this four-level taxonomy is a starting point to be worked with but not to be followed as law.

TEST YOURSELF

What is the relationship between the levels of measurement and the amount or precision of information available from some test score or other outcome?

Reliability and Validity: Why They Are Very, Very Important

You can have the sexiest-looking car on the road, but if the tires are out of balance, you can forget good handling and a comfortable ride. The tires, or where “the rubber meets the road,” are crucial.

Respected levels of reliability and validity are the hallmarks of good measurement practices.

In the same way, you can have the most imaginative research question with a well-defined, clearly articulated hypothesis, but if the tools you use to measure the behavior you want to study are faulty, you can forget your plans for success. The reliability (or the consistency) and validity (or the does-what-it-should qualities) of a measurement instrument are essential because the absence of these qualities could explain why you act incorrectly in accepting or rejecting your research hypothesis.

For example, you are studying the effect of a particular training program and you are using a test of questionable reliability and validity. Let’s assume for the moment that the treatment truly works well and could be the reason for making significant differences in the groups you are comparing. Because the instrument you are using to assess skills is not consistently sensitive enough to pick up changes in the behavior you are examining, you can forget seeing any differences in your results, no matter how effective the treatment (and how sound your hypothesis).

With that in mind, remember: Assessment tools must be reliable and valid; otherwise, the research hypothesis you reject may be correct but you will never know it!

Reliability and validity are your first lines of defense against spurious and incorrect conclusions. If the instrument fails, then everything else down the line fails as well. Now we can go on to a more detailed discussion of reliability and validity, what they are, and how they work.

A Conceptual Definition of Reliability

Here we go again with another set of synonyms. How about dependable, consistent, stable, trustworthy, predictable, and faithful? Get the picture? Something that is reliable will perform in the future as it has in the past. Reliability occurs when a test measures the same thing more than once and results in the same outcomes.

You can use any of the synonyms for reliability listed earlier as a starting definition, but it is important to first understand the theory behind reliability. So, let’s begin at the beginning.

Reliability consists of both an observed score and a true score component.

When we talk of reliability, we talk of scores. Performance for any one person on any variable is described by three clearly defined scores: an observed score and the two components that make it up, a true score and an error score, as shown in Figure 5.1.

Figure 5.1 The components of reliability.

The observed score is the score you actually record or observe. It is the number of correct words on a test, the number of memorized syllables, the time it takes to read four paragraphs of prose, or the speed with which a response is given. It can be the dependent variable in your study or any other variable being measured. Any observed score consists of the two other components: true score and error score (see Figure 5.1).

The true score is a perfect reflection of the true value of that variable, given no other internal or external influences. In other words, for any person there is only one true score on a particular variable. After repeated measurements, there may be several values for a particular measurement (due to error in the measurement process which we will get to in a minute), but there is only one true one. However, one can never ascertain what that true value is. Why? First, because most variables, such as memory, intelligence, and aggression, cannot be directly measured and, second, because the process of measurement is imperfect.

Try as we might, we can never design a test that reflects the true score on any variable or characteristic.

Yet, the measurement process and the theory of reliability always assume a true score is there. For example, on a variable such as intelligence, each person has a true score that accurately (and theoretically) reflects that person’s level of intelligence. Suppose that, by some magic, your true intelligence score is 110. If you are then given a test of intelligence and your observed score comes out to be 113, then the test overestimates your IQ. But because the true score is a theoretical concept, there is no way to know that.

The error score is all of those factors that cause the true score and the observed score to differ. For example, Mike might get 85 of 100 words correct on a spelling test. Does this mean that Mike is an “85% correct speller” on all days on all tests of spelling? Not quite. It means that on this day, for this test, Mike got 85 of 100 words correct. Perhaps tomorrow, on a different set of 100 words, Mike would get 87 or 90 or even 100 correct. Perhaps, if his true spelling ability could be measured, it would be 88. Why are there differences between his true score (88) and his observed score (85)? In a word, error. Whose or what error? You’ll find out about that in a moment.

Perhaps Mike did not study as much as he should have, or perhaps he did not feel well. Perhaps he could not hear the teacher’s reading of each word. Perhaps the directions telling him where he was supposed to write the words on the test form were unclear. Perhaps his pencil broke. Perhaps, perhaps, perhaps . . . . All of these factors are sources of error.
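The relationship among the three scores (observed score = true score + error score) can be sketched with Mike's numbers. Remember that the true score of 88 is a supposition in the text and can never actually be known, so this is an illustration of the idea and not something you could compute in a real study.

```python
# Supposed true score from the text; in practice it is unknowable.
TRUE_SCORE = 88

# Mike's possible observed scores on different days and word lists (from the text).
observed_scores = [85, 87, 90, 100]


def error_scores(observed, true_score):
    """Error is whatever separates each observed score from the true score."""
    return [score - true_score for score in observed]


errors = error_scores(observed_scores, TRUE_SCORE)

for observed, error in zip(observed_scores, errors):
    print(f"observed {observed} = true {TRUE_SCORE} + error {error}")
```

The true score stays put while the error changes from one testing to the next, sometimes pulling the observed score below 88 and sometimes pushing it above.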

Repeated scores on almost any variable are nearly always different from one another because the trait being assessed changes from moment to moment, and the way in which the trait is assessed also changes (albeit ever so slightly) and is not perfect (which no measurement device is).

What Makes Up Error Scores?

Let’s go beyond the catchall of error scores. You can see in Figure 5.1 that error scores are made up of two elements that help to explain why true and observed scores differ.

Both trait and method errors contribute to the unreliability of tests.

The first component of error scores is called method error, which is the difference between true and observed scores resulting from the testing situation. For example, you are about to take an exam in your introductory psychology class. You have studied well, attended reviews, and feel confident that you know the material. When you sit down to take the test, however, there are matching items (which one in Column A goes with Column B?) and crossword puzzle–like items, and you were expecting multiple choice. In addition, the directions as to how to do the matching are unclear. Instead of reaching your full potential on the test (or achieving as close to your true score as possible), you score lower. The error between the two results from the method error—unclear instructions and so on.

The second component is trait error. Here, the reason for the difference between the true and observed scores is characteristic of the person taking the test. For example, if you forgot your glasses and cannot read the problems, or if you did not study, or if you just do not understand the material, then the source of the difference between the true score (what you really know if nothing else interferes) and the score you get on the test (the observed score) is a result of trait errors.

Table 5.3 lists some examples of major sources of error which can affect test scores from one testing situation to the next. The more influential these various factors are, the less accurate the measurement will be; that is, the more influential these factors, the less likely the obtained score will be as close as possible to the true score, the ultimate goal.

What do the components of error have to do with reliability? Quite simply, the closer a test or measurement instrument can get to the true score, the more reliable that instrument is. How do you get closer? By reducing the error portions of the equation you see illustrated in Figure 5.1. So conceptually, reliability is a ratio as shown in Figure 5.2.

If you look at the structure of the equation, you can see that as the error score gets smaller, the degree of reliability increases and approaches 1. In a perfect world, there would be no error, and the reliability would be 1 because the true score would equal the observed score. Similarly, as error increases, the reliability decreases because more of what you observe is caused by something that cannot be predicted very accurately: the changing contributions of trait and method error.
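The behavior of this ratio is easy to check with a few lines of Python. The sketch below follows the conceptual ratio described in the text, true score divided by true score plus error score; the true score of 88 and the error amounts are hypothetical. (In formal test theory the ratio is expressed with the variability of scores across people, not with a single person's scores, so treat this only as a picture of the idea.)

```python
def conceptual_reliability(true_score, error_score):
    """Conceptual ratio from the text: true / (true + error)."""
    return true_score / (true_score + error_score)


# Hypothetical true score of 88 with shrinking amounts of error.
error_amounts = (12, 6, 3, 0)
values = [conceptual_reliability(88, error) for error in error_amounts]

for error, value in zip(error_amounts, values):
    print(f"error = {error:>2}  reliability = {value:.3f}")
```

As the error shrinks from 12 to 0, the ratio climbs from 0.88 to 1, which is the pattern the text describes.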

The question of what the components of an observed score are and which one is amenable to change leads us to our next discussion of how to increase reliability.

Table 5.3 Sources of error in reliability. Error can be part of the method used to assess behavior or the person or trait being assessed.

Source of Error (with examples)

General characteristics of the individual

  • • Level of ability

  • • Test-taking skills

  • • Ability to understand instructions

Lasting characteristics of the individual

  • • Level of ability related to the trait being measured

  • • Test-taking skills specific to the type of items on the test

Temporary individual factors

  • • Health

  • • Fatigue

  • • Motivation (“Yuck, another test”)

  • • Emotional strain

  • • Testing environment

Factors affecting test administration

  • • Conditions of test administration

  • • Interaction between examiner and test taker

  • • Bias in grading

Other factors

  • • Luck (no kidding!)

  • • Superstition

Figure 5.2 The ratio of true score to true score plus error score forms the conceptual basis for reliability.

Increasing Reliability

Given all that we have discussed so far, it should be almost crystal clear that reliability is closely related to both true and error scores. Given a fixed true score (which is always the case, right?), reliability decreases as the error component increases. Thus, if you want a reliable instrument, you must decrease error. You cannot affect true score directly, so you must minimize those external sources of error (be sure there are clear and standardized instructions, bring more than one pencil in case one breaks, make sure the room is comfortable) that you can control. Strive to minimize trait sources as well (ask participants to get a good night’s sleep, put off the assessment if someone does not feel well, and so on). Some important ways to increase reliability include the following:

  • 1. Increase the number of items or observations. The larger the sample from the universe of behaviors you are investigating, the more likely that the sample will be representative and reliable.

  • 2. Eliminate items that are unclear. An item that is unclear (for whatever reason) is unreliable regardless of knowledge or ability level or individual traits; people may respond to it differently at different times.

  • 3. Standardize the conditions under which the test is taken. If the fourth grade class in Pickney Elementary School has to take its achievement test with snowblowers operating right outside the window or the heat turned up too high, you can certainly expect these conditions to affect performance (compared to Sunset Elementary where it is nice and quiet) and, therefore, reliability.

  • 4. Moderate the degree of difficulty of the tests. Any test that is too difficult or too easy does not reflect an accurate picture of one’s performance.

  • 5. Minimize the effects of external events. If a particularly important event—spring vacation, the signing of a peace treaty, or the retirement of a major faculty member, for example—occurs near the time of testing, postpone any assessment. These events are too likely to take center stage at the expense of true performance.

  • 6. Standardize instructions. Bill in one class and Kelly in another should be reading identical instructions and should take the test under the exact same conditions.

  • 7. Maintain consistent scoring procedures. Anyone who has graded a stack of tests containing essay questions will tell you that grading the first one is much different from grading the last. Strive for consistency in grading, even if it means using a sheet with scores in one column and criteria in the other.

How Reliability Is Measured

You know scientists—they love numbers. It is no surprise, then, that a very useful and easy-to-understand statistical concept called correlation (and the measure of correlation, the correlation coefficient) is used in the measurement of reliability. You will learn more about the correlation coefficient in Chapter 9. Correlations are expressed as a numerical value, represented by a lowercase r. For example, the correlation between test 1 and test 2 would be represented as

$r_{\text{test1} \cdot \text{test2}}$

where the scores on test 1 and test 2 are being correlated with one another.

Reliability is most often reflected in the value of the correlation coefficient.

For now, all you need to know about correlations and reliability is that the more similar the scores in terms of change from one time to another (that is, from one test to another), the higher the correlation and the higher the reliability. Keep in mind that reliability is a concern of the instrument, not of the individual.

For example, as you will soon see, one way to measure the reliability of a test is to give the test to a group of people at one point in time and then give the same test to the same group of people at a second point in time, say 4 months later. You end up with two scores for each person.

Now, several things can happen when you have these two sets of scores. Everyone’s score can go down from time 1 to time 2, or everyone’s score can go up from time 1 to time 2. In both these cases, when the scores tend to change similarly and in the same direction, the correlation tends to be positive and the reliability high.

However, what if the people who score high at time 1 score low at time 2, or the people who score low at time 1 score high at time 2? Then the reliability would not be as high. Instead it might be low or none at all because there is no consistency in performance between time 1 and time 2. In general, when the scores on the first administration remain in the same relative (a really important word here) position on the second (high on test 1 and high on test 2, for example), the reliability of the test will be substantial.
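Both situations can be tried out in Python. The sketch below computes the ordinary (Pearson) correlation coefficient by hand; the five time 1 scores are hypothetical. In one case everyone's score rises by the same amount, so relative positions are unchanged. In the other, the high scorers at time 1 become the low scorers at time 2.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical scores for five people at time 1.
time1 = [70, 75, 80, 85, 90]

# Case 1: everyone goes up 5 points, so relative positions are unchanged.
everyone_up = [score + 5 for score in time1]

# Case 2: relative positions flip (highest at time 1 is lowest at time 2).
positions_flipped = time1[::-1]

r_stable = pearson_r(time1, everyone_up)
r_flipped = pearson_r(time1, positions_flipped)

print(r_stable, r_flipped)
```

When relative positions hold, the correlation is +1 even though every score changed. When positions flip completely, the correlation is −1, which as a reliability coefficient signals no consistency at all.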

Reliability coefficients (which are roughly the same as correlation coefficients) range in value from +1.00 to −1.00. A value of 1.00 would be perfect reliability, where there is no error whatsoever in the measurement process. A value of 0.00 or less indicates no reliability. The standardized tests used in most research projects, which you will learn about in Chapter 6, usually have reliability coefficients in the 0.80 to 0.90 range—about what you need to be able to say a test is reliable.

Types of Reliability

Reliability is a concept, but it is also a practical measure of how consistent and stable a measurement instrument or a test might be. There are several types of reliability, each one used for a different purpose. A discussion of what these types are and how they are used follows. A comparison and a summary of the information are shown in Table 5.4.

TEST YOURSELF

In the simplest of terms, what is reliability and why is it important?

Test–Retest Reliability

Two synonyms for reliability used earlier in this section were consistency and stability. Test–retest reliability is a measure of how stable a test is over time. Here, the same test is given to the same group of people at two different points in time. In other words, if you administer a test at time 1 and then administer it again at time 2, will the test scores be stable over time? Will Jack’s score at time 1 change or be the same as his score at time 2, relative to the rest of the group?

Test–retest reliability examines consistency over time.

An important factor in the establishment of test–retest reliability is the length of the time period between testings. How long should that period be? The answer depends on how you intend to use the results of the test, as well as the purpose of your study. For example, let’s say you are measuring changes in social interaction in young adults during their first year in college. You want to take a measure of social interaction in September and then another in May, and you would like to know whether the test you use has test–retest reliability. To determine this, you would have to test the same students at time 1 (September) and time 2 (May) and then correlate the set of scores. Because you are not interested in change in social interaction over a 2-week period, establishing test–retest reliability over such a short period of time, given your intent, is not useful.

Table 5.4 Different types of reliability used for different purposes. However, no matter what type of assessment device you use, reliability is an essential quality that must be established before you test your hypothesis

Test–retest

  • • What it is: A measure of stability

  • • How you do it: Administer the same test/measure at two different times to the same group of participants

  • • What the reliability coefficient looks like: $r_{\text{test1} \cdot \text{test2}}$

Parallel-forms

  • • What it is: A measure of equivalence

  • • How you do it: Administer two different forms of the same test to the same group of participants

  • • What the reliability coefficient looks like: $r_{\text{form1} \cdot \text{form2}}$

Inter-rater

  • • What it is: A measure of agreement

  • • How you do it: Have two raters rate behaviors and then determine the amount of agreement between them

  • • What the reliability coefficient looks like: Percentage of agreements

Internal consistency

  • • What it is: A measure of how consistently each item measures the same underlying construct

  • • How you do it: Correlate performance on each item with overall performance across participants

  • • What the reliability coefficient looks like: Cronbach’s alpha or Kuder-Richardson

Parallel-Forms Reliability

A second common form of reliability is parallel-forms reliability or equivalence. Here, different forms of the same test are given to the same group of participants. Then the two sets of scores are correlated with each other. The tests are said to be equivalent if the correlation is statistically significant, meaning that it is large enough that the relationship is due to something shared between the two forms, not some chance occurrence.

Parallel-forms reliability examines consistency between forms.

When would you want to use parallel-forms reliability, assuming you have created (or have) two forms of the same test? The most common example is when you need to administer two tests of the same construct within a relatively short time and you want to eliminate the influence of practice effects on participants’ scores.

For example, you are studying short-term memory. You read a list of words to people, and you ask them to recite what they can remember 2 minutes later. You might need to repeat this type of test every day for 7 days, but you certainly could not use the same list of 10 words each day. Otherwise, by the last day, the subjects surely would have a good deal of the list memorized as a result of repetition, and the test would provide little information about short-term memory. Instead, you could design several sets of words which you believe are equivalent to one another. Then, if you can establish that they are parallel forms of the same test, you can use them on any day and expect the results from day 1 to be equivalent to the results from day 2.

Inter-Rater Reliability

Test–retest reliability and parallel-forms reliability are measures of how consistent a test is over time (test–retest) and how consistent it is from form to form (parallel forms). Another type of reliability is inter-rater reliability.

Inter-rater reliability examines consistency across raters.

Inter-rater reliability is a measure of the consistency from rater to rater, rather than from time to time or even from test to test. For example, let’s say you are conducting a study that measures aggression in preschool children. As part of the study, you are training several of your colleagues to collect data accurately. You have developed a rating scale consisting of a list of different behaviors preschool children participate in, numbered 1–5, each representing a different type of behavior, as shown in Table 5.5.

As you can see, the behavior coded number 1 on the list is labeled Talking and is defined as verbal interaction with another child. The behavior coded number 4 on the list, labeled Hitting 1, is defined as physically striking another child without provocation. There is nothing complicated about these definitions, right? They seem to be fairly operational and objective. But who is to say that, even with these definitions, Steven and Andrea (the two raters) will identically categorize the behaviors they observe?

What if Steven sees Jill hit Elizabeth and categorizes it as a behavior 4, but Andrea categorizes it as a behavior 5 because she saw Elizabeth hit Jill first? You could be in trouble. Raters need to be able to rate and place events in the same category.

To be sure that all raters are in agreement with one another, inter-rater reliability must be established. This is done by having raters rate behavior and then examining the percentage of agreement between them. Let’s say you have Andrea and Steven rate the behaviors of one child every 10 seconds as you train them on the use of the rating scale. Their pattern of choices could look something like what is shown in Table 5.6. To compute their inter-rater reliability, take the number of agreements and divide it by the total number of time periods rated (20 in this example). In their pretraining rating, the inter-rater reliability comes out to 15 (the number of agreements) divided by 20 (the number of possible agreements), which is 0.75 (75%). After training, as you can see, the value has increased to 18 ÷ 20 or 0.90 (90%), which is quite respectable.
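The percentage-of-agreement calculation can be sketched in a few lines. The ratings below are hypothetical stand-ins, not the actual entries of Table 5.6; they were chosen only so that the rater pairs agree on 15 and 18 of the 20 periods, matching the figures in the text. The function name percent_agreement is also just an illustration.

```python
def percent_agreement(rater_1, rater_2):
    """Proportion of rating periods on which two raters chose the same code."""
    if len(rater_1) != len(rater_2):
        raise ValueError("Both raters must rate the same number of periods")
    agreements = sum(a == b for a, b in zip(rater_1, rater_2))
    return agreements / len(rater_1)

# Hypothetical behavior codes (1-5) for 20 ten-second periods.
andrea      = [1, 1, 2, 3, 4, 4, 5, 1, 2, 3, 3, 4, 5, 5, 1, 2, 4, 4, 5, 3]
steven_pre  = [1, 1, 2, 3, 5, 4, 5, 1, 2, 3, 3, 5, 4, 5, 1, 2, 5, 4, 4, 3]
steven_post = [1, 1, 2, 3, 5, 4, 5, 1, 2, 3, 3, 4, 5, 5, 1, 2, 5, 4, 5, 3]

before = percent_agreement(andrea, steven_pre)   # 15 agreements out of 20
after = percent_agreement(andrea, steven_post)   # 18 agreements out of 20
print(f"Before training: {before:.2f}, after training: {after:.2f}")
```

Notice that in this made-up data every disagreement is a 4 versus a 5, the same kind of confusion between the two hitting categories that training is meant to clear up.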

What elements were included in the training? The head of the project probably examined the problems in misclassification and reviewed the definition of behaviors and discussed examples with the raters. In Table 5.6, you can see how the most frequent problems were disagreements between ratings of behavior 4 and behavior 5, which are types of hitting behaviors. Here is where any differences between raters’ judgments would be clarified.

The consequences of low inter-rater reliability can be serious. If one of your raters misclassified 20% of the occurrences, it means that 20% of your data might be wrong.

Table 5.5 Categorizing behaviors. Categories can then be used to record their frequency objectively, but reliability is as important here as with any other kind of measure

Behavior | Code | Definition
Talking | 1 | Verbal interaction with another child
Solitary play | 2 | Playing alone with no interaction with other children
Parallel play | 3 | Playing alongside other children in the same or different activity
Hitting 1 | 4 | Physically striking another child without provocation
Hitting 2 | 5 | Physically striking another child with provocation

Table 5.6 Inter-rater reliability before and after training

Internal Consistency

Although internal consistency is a less commonly established form of reliability, you need to know about it as a beginning researcher. Internal consistency examines how unified the items are in a test or assessment.

For example, if you are administering a personality test that contains 100 different items, you want each of these items to be related to one another as long as the model or theory upon which the test is based considers each of the 100 items to reflect the same basic personality construct.

Internal consistency examines the unidimensional nature of a set of items.

Likewise, if you were to give a test of 100 items broken down into five different subscales consisting of 20 items each, then you would expect that test to have internal consistency for each of the subscales if the 20 items within each subscale relate more to one another than they do to the items within any of the other four subscales. If they do, each of the scales has internal consistency.

Internal consistency is evaluated by correlating performance on each of the items in a test or a scale with total performance on the test or scale and takes the form of a correlation coefficient. The most commonly used statistical tools are Cronbach’s alpha and Kuder–Richardson correlation coefficients.
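As a rough illustration, here is one common way of computing Cronbach’s alpha: the standard variance-based formula, alpha = (k / (k − 1)) × (1 − sum of item variances / variance of total scores), where k is the number of items. The chapter does not spell out this formula, so treat the sketch as an assumption about how you might compute it, and note that the item responses are invented.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row per respondent,
    one column per item)."""
    k = len(item_scores[0])                       # number of items
    columns = list(zip(*item_scores))             # scores grouped by item
    item_variances = [pvariance(col) for col in columns]
    total_scores = [sum(row) for row in item_scores]
    total_variance = pvariance(total_scores)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses of 5 people to 4 items, each rated 1-5.
responses = [
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values closer to 1 suggest that the items hang together as a measure of a single dimension; what counts as "high enough" depends on the purpose of the test.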

Establishing Reliability: An Example

One of the best places to look for reliability studies is in the Buros Institute’s Buros Mental Measurements Yearbook (you can find complete information about this book in your library or at http://www.unl.edu/buros/), a compendium of summaries and reviews of tests that are currently available. As part of these reviews, the way in which reliability was established is often described and discussed.

For example, the Multidimensional Aptitude Battery II is an objectively scored general aptitude or intelligence test for adults in the form of five verbal and five performance subtest scores. The authors of the test computed several types of reliability, including test–retest correlation coefficients which ranged from .83 to .97 for the verbal scale of the test and .87 to .94 for the performance scale. They also computed other reliability indices (measures of internal consistency) that provide some indication of how homogeneous or unidimensional the various tests are, that is, whether each consistently assesses only one dimension of aptitude or intelligence. Although the results of these reliability studies are not terribly exciting for us (but they certainly were for the authors of the test), they provide crucial information that a potential user needs to know and that the author of any test needs to establish for the test to be useful.

TEST YOURSELF

Reliability is a hallmark of a good test. Why is it important? Describe one way by which it is established.

Validity

Earlier in this chapter, we mentioned two essential characteristics of a good test. The first is that it be reliable, which was just discussed. The second is that it be valid—the test does what it is supposed to do.

A Conceptual Definition of Validity

Remember consistency, stability, and predictability (among other synonyms for reliability)? How about truthfulness, accuracy, authenticity, genuineness, and soundness as synonyms for validity? These terms describe what validity is all about: that the test or instrument you are using actually measures what you need to have measured.

When you see the term “validity,” one or more of three things should come to mind about the definition and the use of the term. Keep in mind that the validity of an instrument is often defined within the context of how the test is being used. Here are the three aspects of validity:

  • 1. Validity refers to the results of a test, not to the test itself. So if we have the ABC test of social skills, the results of the test may be valid for measuring social interaction in adolescents. We talk about validity only in light of the outcomes of a test.

  • 2. Just as with reliability (although validity is not as easily quantified), validity is never a question of all or none. The results of a test are not just valid or invalid. This progression occurs in degrees from low validity to high validity.

  • 3. The validity of the results of a test must be interpreted within the context in which the test occurs. If this were not the case, everything could be deemed to be valid just by changing its name. For example, here is item number 1 from a 100-item test:

2 + 2 = ?

Most of you would recognize this question to have validity as a measure of addition skills. If we use the question in an experiment focusing on multiplication skills, however, the item loses its validity immediately.

The validity of a test should be examined, then, by asking whether the focus is on the results of the test and whether those results are understood within the context of the purpose of the research.

Just as with reliability, there are several types of validity which you will come across in your research activities. And you will, of course, have to consider validity when it comes time to select the instruments you intend to use to measure the dependent variable of your interest.

A summary of different types of validity, what they mean, and how they are established is shown in Table 5.7.

Table 5.7 Types of validity

Type of Validity | What Is It? | How Do You Establish It?
Content | A measure of how well the items represent the entire universe of items | Ask an expert if the items assess what you want them to assess
Criterion: Concurrent | A measure of how well a test estimates a criterion | Select a criterion and correlate scores on the test with scores on the criterion in the present
Criterion: Predictive | A measure of how well a test predicts a criterion | Select a criterion and correlate scores on the test with scores on the criterion in the future
Construct | A measure of how well a test assesses some underlying construct | Assess the underlying construct on which the test is based and correlate these scores with the test scores

Types of Validity

There are three types of validity, each of which is used to establish the trustworthiness of results from a test or an assessment tool.

Content Validity

The simplest, most straightforward type of validity is content validity. Content validity indicates the extent to which a test represents the universe of items from which it is drawn, and it is especially helpful when evaluating the usefulness of achievement tests or tests that sample a particular area of knowledge.

Expert opinion is often used to establish the content validity of a test.

Why just a sample? Because it is impossible to create all the possible items that could be written. Just think of the magnitude of the task. Imagine writing all the possible multiple-choice items you could on the material covered (not necessarily contained) in an introductory psychology book. There must be 1 million items that conceivably could be written on the domains of perception or personality alone. You could get tired just thinking about it. That is why you sample from all the possible items that could be written.

But back to the real world. Let’s say you are dealing with eighth-grade history, and the unit deals with the discovery of North America and the travels and travails of several great European explorers. If you were to develop a history test that asks questions about this period and wanted to establish the validity of the questions, you could show it to an expert in early American history and ask, “Do these questions fairly represent the universe or domain of early American history?” You don’t have to use such 25-cent words as universe and domain, but you need to know whether you have covered what you need to cover.

If your questions do the job, then the sample of questions you selected to test an eighth grader’s knowledge of early American history, for example, was chosen well. Congratulations. That is content validity.

Criterion Validity

Criterion validity is concerned with either how well a test estimates present performance (called concurrent validity) or how well it predicts (future) performance (called predictive validity). Criterion validity is a measure of the extent to which a test is related to some criterion. An assumption of this method is that the criterion with which the test is being compared has some intrinsic value as a measure of some trait or characteristic. Criterion validity is most often used to evaluate the validity of ability tests (current skills) and aptitude tests (potential skills). And, if you have not already guessed, the important thing about criterion validity is the use of a criterion.

In both types of criterion validity, a criterion is used as a confirmatory measure. For example, let’s say you want to investigate the use of graduate school grades in predicting which people in the clinical psychology program will become especially successful researchers. To that end, you locate a sample of “good” researchers (as defined by the number of journal articles they have published in the past 20 years). Then, you would find out how well those researchers did as graduate students and how well their school performance (or grades) predicted membership in the “good” group. You might also want to locate a group of “not good” researchers (or those who did not publish at all) and compare how well their graduate school grades predicted membership in the “good” or “not good” group. In this case, graduate school grades would have predictive validity (of success as a researcher) if grades (the test) predicted performance as a researcher (the criterion).
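Here is a small sketch of the test-versus-criterion comparison, using the simpler correlational approach summarized in Table 5.7 rather than the two-group comparison described above. The grade point averages and publication counts are invented, and the function name validity_coefficient is only a label for an ordinary Pearson correlation.

```python
from math import sqrt

def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and criterion scores."""
    n = len(test_scores)
    mean_t = sum(test_scores) / n
    mean_c = sum(criterion_scores) / n
    cov = sum((t - mean_t) * (c - mean_c)
              for t, c in zip(test_scores, criterion_scores))
    ss_t = sum((t - mean_t) ** 2 for t in test_scores)
    ss_c = sum((c - mean_c) ** 2 for c in criterion_scores)
    return cov / sqrt(ss_t * ss_c)

# Hypothetical data for six former students:
# the "test" is graduate school GPA, the criterion is the number of
# journal articles published in the following 20 years.
grades = [3.2, 3.5, 3.9, 3.0, 3.7, 3.4]
publications = [2, 5, 12, 1, 9, 4]

r_predictive = validity_coefficient(grades, publications)
print(f"Predictive validity coefficient: r = {r_predictive:.2f}")
```

Because the criterion is collected after the test, this would be evidence of predictive validity; the same calculation with a criterion measured at the same time as the test would speak to concurrent validity.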

This sounds nice and neat and clean, but who is to judge the nature and the value of the criterion? Does the number of articles published constitute good research? What if 90% of one researcher’s articles are published in journals that have a rejection rate of 50%, whereas someone else has published only one article in one journal where the rejection rate is 90%? And what if that one article has a significant and profound effect on the direction of future research in the discipline? As with any other building block in the research process, the criterion that you use to establish validity must be selected with some rationale. In this case, you would have to provide the rationale for assuming that the number of articles published, regardless of their quality, is what is important (if that is what you believe).

Another problem that occurs with both concurrent and predictive validity is the serious concern for what the tests actually measure. One assumes that if the tests correlate with the criterion, then the relationship must be meaningful. So, if the results of your intelligence test correlate with eye color or nose size or the shape of the bumps on your head, does that mean the test has criterion validity? The answer is “Yes,” if you think that eye color, nose size, and the bumps on the head (the study of which is called phrenology, by the way) are good indicators of intelligence. Don’t laugh: the history of science is filled with such well-meaning (and some not so well-meaning), but mistaken, assumptions and conclusions.

Construct Validity

Construct validity is the big one. It is a time-consuming and often difficult type of validity to establish, yet it is also the most desirable. Why? First a definition: Construct validity is the extent to which the results of a test are related to an underlying set of related variables. It links the practical components of a test score to some underlying theory or model of behavior.

Construct validity examines whether test performance reflects an underlying construct or set of related variables.

For example, construct validity allows one to say that a test labeled as an “intelligence test” actually measures intelligence. How is this validity established? Let’s say that, based on a theory of intelligence (which has undergone some scrutiny and testing and stands the test of time), intelligence consists of such behaviors as memory, comprehension, logical thinking, spatial skills, and reasoning; that is, intelligence is a construct represented by a group of related variables. If you develop a set of test items based on the construct and if you can show that the items reflect the contents of the construct, then you are on your way to establishing the construct validity of the test.

Therefore, the first step in the development of a test that has construct validity is establishing the validity (in the most general scientific terms) of the underlying construct on which the test will be based. This step might require many studies and many years of research. Once the evidence for the validity of the construct is there, you then could move on to the design of a test that reflects the construct.

There is a variety of ways in which construct validity can be established.

First, as with criterion validity, you can look for the correlation between the test you are developing and some established test which has already been shown to possess construct validity. This is a bit of a “chicken-and-egg” problem because there is always the question of how construct validity was first established.

Second, you can show that the scores on the newly designed test will differ between groups of people with and without certain traits or characteristics. For example, if you are developing a test for aggression, you might want to compare the results for people known to be aggressive with the results of those who are not.

Third, you can analyze the task requirements of the items and determine whether these requirements are consistent with the theory underlying the development of the test. If your theory of intelligence says that memory is important, then you would expect to have items that tap this ability on your test.

Establishing Validity: An Example

Speaking of intelligence, here is how three researchers (Krohn et al., 1988) went about exploring the construct validity of the Kaufman Assessment Battery for Children (K-ABC).

The issue these researchers attacked is a familiar one: Is a test that is valid for one group of people (white preschoolers) also valid for another group (black preschoolers)? To answer this question, the researchers used perhaps the most common strategy for establishing construct validity: They examined the correlation between the test in question and some other established and valid measure of intelligence, in this case the Stanford–Binet Intelligence Scale, the most widely used intelligence test for young children.

I hope you are asking yourself, “If a widely used, presumably good test of intelligence exists, why go through the trouble to create another?” A very good question. The answer is that the developers of K-ABC (Kaufman & Kaufman, 1983) believe that intelligence should tap cognitive abilities more than previous tests have allowed. K-ABC measures both intelligence and achievement and is based on a theoretical orientation that is tied less to culture than tests such as the Stanford–Binet and the Wechsler Intelligence Scale for Children (WISC).

In one study, Krohn, Lamp, and Phelps (1988) tested the same children using both K-ABC and Stanford–Binet and found that K-ABC had substantial support as a measure of intelligence in the population of black preschool children from which the sample was selected.

Another way in which the construct validity of a test is established is through the use of the multitrait-multimethod matrix—quite a mouthful but quite a technique, and very demanding as well.

This technique measures various traits using various methods. What you would expect to happen is that, regardless of how you measure the trait, the scores are related. Thus, if you measure the same trait using different methods, the scores should be related, and if you measure different traits using the same methods, the scores should not be related.

For example, if we are trying to establish the construct validity of a paper-and-pencil test of children’s impulsivity, we might measure impulsivity two ways: by using the paper-and-pencil instrument (the one we’re trying to develop) and by attaching an activity meter to the child’s wrist. At the same time, we’ll also measure another variable, such as movement or activity level. So each trait (impulsivity and activity level) is measured using each method, the paper-and-pencil test as well as the wrist-attached activity meter. The matrix might look like that shown in Figure 5.3.

Figure 5.3 Using a matrix of more than one method to measure more than one trait allows for the use of the multitrait-multimethod matrix method of testing for construct validity.

If the paper-and-pencil test measure of impulsivity does what it should, then the cells indicating low, medium, and high (for the strength of the relationship) should turn out as shown in Figure 5.3.

For example, the relationship between impulsivity measured using a paper-and-pencil test and that measured using an activity meter should be moderate. Because these methods are so different from one another, any relationship we observe has to be the result of what they share in common in the analysis of the construct (which is impulsivity). This is called convergent validity because the methods converge upon one another.

Similarly, you would expect there to be no relationship between the different methods being used to assess different variables or traits, and that’s what the “lows” are for in Figure 5.3. For example, you would expect the relationship between impulsivity measured using paper and pencil and activity level measured using an activity monitor to be low, because they share nothing (neither method nor trait) in common. This is called discriminant validity because method and trait variance are distinct from one another.
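To make the matrix idea concrete, here is a minimal sketch that correlates every pair of measures for two traits and two methods. All scores are invented for six hypothetical children, and the labels attached to each pair are our own shorthand; this is not a reproduction of Figure 5.3.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores, keyed by (trait, method).
measures = {
    ("impulsivity", "paper"): [1, 2, 3, 4, 5, 6],
    ("impulsivity", "meter"): [3, 1, 2, 6, 4, 5],
    ("activity", "paper"): [4, 2, 5, 1, 6, 3],
    ("activity", "meter"): [4, 1, 6, 2, 5, 3],
}

# Correlate every pair of measures to fill in the matrix.
matrix = {}
for (key_a, scores_a), (key_b, scores_b) in combinations(measures.items(), 2):
    matrix[(key_a, key_b)] = pearson_r(scores_a, scores_b)

for (a, b), r in matrix.items():
    same_trait = a[0] == b[0]
    same_method = a[1] == b[1]
    if same_trait and not same_method:
        label = "convergent (same trait, different methods)"
    elif not same_trait and not same_method:
        label = "discriminant (different traits, different methods)"
    else:
        label = "different traits, same method"
    print(f"{a} vs {b}: r = {r:+.2f}  [{label}]")
```

With data like these, the same-trait, different-method pairs show the larger coefficients, and the pairs that share neither trait nor method stay small, which is the pattern you hope to see.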

What’s good about the multitrait-multimethod procedure? It really works fine in establishing the validity of a test because it places it in direct contrast to existing tests and ties it to the methods that are to be used in the assessment process.

What’s not good? It requires lots of time, and time means money. But, if this is where you have to go to get the proof, what’s a few thousand more lost dollars when school has cost so much already?

TEST YOURSELF

List at least one advantage and one disadvantage of the multitrait-multimethod technique for establishing construct validity. Then, name one other way to establish construct validity.

The Relationship Between Reliability and Validity

Yes, it’s true! A test can be reliable without being valid. Do you know why?

The relationship between reliability and validity is straightforward and easy to understand: A test can be reliable but not valid, but a test cannot be valid without first being reliable. In other words, reliability is a necessary, but not sufficient, condition of validity.

For example, let’s go back to that 100-item test. Here is the same example we used before:

2 + 2 = ?

Now, we can almost guarantee that this is a reliable item because it is likely to result in a consistent assessment of whether the person taking the test knows simple addition. But what if we named it a spelling test? It is obviously not a test of spelling and would certainly be invalid as such. This lack of validity, however, does not affect the test’s reliability.

This might be an extreme example, but it holds true throughout the assessment of behavior. A test may be reliable and consistently assess some outcome, but unless that outcome addresses the issue being studied, it is not valid. End of argument!

Closing (and Very Important) Thoughts

The measurement process is incredibly important and, like so many of the other things that guide researchers’ work, is not simple. It is an area of endeavor filled with its share of controversies and new ideas. Let me plant one idea in your thinking that illustrates how generative and filled with potential the study of measuring human behavior is.

In an article in the prestigious scientific journal Science, M. Lampl, J. D. Veldhuis, and M. L. Johnson (1992) undertook a study that was implicitly suggested by a friend’s comment on how fast the friend’s young baby was growing (as in your mother’s report to your grandmother, “He shot up overnight!”). Doctors usually check infants’ height and weight every other month in the beginning and then every few months as they get older. These researchers decided to see if babies really do grow in particularly fast spurts, so they measured babies’ growth over an extended period of time. What did they find?

You will be amazed to learn that some infants grew as much as one whole inch in a 24-hour period! What is the big deal? Well, the average length of infants of that age is about 20 inches, and the change represents about a 5% increase. If you are an average male adult (about 5 feet, 10 inches) and you grew 5% in 1 day, you would be about 6 feet, 2 inches, and if you are an average female (about 5 feet, 4 inches), you would be about 5 feet, 7 inches. Now about those new pants you need . . . . That’s the big deal.
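If you want to check the arithmetic yourself, the following sketch redoes it. The heights are the ones given above; the helper names are made up for the example.

```python
def grow(height_inches, proportion):
    """Height after growing by the given proportion (0.05 means 5%)."""
    return height_inches * (1 + proportion)

def feet_and_inches(total_inches):
    """Split a height in inches into whole feet and leftover inches."""
    feet, inches = divmod(total_inches, 12)
    return int(feet), inches

# A 1-inch gain on a 20-inch infant is a 5% increase.
increase = 1 / 20

for label, height in [("male", 70), ("female", 64)]:  # 5'10" and 5'4"
    feet, inches = feet_and_inches(grow(height, increase))
    print(f"Average adult {label}: {feet} feet, {inches:.1f} inches")
```

The results come out to roughly 6 feet, 1.5 inches and 5 feet, 7.2 inches, which round to the figures quoted in the text.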

The lesson is that there are undoubtedly thousands of things going on in the social and behavioral sciences that we don’t notice either because we don’t measure them appropriately (not intentionally, but because that is the way X or Y has been measured before) or because we might be making the wrong assumptions (such as that an infant’s growth rate increases smoothly with no abrupt changes). Most important, what researchers know about human behavior ultimately depends on how they measure what they are interested in studying. In other words, the measurement technique used and the questions asked go hand in hand and are very closely related, both in substance and in method.

Do you want to cut corners in your research? Don’t—but if you have to, don’t ignore anything about the measurement process.

Now, a last thought.

Many students set out to answer interesting research questions without having defined a reliable and valid dependent variable. The message here is that if the test is not reliable or valid and the null hypothesis is not rejected, then how does one know that there “truly” is no difference between groups rather than the test just not doing its job? All the months of work and effort that would go into a project might be for naught (that is, you don’t get a true reading of what you are examining) if an unreliable or invalid instrument is used.

The moral of this story is: Use a test with established and acceptable levels of reliability and validity. If you cannot find one, do one of two things. Develop one for your thesis or dissertation (which in itself is a huge undertaking) and do no more than that, or change what you are measuring so you are sure that what you ask can be answered in a fair and unbiased fashion.

TEST YOURSELF

A researcher tests the hypothesis that an intervention targeted at malnourished senior citizens works but uses unreliable tests to assess outcomes. What’s wrong with the conclusion that the intervention worked?

Summary

There are no two ways about it—the measurement process is a critical part of putting together a research project and seeing it to fruition. This part of the research project is especially important because a test without the appropriate levels of reliability or validity is of no use to you or anybody else. Using poorly designed measurement tools leads you down the path of never knowing whether you are on the right track or never really accurately measuring what you want. Use your good sense and look around for instruments that have already been shown to have respectable levels of reliability and validity. It will save you time, trouble, and endless headaches.

Chapter 11 Pre- and True Experimental Research Methods WHAT YOU’LL LEARN ABOUT IN THIS CHAPTER:
  • • The importance and role of experimental designs

  • • The importance of randomization in the experimental method

  • • The role of chance in the experimental method

  • • The principles of experimental design

  • • The concepts of internal and external validity and the role they play in the experimental method

  • • Threats to internal and external validity and how these threats can be controlled

  • • How to control extraneous sources of variability

What scientists do is try to find out why things happen. They go to great lengths trying to establish, for example, what the best way is to facilitate learning, why some adults are more successful than their peers, or where differences in attitudes come from. The methods and models described in this chapter can go a long way toward understanding such phenomena.

One tool that can assist in understanding the search for these differences is the true experimental research method. Unlike any of the other methods discussed thus far, the experimental method tests for the presence of a distinct cause and effect. This means that once this method is used, the judgment can be made that A does cause B to happen or that A does not cause B to happen. Other methods, such as historical and descriptive models, do not offer that luxury. Although they can be used to uncover relationships between variables, there is no way that a causal relationship can be established.

Why? It is by virtue of the experimental method itself, which allows for the control of potential sources of differences (or variance), that the following can be said: One factor is related to another in such a way that changes in that factor are causally related to changes in the other. So, it’s not just a relationship where two variables share something in common (as is the case with a correlational relationship); it’s much more. They share something, but one directly affects the other.

For example, the simplest experimental design would be one in which two groups of subjects are randomly selected from a population and one group (the experimental group) receives a treatment and the other group (the control group) receives no treatment. At the end of the experiment, both groups are tested to see if there is a difference on a specified test score. Assuming (and this is the big assumption) that the two groups were equivalent from the start of the experiment, any observed difference at the end of the experiment must be due to the treatment. That is what experimental design, in one form or another, is all about.

When done correctly, experimental designs can provide a tremendous amount of power and control over understanding the causal relationships between variables. Their use, to a significant extent, is responsible for a good deal of the understanding scientists have about behavior.

TEST YOURSELF

There are many famous discoveries in science, but one of the most important methodological ones is the scientific method where groups are compared to one another. Why has this method become so popular and taken on such importance?

Experimental Designs

There is a variety of types of experimental designs. In this section, you will find a description of the set made famous by Donald Campbell and Julian Stanley in their 1963 monograph “Experimental and Quasi-Experimental Designs for Research on Teaching,” which helped revolutionize the way in which research projects are planned and conducted.

Quasi-experimental designs are also known as causal-comparative designs.

Campbell and Stanley identified three general categories of research designs: pre-experimental, true experimental, and quasi-experimental. (Quasi-experimental designs are also referred to as causal-comparative designs.) This chapter will discuss the pre-experimental and true experimental designs; Chapter 12 covers quasi-experimental design.

The most significant difference among these types of experimental designs is the degree to which they impose control on the variables being studied. The pre-experimental method has the least amount of control, the true experimental method has the most, and the quasi-experimental method is somewhere in the middle. The more control a design allows, the easier it is to attribute a cause-and-effect sequence of events.

Another way in which these three designs differ from one another is the degree of randomness that enters into the design. You already know that the word “random” implies an equal and independent chance of being selected, but that definition and concept can be applied beyond the selection of a sample of subjects from a population to the concept’s importance in experimental design.

The point at which random assignment enters the process distinguishes different types of experimental designs from one another.

Actually, different steps need to be taken to ensure the quality of true randomness in the best of all experimental designs.

The first step is one you know most about, the random selection of subjects from a population to form a sample. This is the first procedure you would undertake in an experiment. Now you have a sample.

Second, you want to assign subjects randomly to different groups. You want to make sure, for example, that subjects assigned to group 1 had an equal chance of being assigned to group 2.

Finally (if you followed steps 1 and 2), you have two groups you can assume are equivalent to each other. Now you need to decide which of the two groups will receive the treatment or, if you have five groups, which treatment each group will receive. In the same way that you used a table of random numbers in previous examples, you assign (at random) different treatments to the groups.

By following these steps, you can ensure that:

  • 1. The subjects are randomly selected from a population and randomly assigned to groups.

  • 2. Which group receives which treatment is decided randomly as well.
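The three randomization steps above can be sketched in a few lines of Python. This is only an illustration, not a prescribed procedure: the function name `randomize_experiment`, the made-up population of 200 subject labels, and the use of a seeded pseudo-random generator in place of a table of random numbers are all assumptions.

```python
import random

def randomize_experiment(population, sample_size, treatments, seed=None):
    """Return {treatment: [subjects]} after three random steps."""
    rng = random.Random(seed)  # the seed only makes this sketch reproducible

    # Step 1: random selection of a sample from the population.
    sample = rng.sample(population, sample_size)

    # Step 2: random assignment of subjects to groups.
    # rng.sample already returns a random order; the shuffle makes the step explicit.
    rng.shuffle(sample)
    n_groups = len(treatments)
    groups = [sample[i::n_groups] for i in range(n_groups)]  # deal out in turn

    # Step 3: random assignment of treatments to groups.
    shuffled_treatments = list(treatments)
    rng.shuffle(shuffled_treatments)
    return dict(zip(shuffled_treatments, groups))

# Hypothetical population of 200 subject labels.
population = [f"subject_{i:03d}" for i in range(200)]
design = randomize_experiment(population, 40, ["treatment", "control"], seed=1)
for treatment, members in design.items():
    print(treatment, len(members))
```

Each subject in the sample ends up in exactly one group, and which group gets which treatment is itself left to chance.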

Table 11.1 summarizes some of the primary differences between pre-experimental, true experimental, and quasi-experimental designs. Even though quasi-experimental designs will be discussed in Chapter 12, they are included here so you can see a comparison of all design types. Notice that many of these differences focus on the process of randomization of selection procedures, subjects, and assignment.

Pre-Experimental Designs

Pre-experimental designs have no random assignment of subjects or individuals.

Pre-experimental designs are not characterized by random selection of participants from a population, nor do they include a control group. Without either of these, the power of the research to uncover the causal nature of the relationship between independent and dependent variables is greatly reduced, if not entirely eliminated. These designs allow little or no control over extraneous variables that might be responsible for outcomes other than what the researcher intended.

Table 11.1 Differences between pre-experimental, true experimental, and quasi-experimental designs

Condition | Pre-Experimental Design | True Experimental Design | Quasi-Experimental Design
Presence of a control group? | In some cases, but usually not | Always | Often
Random selection of subjects from a population? | No | Yes | No
Random assignment of subjects to groups? | No | Yes | No
Random assignment of treatment to groups? | No | Yes | No
Degree of control over extraneous variables? | None | High | Some

For example, a parent uses an old folk remedy (wearing garlic around the neck) to ward off the evil spirits associated with a child’s cold. Lo and behold, it works! This is the weakest type of experimental conclusion to reach because there is virtually no comparison to show that the garlic worked better than anything else, or better than nothing at all for that matter. The child, of course, might have recovered on his or her own. There is simply no control over other factors that might cause the observed outcome (such as the cold virus running its course).

In research terms, this type of study is called a one-shot case study design, as shown in the following table. For this design and the rest that follow, we show events in the sequence in which they occur: participants are assigned to a group, some kind of treatment is administered, and then (in this example) a posttest is given.

Step 1 | Step 2 | Step 3
Participants are assigned to one group | A treatment is administered | A posttest is administered

A group is exposed to some type of treatment and then tested. What shortcomings might you notice about this one-shot case study type of pre-experimental design? First, no attempt at randomization has been made. How might this one-shot case study be used? It would not be very useful for experimental work or for establishing cause-and-effect relationships, but it would be acceptable if you were speculating about factors that occurred at an earlier time and the effect they had on later behavior.

Another pre-experimental design, called the one-group pretest posttest design, is represented by the following:

Step 1 | Step 2 | Step 3 | Step 4
Participants are assigned to one group | A pretest is administered | A treatment is administered | A posttest is administered

For example, a researcher is interested in studying how effective method A is in increasing muscle strength. The researcher follows these steps in the completion of the experiment:
  • 1. Advertises for volunteers for the experiment

  • 2. Administers a pretest to measure each participant’s muscle strength

  • 3. Exposes the participants to the hypothesized strength-increasing treatment

  • 4. Administers the posttest

The important comparisons are between the pretest and posttest scores for each participant. The primary problem with this type of design is that there is no control group. Without any control group, how can the researcher tell that any difference observed between the pretest and posttest scores is a function of the treatment or a function of some other factor? What if 50% of the sample did not get enough sleep the night before the posttest? Or what if they participated in another study that also was designed to increase strength? These factors, rather than the specific treatment, might be responsible for any differences in strength.

TEST YOURSELF

What does the “pre” in “pre-experimental design” represent?

True Experimental Designs

True experimental designs include all the steps in selecting and assigning subjects in a random fashion, plus a control group, thereby lending a stronger argument for a cause-and-effect relationship. One of the reasons these designs are so powerful is that they all have random selection of participants, random assignment of treatments, and random assignment to groups.

For example, let’s look at one of the most popular of these designs, the pretest posttest control group design, which looks like this:

True experimental designs control selection of subjects, assignment to groups, and assignment of treatments.

Step 1 | Step 2 | Step 3 | Step 4
Random assignment of participants to a control group | A pretest is administered | No treatment is administered | A posttest is administered
Random assignment of participants to the experimental (or treatment) group(s) | A pretest is administered | A treatment is administered | A posttest is administered

For this design, the researcher would follow these steps:
  • 1. Randomly assign the subjects to the experimental group or the control group

  • 2. Pretest each group on the dependent variable

  • 3. Apply the treatment to the experimental group (the control group does not receive the treatment)

  • 4. Posttest both the experimental group and the control group on the dependent variable (in another form or format, if necessary)

The assumption here, and you are probably on to this, is that because the subjects are randomly assigned to either the control group or the experimental group, they are equivalent at the beginning of the experiment. Any differences observed at the end of the experiment must be due to the treatment because all other explanations have been taken into account.
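To see how the comparison works in this design, here is a minimal Python sketch. The scores are made up for illustration, and the helper names `mean_gain` and `treatment_effect` are hypothetical. Comparing average gains is one common way to summarize the data, not the only one.

```python
from statistics import mean

def mean_gain(pretest, posttest):
    """Average posttest-minus-pretest change for one group."""
    return mean(post - pre for pre, post in zip(pretest, posttest))

def treatment_effect(experimental, control):
    """Experimental group's mean gain minus the control group's mean gain."""
    return mean_gain(*experimental) - mean_gain(*control)

# Made-up scores: (pretest scores, posttest scores) for each group.
experimental = ([10, 12, 11, 13], [15, 16, 15, 18])
control = ([11, 12, 10, 13], [12, 13, 11, 14])

print("experimental gain:", mean_gain(*experimental))
print("control gain:", mean_gain(*control))
print("difference in gains:", treatment_effect(experimental, control))
```

In these invented data the control group gains 1 point on average even without treatment, so only the remaining 3.5 points of the experimental group's 4.5-point gain can reasonably be credited to the treatment.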

Pretest and posttest control group designs are not limited to two groups. For example, let’s say that a researcher wants to examine the effects of different literacy programs on how well adults learn to read. One treatment might involve instruction 5 days per week and another might involve instruction 3 days per week. The third group, the control group, would not receive any instruction.

An example of such an experimental design would look something like this:

Step 1 | Step 2 | Step 3 | Step 4
Random assignment of participants to a control group | A pretest is administered | No treatment is administered | A posttest is administered
Random assignment of participants to experimental or treatment group 1 | A pretest is administered | Treatment takes place three days a week | A posttest is administered
Random assignment of participants to experimental or treatment group 2 | A pretest is administered | Treatment takes place five days a week | A posttest is administered

The number of treatment groups (in this example, two) does not really make any difference so long as there is a control group. There is, however, an important difference as to the nature of the control group. In some cases, the control group might receive no treatment whatsoever; in others, the control group might receive a different type of treatment from the others. The difference in the role of a control group is a reflection of the type of question that was originally asked.

If the control group does not receive any treatment, then the obvious question is whether the treatment is effective, compared with no treatment at all. If the treatment group is compared with another group receiving treatment, then the question is: Which of the two is the more effective? Although it is a somewhat fine distinction, it is an important one to remember when you are thinking about how to structure your research.

Another popular true experimental design is the posttest-only control group design, which looks like this:

Step 1 | Step 2 | Step 3
Random assignment of participants to a control group | No treatment is administered | A posttest is administered
Random assignment of participants to the experimental or treatment group | Treatment takes place five days a week | A posttest is administered

The most apparent characteristic here is that there is no pretest for either the control group or the experimental group. The rationale for this approach is that if participants are randomly selected and assigned to groups, there is no need for a pretest. They are already equivalent anyway, right? The answer is “yes” when you have a sufficiently large sample (at least 30 or so in each group). Another reason to use the posttest-only design instead of the pretest posttest design is that sometimes it is not convenient or may even be impossible to administer a pretest. Under these conditions, you can use the posttest-only design.

There are basically two disadvantages to using a posttest-only design. First, if the randomization procedures were not effective, the groups might not be equivalent at the start. Second, you cannot use the pretest to assign people to other experimental groups, such as high or low on some variable. These disadvantages may be of little consequence, yet they deserve some consideration.

The last true experimental design is kind of the grandmommy and daddy of them all, the Solomon four-group design, as shown here:

The Solomon four-group design is extremely useful, but it is also expensive and time consuming.

Step 1 | Step 2 | Step 3 | Step 4
Random assignment of participants to the experimental group | A pretest is administered | Treatment is administered | A posttest is administered
Random assignment of participants to control group 1 | A pretest is administered | No treatment is administered | A posttest is administered
Random assignment of participants to control group 2 | No pretest | Treatment is administered | A posttest is administered
Random assignment of participants to control group 3 | No pretest | No treatment is administered | A posttest is administered

There are four groups in this design: one experimental group (which receives the treatment) and three control groups, one of which actually receives the treatment as well.

The most interesting and most useful aspects of this design are the many types of comparisons that can be made to determine what factors might be responsible for certain types of outcomes. You might recognize that the relatively simple pretest posttest control group design compares the experimental group with control group 1. However, let’s say, for example, that you are interested in determining the effects of the treatment, but you also want to know if the very act of taking a pretest also changes the final scores. You would then compare the results from the experimental group with those from control group 2. The only thing that differs between these groups is the inclusion of a pretest. To determine the influence of the pretest on posttest scores, compare control group 1 and control group 3 to derive the information you need. The only difference is that group 1 received the pretest, whereas group 3 did not.

You can make all kinds of other comparisons as well. For example, the effect of the treatment on groups that did not receive the pretest would result in a comparison of control group 2 (no pretest, treatment) and control group 3 (no pretest, no treatment), the third and fourth groups in the design. This is the same comparison that occurs in the posttest-only control group design mentioned earlier.
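The comparisons just described can be laid out side by side in a short Python sketch. The posttest scores below are invented, and the function name `solomon_comparisons` and the dictionary keys are hypothetical labels chosen to follow the group names used in the text (the experimental group and control groups 1 through 3).

```python
from statistics import mean

def solomon_comparisons(posttests):
    """posttests maps each group name to its list of posttest scores."""
    m = {group: mean(scores) for group, scores in posttests.items()}
    return {
        # Both groups pretested; only the treatment differs.
        "treatment_effect_with_pretest": m["experimental"] - m["control_1"],
        # Neither group pretested; only the treatment differs.
        "treatment_effect_without_pretest": m["control_2"] - m["control_3"],
        # Both groups treated; only the pretest differs.
        "pretest_effect_with_treatment": m["experimental"] - m["control_2"],
        # Neither group treated; only the pretest differs.
        "pretest_effect_without_treatment": m["control_1"] - m["control_3"],
    }

# Made-up posttest scores for illustration only.
posttests = {
    "experimental": [18, 20, 19, 19],  # pretest, treatment
    "control_1": [13, 12, 14, 13],     # pretest, no treatment
    "control_2": [17, 16, 18, 17],     # no pretest, treatment
    "control_3": [12, 11, 13, 12],     # no pretest, no treatment
}

for name, value in solomon_comparisons(posttests).items():
    print(name, value)
```

With these numbers, the treatment appears to raise scores whether or not a pretest was given, and simply taking the pretest adds a smaller amount of its own.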

Why doesn’t everyone who conducts true experimental research use this particular type of design? One good reason: time. Although the Solomon four-group experimental design is very effective for separating out factors that are responsible for differences in the dependent variable, it is a time-consuming design to execute. You need to arrange for four groups, randomly select and assign participants to four conditions (three control and one experimental), and perform lots of testing. For many researchers, this kind of design is just not practical.

Internal and External Validity and Experimental Design

The different types of experimental designs previously mentioned in this chapter were outlined in the seminal work by Campbell and Stanley (1963), and if you intend to continue in your studies, you should read this short monograph. It’s essential to understanding how research is, and should be, conducted. These researchers realized that it was not enough just to come up with different designs—a way in which to evaluate these designs was also needed. What outside criteria might one use to judge the usefulness of these different ways of approaching a problem?

What was their decision? They decided to use the criteria of internal and external validity; both measure how well the design does what it should.

Internal validity is the quality of an experimental design such that the results obtained are attributed to the manipulation of the independent variable. In other words, if what you see is a function of what you did, then the experiment has internal validity. For example, if you can show that a treatment works to increase the social skills of withdrawn children and if that treatment is the only apparent cause for the change, then the design (and the experiment) is said to be internally valid. If there are several different explanations for the outcomes of an experiment, the experiment does not have internal validity.

Internal validity is synonymous with control.

External validity is the quality of an experimental design such that the results can be generalized from the original sample to another sample and then, by extension, to the population from which the sample originated. For example, if you can apply the treatment for increasing the social skills of withdrawn children to another group of withdrawn children, then the design (and the experiment) is said to have external validity.

External validity is synonymous with generalizability.

Not all designs and experiments have acceptable levels of internal and external validity for a variety of reasons, which Campbell and Stanley call threats to internal and external validity. Once you understand what these threats are, you will be able to see which experimental designs are preferable and why.

Threats to Internal Validity

The following is a brief explanation of those threats to internal validity that lessen the likelihood that the results of an experiment are caused by the manipulation of the independent variable. Good scientists try to reduce or eliminate these threats.

History

Many experiments take place over an extended period of time (history), and other events can occur outside of the experiment that might affect its outcome. These events might offer a more potent explanation (other than the original treatment) for the differences observed between groups.

For example, a researcher wants to study the effect of two different diets on the school behavior of hyperactive children. Without the researcher’s knowledge, some of the parents of the children in the experimental group have contacted their child’s teacher, and together they have started an at-home program to reduce troublesome school behaviors. If there was a difference in school behavior for the kids on the diet plan, how would one know that it was not attributable to the teacher–parent collaboration? That outside influence (the teacher–parent activity) is an example of history as a threat to internal validity because the at-home program, not the diet plan, might account for any observed difference.

Maturation

Maturation can be defined as changes over time, often caused by biological or psychological forces. These changes might overshadow those that are the result of a treatment.

For example, a researcher is studying the effects of a year-long training program on increasing the strength of school-age children. At the end of the program, the researcher evaluates the children’s strength and finds that the average strength score has increased over the year’s time. The conclusion? The program worked. Correct? Maybe. However, as attractive as that explanation is, by the very nature of physical development, children’s strength increases with age or maturation.

Abracadabra! It was not the treatment but Mother (or Father) Nature who helped the children grow stronger as they got older. That’s maturation.

Selection

The basis of any experiment is the selection of subjects as participants. Selection is a threat to the internal validity of an experiment when the selection process is not random but instead contains a systematic bias that might make the participating groups different from each other.

For example, a researcher wants to determine how extended after-school child care affects family cohesion. As part of the experiment, the researcher forms an experimental group (those families whose children are in extended care) and a control group (those families whose children are not in extended care). Because the families were not randomly selected or randomly assigned to treatments, there is no way to tell whether they are equivalent to each other. The group of extended-care children might come from families with a positive or negative attitude toward the program before it even begins, thereby biasing the outcomes.

Testing

In many experiments, a pretest is part of the experiment. When the pretest affects performance on later measures (such as a posttest), testing can be a threat to internal validity.

As with many threats to internal validity, a control group controls the threat of testing!

For example, a researcher pretests a group of subjects on their eighth-grade math skills, and then teaches them (the treatment) a new way to solve simple equations. The posttest is administered, and there is an increase in the number of correct answers. Given this information, one does not know whether the increase is due to learning a new way to solve the simple equations or to the learning that might have taken place as the result of the pretest. The experience with the pretest alone might make the participants test-wise, and their performance reflects that, rather than the effectiveness of the treatment.

Instrumentation

When the scoring of an instrument itself is affected, any change in the scores might be caused by the scoring procedure, rather than the effects of the treatment.

For example, a researcher is using an essay test to judge the effectiveness of a writing skills program. There is little doubt that when he grades the 100th examination, a different set of criteria will be used than when he graded the first one. Even if the criteria do not change, simple fatigue is likely to cloud the scorer’s judgment and result in differences due to instrumentation, not the actual effects of the program.

Regression

This is a really fascinating (and often misunderstood) threat. The world of probability is built in such a way that placement on either extreme of a continuum (such as a very high or very low score) will result in scores that regress toward the mean on subsequent testing (using the same test). In other words, when children score very high or very low on some measure, you can expect their scores on subsequent testing to move toward the mean, rather than away from it. This is true only if their original placement (in the extreme) resulted from their score on the test.

If you do not already realize it, regression occurs because of the unreliability of the test and the measurement error that is introduced, which places people more in the extremes than they probably belong. Given the lower probability that someone will end up in the extreme part of a distribution (whether high or low), the odds are greater that on additional testings, they will score in an area more central to the distribution. And for high or low scorers, moving toward the center of the distribution means moving toward the mean, which is what regression is all about.

For example, a teacher of children with severe physical disabilities designs a project to increase their self-care skills and pretests the group using anecdotal information compiled in September before the program begins. In June, she retests them and finds that their skills have increased. A solid argument could be made that the increase was due to regression, not to anything the teacher did; that is, children who were in the extremes to begin with (on the self-care skills test) would move toward the average score (and be less extreme) if nothing happened. The change takes place through regression alone and may have nothing to do with the treatment.
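If regression toward the mean still seems mysterious, a small simulation may help. The sketch below makes several assumptions that are not part of the example above: a hypothetical skills scale with a mean of 100, normally distributed true scores and measurement error, and a group selected from the lowest 10% on the pretest. No treatment of any kind is applied.

```python
import random
from statistics import mean

def regression_demo(n=2000, error_sd=10.0, seed=0):
    """Return (pretest mean, retest mean) for the lowest scorers, plus the overall mean."""
    rng = random.Random(seed)
    # Each child's stable "true" skill level (hypothetical scale, mean 100).
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    # Each testing adds its own independent measurement error.
    pretest = [t + rng.gauss(0, error_sd) for t in true_scores]
    retest = [t + rng.gauss(0, error_sd) for t in true_scores]
    # Pick the children in the lowest 10% on the pretest; nothing else happens to them.
    cutoff = sorted(pretest)[n // 10]
    extreme = [i for i in range(n) if pretest[i] < cutoff]
    return (
        mean(pretest[i] for i in extreme),
        mean(retest[i] for i in extreme),
        mean(pretest),
    )

low_pre, low_retest, overall = regression_demo()
print(f"lowest scorers, pretest mean: {low_pre:.1f}")
print(f"lowest scorers, retest mean:  {low_retest:.1f}")
print(f"everyone, pretest mean:       {overall:.1f}")
```

The lowest scorers "improve" on the retest even though nothing was done for them, because part of what put them at the bottom the first time was measurement error that does not repeat.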

Mortality

One of the real-world issues in research is that subjects are sometimes difficult to find for follow-up studies. They move, refuse to participate any further, or are unavailable for other reasons. When this happens, the researcher must ask whether the composition of the group after participants dropped out is basically the same as the initial composition. Mortality (or attrition) is a threat to the internal validity of an experiment when the drop-outs change the nature of the group itself.

For example, research involving very young infants is fascinating but often can be frustrating. They usually arrive sleeping, or crying, or ready to eat, but rarely ready to play, and many have to be sent home and rescheduled or even dropped from the study. Those who are dropped may indeed be substantively different from those who remain, and thus the final sample of subjects may no longer be equivalent to the initial sample, which raises questions about the effectiveness of the treatment on this different sample.

Threats to External Validity

Just as there are threats to the internal validity of a design, so there are threats to a design’s external validity. Once again, external validity is not concerned with whether the manipulation of the independent variable has any effect on the dependent variable (that is the province of internal validity), but whether the results of an experiment are generalizable to another setting. Threats to external validity, including definitions and examples, are discussed in the following sections. As with threats to internal validity, good scientists try to reduce these threats.

Multiple Treatment Interference

A set of subjects might receive an unintended treatment in addition to (hence, multiple treatment interference) the intended treatment, thereby decreasing the generalizability of the results to another setting where the unintended treatment may not be available.

For example, let’s say that a group of nursing home residents is learning how to be more assertive, and the nursing aides pick up on the program and do a little teaching of their own. The results of the experiment would not be easily generalized to nursing home residents in another setting because the other settings may or may not have aides who are as industrious.

Reactive Arrangements

From 1927 through 1932, at the Western Electric Company’s Hawthorne plant in Cicero, Illinois, Elton Mayo, a Harvard business professor, measured the effects of changing certain environmental cues (lighting and working hours) on work production. The problem was that the participants in the study knew about Mayo’s intent. Even when the lighting was worse and the working hours were longer, production increased for the experimental group. Why? Because the workers received special attention from the researchers, which resulted in changes in productivity; lighting and working-hour conditions were found to be secondary in importance. Unless subjects were studied within other settings (which would defeat the intent of the experiment), the external validity would be low, as would the generalizability.

Incidentally, this threat to external validity, called reactive arrangements, is also sometimes called, you guessed it, the Hawthorne effect.

Experimenter Effects

Another threat to external validity involves the researchers themselves. Imagine an experiment designed to reduce the anxiety associated with a visit to the dentist. What if the person conducting the desensitization training unintentionally winced each time the dentist’s drill started? The results of such a training program cannot be generalized to another setting because another setting would require a trainer who would behave in a similar fashion. Otherwise, the nature of the experience is changed.

The Hawthorne effect shows how research must consider what participants know about a research experiment.

In other words, the training program might not be as effective without the trainer’s emotional expressions, and hence the results of the training program might not be generalizable because the person conducting the training is not part of the program. Experimenter effects, rather than the training itself, might be responsible for any changes that are observed.

Pretest Sensitization

You have already seen how pretests can inform people about what is to come and thus affect their subsequent scores, thereby decreasing the internal validity of a study. In a similar fashion, the presence of a pretest can change the nature of the treatment, so that the treatment applied in another setting is less or more effective without the presence of the pretest (pretest sensitization). To make things equivalent and to maximize generalizability to other settings, the pretest would have to be part of the treatment, which, by definition, would change the nature of the treatment and the experiment’s purpose.

Increasing Internal and External Validity

First, internal validity. It is no secret how to maximize the internal validity of an experiment: Randomly select participants from a population, randomly assign them to groups, and use a control group. In almost every design in which these characteristics are present, most threats to internal validity will be eliminated.

Let’s take the example of the children with severe physical disabilities and the project that begins in September to increase self-care skills. If a group that does not receive the program (the control group) is included, then the assumption is that both the control group and the experimental group will progress or regress equally, so any difference noted at the end of the year must be due to the self-care program.

Similarly, if the groups are equivalent to begin with (ensured through randomization), changes are the result of the treatment, not the lack of equivalence at the beginning of the experiment.

If you want to compensate for any threats to internal validity, use a control group and randomize, randomize, randomize.

The inclusion of a control group and the use of randomization similarly take care of other threats, including testing, mortality, and maturation. Assuming that groups are equivalent to start with and are exposed to similar circumstances and experiences, the only differences between them would be a function of the treatment, right?

Ensuring external validity is a somewhat different story because it is more closely tied to the behavior of the people conducting the experiment, rather than to the design. For example, the only way to ensure that experimenter effects are not a threat to the external validity of the experiment is to be sure that the researcher who administers the treatment acts in a way that does not interfere with the outcome. In the example of desensitizing anxious dental patients, the trainer must not have any significant problems with the dentist’s office setting.

Whereas most threats to internal validity are taken care of by the experiment’s design, most threats to external validity need to be taken care of by the designer of the experiment.

Internal and External Validity: A Trade-off?

This might be a situation in which you can have your cake and eat it too, as long as you do not make a pig out of yourself! An experiment can be both internally and externally valid but with some degree of caution and balance. For example, internal validity in some ways is synonymous with control. The higher the internal validity, the more confident you can be that what you did (manipulate the independent variable) is responsible for the outcomes you observe. On the other hand, if there is too much control (such as very exacting experimental procedures with a very specifically defined sample of subjects), the results of the experiment might be difficult to generalize (hence lower external validity) to any other setting. This is true because the degree of control might be impossible to replicate, to say nothing of how difficult it might be to find a sample that is similar to the one that was originally used.

An experiment must have both internal validity and external validity, and the two must be balanced.

The solution? Use your judgment. Strive to conduct your experiments in such a way as to ensure a moderate degree of internal validity by controlling extraneous sources of variance through randomization and a control group. The same goes for external validity. Unless you can generalize to other groups, the value of your research (depending on its purpose) may be limited.

TEST YOURSELF

In what type of experimental situation (what topics might you be investigating) is internal validity more important than external validity? How about the opposite? Keep in mind that both are always important, but there can be a slight trade-off.

Controlling Extraneous Variables

All this talk about extraneous variables! Just what are they? Extraneous variables are factors that can decrease the internal validity of a study. They are variables that, if not accounted for in some way, can confound the results. As you have read in this chapter, results are confounded when you cannot separate the effects that different factors might have on some outcome. For example, a researcher is studying the effects of school breakfasts on student attendance. Parents who are more motivated might get their children to school for the breakfasts, which might make the difference between those who attend and those who do not. The breakfast, per se, might have nothing to do with any group difference. In this case, the treatment (the breakfast) is confounded with parents’ motivation.

Variables of importance cannot be ignored, even if they go directly untested.

Almost everywhere you look in experimental research there are variables that can potentially confound study results. These variables muddy the waters in a scientist’s attempt to understand just what factors cause what outcomes. What is the solution to this problem? There are several. The general question becomes, “Which variables are important enough to worry about and which can be deemed unimportant?” Remember that any variable can be ignored (when it is really irrelevant), tested (when it is important and should be part of the experiment), or controlled (when it may be important but for a variety of reasons cannot be tested).

Randomization is a very effective way to control for unwanted variance.

For the variables that are of concern, what can be done to minimize the effect they might have on the outcomes of the experiment?

First, you can choose to ignore any variable that is unrelated to the dependent variable being measured. For example, if attendance is the primary dependent variable and offering school breakfast is the primary independent variable, are factors such as gender of the child, gender of the teacher, class size, or parents’ age important? Possibly. The only way you can tell is through a review of the literature and the development of some sound conceptual argument as to why the teacher’s gender is or is not related to the child’s attendance. For the most part, if you cannot make an argument for why a variable is related to the outcome you are studying, then it is probably best ignored.

Second, randomization controls the effects of many different potential sources of variance. Most important, randomization helps to ensure that the experimental and control groups are equivalent on a variety of different characteristics. In the example used before, randomly assigning children to the breakfast or nonbreakfast groups would make parental motivation an equally probable influence in both groups and, therefore, not a very attractive explanation for any observed difference.
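Here is a minimal sketch of random assignment in Python. The roster of 20 children, the function name randomly_assign, and the fixed seed are all assumptions made for illustration; they are not part of any particular study. The seed is there only so the example gives the same split each time it runs.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)        # separate generator so the seed stays local
    shuffled = list(participants)    # copy, so the original roster is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical roster of 20 children
children = [f"child_{i:02d}" for i in range(1, 21)]

breakfast_group, no_breakfast_group = randomly_assign(children, seed=42)
print("Breakfast:   ", breakfast_group)
print("No breakfast:", no_breakfast_group)
```

Because chance alone decides who lands in which group, children of highly motivated parents are as likely to end up in one group as in the other. With small samples the groups can still differ by luck, which is one reason larger samples help.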

Matching

In general, random assignment of subjects to groups is a good way to ensure equivalence between groups. The occasion may arise, however, when a researcher wants to make sure that the two groups are matched on a particular attribute, trait, or characteristic. For example, in the school breakfast program study, if parental motivation is a concern and if the researcher does not think that random assignment will take care of the potential problem, matching is a technique that can be used.

Matching of subjects simply means that for every occurrence of an individual with a score of X in the experimental group, the researcher would make sure there is a person in the control group with a similar score. In general, the rule you want to remember is that the variable for which subjects are matched needs to be strongly related to the dependent variable of interest; otherwise, matching does not make much sense. Because this is the general rule, it comes as no surprise that the first step in the matching process is to get a measure of the variable to be matched before group assignment takes place. These scores are then ranked, and the pairs that are close together are selected. One subject from each pair is placed in each group, and the experiment continues.
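The matching steps just described (measure first, rank the scores, pair neighbors, and place one member of each pair in each group) can be sketched in Python. This is only one simple way to do it. The motivation scores, the family labels, and the function name matched_assignment are hypothetical, and pairing adjacent ranks is an assumption about how "close together" is decided.

```python
import random

def matched_assignment(scores, seed=None):
    """Pair participants with adjacent ranked scores, then split each pair.

    scores maps each participant to a score measured BEFORE group assignment.
    Returns (experimental, control). With an odd number of participants,
    the highest scorer has no partner and is left out.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)       # lowest to highest score
    experimental, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]         # two neighbors in the ranking
        rng.shuffle(pair)                         # chance decides who goes where
        experimental.append(pair[0])
        control.append(pair[1])
    return experimental, control

# Hypothetical parental motivation scores collected before the study begins
motivation = {
    "fam_a": 12, "fam_b": 35, "fam_c": 14, "fam_d": 33,
    "fam_e": 22, "fam_f": 24, "fam_g": 41, "fam_h": 43,
}

experimental, control = matched_assignment(motivation, seed=1)
print("Experimental:", experimental)
print("Control:     ", control)
```

Each group ends up with one family from the low end, one from the high end, and so on, so the groups start out with very similar motivation scores. Notice that the sketch does nothing about pairs whose scores are far apart; a real study would need a rule for how close is close enough.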

What researchers are doing when they follow this strategy is stacking the cards in their favor to ensure that some important and potentially strong influences are not having an undue effect on the results of the study. Matching is a simple and effective way of ensuring this.

As you might suspect, there is a downside to matching. Matching can be expensive and time consuming, and you might not be able to find a match for all individuals. Suppose one set of parents is extremely motivated and the next most motivated set of parents is far down on the scale. Can you match those sets? It is doubtful. You would probably have to exclude the extreme scoring parents or find another with a similarly high score to whom those parents can be matched.

There’s another downside as well (thanks to Amanda Blackmore, reviewer extraordinaire, for pointing this out)—when you match, you match on certain variables at the expense of establishing equivalence on others. But if you randomly assign participants to groups, and then match on groups (not variables), you have a better chance of getting equivalent groups.

Use of Homogeneous Groups

One of the best ways to ensure that extraneous variables will not be a factor is to use a homogeneous population, or one whose members are very much alike, from which to select a sample. In this way, most sources of differences (e.g., racial or ethnic backgrounds, education, political attitude) might automatically be controlled for. Once again, it is really important for the groups to be homogeneous only on those factors that might affect their scores on the dependent variable.

Analysis of Covariance

A final technique is a fairly sophisticated device called analysis of covariance (ANCOVA), a statistical tool that equalizes any initial differences that might exist. For example, let’s say you are studying whether a specialized exercise program increases running speed. Because you know that running speed is somewhat related to strength, you want to make sure that the participants in the program are equal in strength. Let’s say you try to match subjects but discover there is too wide a diversity to ensure that matching will equalize the groups. Instead, you use ANCOVA.

ANCOVA, at its simplest level, subtracts the influence of the relationship between the covariate (which in this case is strength) and the dependent variable (which in this case is speed) from the effect of the treatment. In other words, ANCOVA adjusts final speed scores to reflect where people started as far as strength is concerned. It is like playing golf with a handicap of a certain number of strokes: handicapping helps to equalize unequals. ANCOVA is an especially useful technique in quasi-experimental or causal-comparative designs when you cannot easily randomly assign people to groups, but you have information concerning variables that are related to the final outcome and on which people do differ.
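To make the "handicap" idea concrete, here is a small Python sketch of the adjustment step only. It uses the standard textbook formula for covariate-adjusted group means: each group's mean on the dependent variable is shifted by the pooled within-group slope times the distance between that group's covariate mean and the overall covariate mean. The strength and speed numbers, the group names, and the function name adjusted_means are all made up for illustration, and the sketch leaves out the significance test that a full ANCOVA would also report.

```python
from statistics import mean

def adjusted_means(groups):
    """Return covariate-adjusted outcome means for each group.

    groups maps a group name to a list of (covariate, outcome) pairs.
    """
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    sxy = sxx = 0.0
    group_means = {}
    for name, pairs in groups.items():
        xbar = mean(x for x, _ in pairs)
        ybar = mean(y for _, y in pairs)
        group_means[name] = (xbar, ybar)
        # Accumulate within-group sums to get one pooled slope
        sxy += sum((x - xbar) * (y - ybar) for x, y in pairs)
        sxx += sum((x - xbar) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return {
        name: ybar - slope * (xbar - grand_x)
        for name, (xbar, ybar) in group_means.items()
    }

# Hypothetical (strength, speed) pairs; the program group starts out stronger
data = {
    "program": [(50, 8.0), (60, 9.0), (70, 10.0)],
    "control": [(30, 5.0), (40, 6.0), (50, 7.0)],
}

adjusted = adjusted_means(data)
print(adjusted)
```

In this made-up data set the raw speed means are 9 for the program group and 6 for the control group, a difference of 3. After adjusting for strength, the means are 8 and 7, a difference of 1. Most of the apparent advantage came from the program group being stronger to begin with.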

Variables can play insignificant or quite major roles in experimental research. Why can’t you control every variable in an experiment, and even if you could, why would that be a poor strategy?

Summary

Do you want to find out if A (almost) causes B? Experimental methods are the peaches, the max, the top of the line. They provide a degree of control that is difficult to approach by using any of the other methods discussed so far in this volume. The milestone work of Campbell and Stanley (1963) identified the various threats to these designs and provided tools to evaluate the internal validity and external validity of various pre-experimental and experimental designs. Through such techniques as matching, the use of homogeneous groups, and some statistical techniques, you can have a good deal of confidence that the difference between groups is the result of the manipulation of the independent variable, rather than some other source of differences. If cause and effect is the order of the day, you came to the right place when you read this chapter.