
You flipped a switch and caused the light to turn on. You turned the key and caused the engine to start. Hatfield shot McCoy through the heart and caused McCoy’s death. The drought caused the failure of the corn crop. Those are perfectly reasonable and legitimate causal claims. But determining causes is not always easy. In fact, causal reasoning is subject to a variety of pitfalls and problems. Consider a case:

Lawyers sometimes hire “jury selection specialists” to help them in seating a jury. These specialists use social science research techniques in an effort to select favorable jurors for their side, or, at the very least, to weed out any unfavorable potential jurors. The practice has been a controversial issue in recent years. Most of the debate has been over whether such methods are fair. However, there has also been debate about whether the methods really work.

There have been strong claims in support of the effectiveness of social science methods in jury selection. For example, Litigation Sciences, a company that provides such services, claims in its promotional literature that: “To date, where our advice and recommendations have been employed, our clients have achieved successful results in over 95 percent of the cases in which we have been involved.”1 Certainly when the defense has used jury selection experts in criminal cases the acquittal rate has been much better than average.

What does that show? It seems to indicate that “scientific jury selection techniques” are effective, that such techniques can result in a favorable verdict. But is that the correct conclusion? It’s true that legal teams using scientific jury selection techniques have a high success rate. But does it follow that the scientific jury selection techniques are the cause of the success?

Before going further, try to think of at least one reason for doubting that scientific jury selection causes successful trial results.

(If you need a hint, consider this: Hiring social scientists to do the research necessary for effective scientific jury selection is quite expensive.)

Okay, why are defense teams that use scientific jury selection techniques successful? Perhaps because the scientific jury selection techniques are so effective that they cause the side using them to win. But perhaps not. Perhaps the scientific jury selection techniques are ineffectual and other factors are the cause of success. One such factor might be that the hiring of teams of social scientists is costly; defendants who can afford such teams must have large sums of money to spend on their defense teams (which means that they not only hire social scientists but also hire a number of high-powered defense lawyers, private investigators to seek out evidence in their favor, and experts to testify for the defense). In short, it may not be the scientific jury selection techniques that are the cause of success; it may instead be that defendants who can afford those techniques can also afford all the other trial advantages that money can buy. Perhaps the actual cause of success is the advantage of having a team of highly qualified lawyers who are working full time on the case (rather than the public defender who, for most criminal defendants, squeezes a few minutes out of an overloaded schedule).2

Who Is Guilty?

Thinking about causality can be perplexing, but also entertaining. Here’s a favorite law school puzzler. Arthur, Bert, and Carl are all members of the French Foreign Legion, stationed far out in the desert. Both Bert and Carl hate Arthur, and separately they plot his murder. When Arthur is ordered to go alone on a long mission across the hot, dry desert, both men see their opportunity. Shortly before Arthur leaves, Bert puts a deadly poison in his canteen. A few minutes later, Carl, not knowing about the poison, pours all the (poisoned) water out of Arthur’s canteen, and replaces the water with sand. Arthur goes on his journey, and dies of thirst. Now obviously both Bert and Carl are evil men who plotted murder and attempted to murder Arthur. But the question is this: Who was the actual murderer? That is, who caused Arthur’s murder?

Distinguishing Causation from Correlation

So the first problem in determining causes is distinguishing genuine causal factors from the various incidental associations. If two sets of phenomena are strongly correlated (when one occurs the second usually follows), that indicates there may be a causal relation between the two, and further inquiry along those lines would certainly be justified; but that would not be sufficient to prove a causal link. There are several possible reasons for such correlations among events. One, the first may indeed be the cause of the second. Two, the causal relation may go in the opposite direction. A striking example of that sort of confusion is the case of the New Hebrides Islanders. The islanders believed that lice caused good health. They based this belief on the fact that healthy islanders were all infested with lice (lice lived in abundance on the tropical islands and it was all but impossible to avoid infestation), while sick islanders had no lice on them. A natural conclusion was that lice caused good health, and loss of lice caused sickness. But in fact the causal relation was exactly the reverse: Lice prefer a narrow range of body temperatures in their hosts, and when the islanders became sick their body temperatures rose above the louse comfort level and the lice sought other lodgings. Being healthy caused the presence of lice and being sick—with a fever—caused the absence of lice. Loss of lice was the result of sickness, not the cause.3

A third way in which two positively correlated phenomena may be related is that both may be causally linked to some third set of events. Consider again the use of scientific jury selection techniques. Defendants who have used such techniques have enjoyed an unusually high acquittal rate. But it may be that the cause of the acquittals was not the use of scientific jury selection techniques; rather, the cause may have been another factor that caused both the use of scientific jury selection techniques and the acquittals—namely, the financial resources that employed both the scientific jury selectors and the time and undivided efforts of the best defense lawyers. Even if scientific jury selection techniques are totally useless, one would still expect to find a high positive correlation between the defense’s use of scientific jury selection techniques and acquittals—at least, one would expect to find such a correlation so long as the most expensive defense lawyers believe that such jury selection techniques are useful. (This is not to suggest that scientific jury selection techniques are ineffective. But the mere fact that there is a positive correlation between acquittals and defendants’ use of such techniques does not prove a causal relation between the two.)
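
To see how a confounder can manufacture a correlation, here is a minimal simulation sketch (the numbers are invented for illustration, not drawn from any actual study). In this toy world, hiring jury-selection consultants has, by construction, no effect at all on the verdict; wealth raises the acquittal rate and also makes hiring consultants far more likely, and that alone produces a strong correlation between consultants and acquittals.

```python
import random

random.seed(0)

def simulate_case():
    wealthy = random.random() < 0.2                        # 20% of defendants are wealthy
    hires_consultants = wealthy and random.random() < 0.9  # mostly the wealthy hire consultants
    # Acquittal depends only on wealth (better lawyers, investigators, experts),
    # not on whether jury-selection consultants were hired.
    p_acquit = 0.6 if wealthy else 0.2
    acquitted = random.random() < p_acquit
    return hires_consultants, acquitted

cases = [simulate_case() for _ in range(100_000)]

def acquittal_rate(used_consultants):
    outcomes = [acquitted for hired, acquitted in cases if hired == used_consultants]
    return sum(outcomes) / len(outcomes)

print("acquittal rate with consultants:   ", round(acquittal_rate(True), 3))
print("acquittal rate without consultants:", round(acquittal_rate(False), 3))
# Consultants are strongly correlated with acquittals even though, by construction,
# they have zero causal effect: wealth is the common cause (the confounder).
```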

Suppose that a study discovers that parents who drink bottled water have healthier children. Obviously the people who sell bottled water would be delighted, and probably feature it in their advertising. But would it actually indicate that drinking bottled water causes parents to have healthier children? No, by no means. After all, parents who drink bottled water are probably more affluent, and thus can afford better health care for their children, and are less likely to live in hazardous surroundings and thus their children are less likely to be exposed to environmental hazards. Furthermore, parents who drink bottled water may be more health conscious than most, and thus are more careful about their children’s diets, exercise, vaccinations, and checkups. Those are more likely to be the significant causal factors affecting the health of their children. All of those are confounding factors that make it difficult to determine whether the parents’ drinking of bottled water is actually the cause of better health in children.4

A study of heavy coffee drinkers—those drinking eight or more cups of coffee a day—found that they were more likely to have heart attacks. One possibility, of course, is that drinking lots of coffee is a causal factor for heart attacks. But consider for a moment some of the other characteristics that heavy coffee drinkers are likely to have. First, they are more likely to smoke (of course many heavy coffee drinkers do not smoke; but heavy coffee drinkers are more likely to smoke than are those who drink little or no coffee, and thus it might be the greater likelihood of smoking that is the actual causal factor for the increased number of heart attacks). And as a heavy coffee drinker, where do I get my afternoon fix of coffee? At the coffee shop, where there is always a tempting array of pastries; or maybe at the drive-thru window of the donut shop. So perhaps it’s not the coffee, but the fact that heavy coffee drinkers are more likely to be eating fattening pastries. Also, heavy coffee drinkers are probably less likely to be getting a lot of exercise; and maybe they are not getting enough sleep; and perhaps they are more likely to be under considerable stress at work. So if people who drink lots of coffee are more likely to have heart attacks, it may be all those associated/correlated factors that are the real causal culprits; in fact, it could be that if these smoking non-exercising pastry-eaters were not drinking coffee they would be having heart attacks at an even higher rate.

What a Lucky Person

Coincidence can be very persuasive. Here’s an example favored by both statisticians and philosophers. We hold a quarter-flipping contest. The quarters used are perfectly balanced, the contest is not rigged, and there is no way to cheat (a machine flips the quarters): the quarters are just as likely to land heads as tails. Approximately a million people enter the contest, and they are paired off, with the winners at each stage meeting in the next stage of the contest. After about 20 rounds, we finally have a winner. The way the contest was set up, someone had to win. But after winning 20 rounds, the winner would very likely conclude that he or she was a “lucky person,” that he or she had the special property of being lucky. That is, there is a strong temptation to attribute a specific cause to, and to impose a causal pattern on, events that are merely coincidence.
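
The arithmetic behind the contest is simple: 2^20 = 1,048,576, so a field of about a million entrants is whittled down to a single winner in roughly 20 rounds, and someone is guaranteed to win all 20. A short simulation sketch (purely illustrative) shows a 20-flip winning streak emerging from nothing but fair coins:

```python
import random

random.seed(1)

entrants = list(range(2 ** 20))   # 1,048,576 contestants, none "luckier" than any other

rounds = 0
while len(entrants) > 1:
    rounds += 1
    winners = []
    for i in range(0, len(entrants), 2):
        # A fair, machine-flipped coin decides each pairing.
        winners.append(entrants[i] if random.random() < 0.5 else entrants[i + 1])
    entrants = winners

print(f"Contestant {entrants[0]} won {rounds} fair coin flips in a row.")
# Someone had to compile a 20-flip winning streak; the streak is guaranteed by the
# structure of the tournament, not by any special "lucky person" property.
```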

There is a fourth possibility for why there is positive correlation between two sets of phenomena: It’s possible that the correlation is accidental, that there is no causal relation at all. If I rub my rabbit’s foot just prior to the drawing of the daily lottery number and then my number is drawn, I may conclude that rubbing the lucky rabbit foot caused my good fortune; if next week I repeat the procedure and win again, I may become firmly convinced of the positive causal efficacy of rabbit-foot rubbing. There is a positive correlation between the two. But what I am forgetting, of course, is the number of times that I crossed my fingers, wore my lucky cap, gripped a horseshoe, and kissed a four-leaf clover without success. If I, and thousands of other lottery players, try a multitude of “lucky charms,” then some of them, by chance, will be positively correlated with good luck in the lottery. (And, of course, the positive correlation of the lucky charm with lottery luck may be exaggerated: The times I rubbed the rabbit’s foot and won tend to be quite memorable; the many times I rubbed the rabbit’s foot and lost are easily forgotten.)
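
The same point can be made with a sketch of the “many charms” situation (all numbers invented): if enough useless charms are each tried a handful of times, some of them are bound to compile impressive-looking records by sheer chance.

```python
import random

random.seed(2)

BASE_RATE = 0.1        # chance of a "win" on any given try, charm or no charm
TRIES_PER_CHARM = 20
NUM_CHARMS = 1000      # rabbit's feet, lucky caps, horseshoes, four-leaf clovers, ...

def win_rate_with_useless_charm():
    # The charm does nothing: every try succeeds with the same base probability.
    wins = sum(random.random() < BASE_RATE for _ in range(TRIES_PER_CHARM))
    return wins / TRIES_PER_CHARM

rates = [win_rate_with_useless_charm() for _ in range(NUM_CHARMS)]
impressive = [r for r in rates if r >= 2 * BASE_RATE]   # at least double the base rate

print(f"Useless charms that at least doubled the base win rate: {len(impressive)} of {NUM_CHARMS}")
# Every charm here is useless by construction, yet a sizable minority look "effective."
# Those are the accidental correlations that get remembered and retold; the failures
# are quietly forgotten.
```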

A nice example of coincidence masquerading as a special cause is discussed by Michael Shermer.5 Michael Drosnin recently wrote a couple of best-selling books, The Bible Code and Bible Code II. The Bible Code game is easy to play. Start with the first few books of the Old Testament. Then have your computer generate a printout of every 7th letter (or 3rd, or 13th, or whatever you wish), filling page after page with your printout. Then start looking through the dozens of pages. Look hard: right to left, left to right, diagonally, up and down, in circles or squares, whatever you like. Lo and behold, you will find all sorts of wonders! When Drosnin did this, he found “predictions” or “prophecies” of all sorts of amazing things: the Oklahoma City bombing, the September 11 attack on the World Trade Center, the collision between Jupiter and comet Shoemaker-Levy 9, the notorious affair between President Bill Clinton and Monica Lewinsky, and the assassination of Israeli prime minister Yitzhak Rabin. When skeptics scoffed that these were not genuine predictions, but merely after-the-fact concoctions, Drosnin offered a challenge: “When my critics find a message about the assassination of a prime minister encrypted in Moby Dick, I’ll believe them.” Australian mathematician Brendan McKay took the challenge, and soon generated nine “encrypted predictions” of political assassinations in Moby Dick. The moral of this story is a simple one: If you look hard enough for a pattern, you can almost always find one (whether you are looking for a pattern in the numbers worn by the winning Super Bowl quarterbacks or a pattern generated by choosing the seventh letter from a book); and when you find a patterned correlation between events, that correlation may be the result of coincidence, and not the result of any causal link.

Suppose we do a study at your old high school. We randomly select 20 students who drink beer, and 20 students who do not drink, and we track them for a year. At the end of the year the students who did not drink had an average GPA of 3.1, while the beer-drinking students had an average GPA of 2.9. And so we conclude: drinking beer causes high school students to make lower grades.

Chain Letter Causation

Trust the Lord and He will light your way. This prayer has been sent to you for good luck. The luck has been brought to you. You are to receive good luck within 4 days of receiving this letter. Send copies of this to people you think need good luck. Do not keep this letter. It must leave you within 96 hours after you receive it.

While in the Philippines, General Walker lost his life 6 days after he received this letter. He failed to circulate the prayer; however, before his death he received $775,000. Take note of the following: C. S. Dias received the chain in 1953. He made 20 copies and sent them; a few days later he won $2 million. Carlice Grant received the chain letter and forgot it, and a few days later he lost his job. He found the letter and sent it, 5 days later he got a better job. Darin Fairchild received the chain and not believing it, threw it away. Nine days later he died. For no reason whatsoever should this chain be broken.

Well, maybe. Certainly that’s possible. But even if our results are accurate (and none of our subjects are lying about their drinking habits), that would not establish that drinking beer causes lower grades. Not even close. We can think of three other possibilities (other than that drinking beer causes lower grades) to account for our results. First (and perhaps most likely), there is some other factor associated with both beer drinking and lower grades that is the actual cause: For example, it might well be that students who do not drink beer have more parental supervision and parental involvement, and not only do their parents discourage them from drinking beer, but they also encourage them to get their homework done, make sure they get any needed tutoring, take them to the library, and praise them for good grades. The involvement of parents would be a confounder, a factor that confounds and confuses the relation between the factors studied. So perhaps it is not the beer drinking that causes lower grades, but instead the lack of parental involvement and encouragement.

Or second, maybe the causal relationship runs in reverse. Rather than beer drinking causing lower grades, making lower grades drives students to drink.

Or third—and a distinct possibility in a study this small—maybe the results were mere coincidence. We happened to get one nondrinker who is a brilliant student with a 4.0 average, and one occasional beer drinker who is a lousy student (with a 0.1 average). In a study this small, that would be enough to skew the results.

What can we conclude from our study? Maybe beer drinking causes lower grades; but just as likely, maybe there is a third factor (a confounder) that is the actual cause, or maybe the causal relationship runs in reverse, or maybe the results were simply coincidence. In short, causal claims are important, but it is also important to consider them critically, and to be skeptical of causal claims drawn on insufficient evidence.
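
The coincidence worry, in particular, is easy to underestimate with groups this small. A quick simulation sketch (using an invented GPA distribution, roughly a 3.0 average with typical student-to-student spread) shows how often two identical groups of 20 differ by two-tenths of a grade point even when drinking has no effect whatsoever:

```python
import random

random.seed(3)

def group_average_gpa(n=20):
    # Both groups are drawn from the SAME population: in this sketch, drinking
    # has no effect on grades. GPAs are clipped to the 0.0-4.0 scale.
    return sum(min(4.0, max(0.0, random.gauss(3.0, 0.6))) for _ in range(n)) / n

trials = 10_000
gaps = [group_average_gpa() - group_average_gpa() for _ in range(trials)]
share_big = sum(abs(gap) >= 0.2 for gap in gaps) / trials

print(f"Chance of a 0.2-point GPA gap between two identical groups of 20: {share_big:.0%}")
# With groups this small, a gap the size of the one in the study shows up a
# substantial fraction of the time by chance alone.
```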

The Questionable Cause Fallacy

An unjustified causal claim commits the questionable cause fallacy. (In polite company the fallacy of questionable cause sometimes goes by a classier name: post hoc, ergo propter hoc. Its close friends sometimes call it by its first name: the post hoc fallacy. Post hoc, ergo propter hoc is a Latin phrase meaning “After this, therefore because of this.”)

To see the difficulties in ascertaining genuine causal relations, consider the problem of placebos. You’ve seen the pain-reliever commercials: A woman is struggling to control three energetic children and prepare dinner—and then the dog knocks over a table, breaking a vase and scattering water and flowers all over the living room. The woman, not surprisingly, yells at the kids, kicks the dog, and develops a bit of a headache. The concerned announcer suggests that she try two tablets of this wonderful, new, super-strength, fast-acting miracle pain reliever: new Happy-Head. We see her 10 minutes later, wearing a warm smile and playing tag with the kids and giving the dog a doggy treat. “What happened to your headache?” inquires the good Samaritan announcer. “It’s gone, and I feel great,” she replies, “that Happy-Head really works.”

Well, perhaps it does. But even assuming that this is an accurate story, and the woman does feel much better after taking the medication, it does not follow that it was the medication that caused her to feel better. It may well have been the placebo effect of taking the medication. (A placebo is a pill that looks like medication and passes for medicine, but in fact contains no medication: a “sugar pill.”) Studies have shown that in many instances subjects who are given a placebo will be relieved of their pain, especially if the placebo is given with dramatic flourish, as a wonderful new product. So it will be easy enough for purveyors of a new headache remedy to obtain glowing testimonials for the effectiveness of their product, simply as a result of the placebo effect—even if the product itself is totally useless. (The notorious Head-On headache remedy—with the loud obnoxious commercials—made a lot of money with the placebo effect. The only drugs in Head-On have no proven effect on headaches; but even if they did, it wouldn’t matter, because the Head-On that is sold over the counter is diluted to less than one part per billion of active ingredient. By the time it goes in the package, it is basically just water: water at a very steep price. But apparently rubbing water on your head—if you believe it’s a powerful medication—can be a very effective headache placebo for many people.)

How, then, can we tell the real cause? Is it the medication itself? Or is it the placebo effect? That is, if we wanted to test the effectiveness of a new pain-relief medication, how would we do it? By testing two groups: an experimental group, who get the actual medication, and a control group that is as closely matched as possible with the experimental group and is given placebos (the pills are the same size and color, so that both groups think they are receiving identical medication). Then we test to see whether the group taking the medication has a higher rate of pain relief than does the group taking the placebo. If the group receiving the actual medication (the experimental group) has a higher rate of pain relief than the placebo group (the control group), and the two groups are evenly matched, then it is reasonable to conclude that the medication does contribute to the relief of pain, that it is causally effective in relieving pain.
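
A rough sketch of that comparison, with invented relief rates, might look like the following; the point is only that the causal question is answered by the gap between the two groups, not by testimonials from the medicated group alone.

```python
import random

random.seed(4)

N = 500                       # patients per group (an invented, illustrative size)
P_RELIEF_PLACEBO = 0.35       # many headaches improve on a sugar pill alone
P_RELIEF_MEDICATION = 0.55    # placebo effect plus a genuine drug effect

def relief_rate(p_relief, n=N):
    return sum(random.random() < p_relief for _ in range(n)) / n

control_rate = relief_rate(P_RELIEF_PLACEBO)          # control group: identical-looking placebo
experimental_rate = relief_rate(P_RELIEF_MEDICATION)  # experimental group: the real medication

print(f"control (placebo) relief rate:         {control_rate:.0%}")
print(f"experimental (medication) relief rate: {experimental_rate:.0%}")
print(f"difference attributable to the drug:   {experimental_rate - control_rate:.0%}")
# Glowing testimonials could come entirely from the first line; the causal claim
# rests on the experimental rate reliably exceeding the placebo rate.
```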

Returning to the question of the causal efficacy of scientific jury selection techniques in gaining favorable verdicts, we can see another possible complication in sorting out the causes: the placebo effect. Perhaps the use of elaborate jury selection procedures has the side effect of encouraging the lawyers who employ the techniques and daunting the opposing lawyers. For example, if the defense has used such techniques, then the defense lawyers may believe—rightly or wrongly—that the jurors are sympathetic to their side, and thus the defense lawyers will be more confident and probably more effective. The opposite will be true for the opposing lawyers: They are likely to feel—again, rightly or wrongly—that the jurors are regarding them unfavorably, and thus may be less cordial to the jury as well as less effective in their arguments. In short, use of scientific jury selection techniques may operate as a sort of placebo; thus the use of the techniques might cause success even if the techniques in themselves are of no value.6

The Method of Science

The method described above for testing pain-relief medication makes use of a randomized controlled experiment; in medical research, it would be called a randomized clinical trial. We take two cases that are as close to identical as possible, with the exception of one key difference. Then we see if the results in one case are different from the results in the other case. If there is a difference in results, then the factor that was present in the one case but not in the other must be the cause of that difference. Thus if we are testing the causal efficacy of a proposed pain-relief nostrum, we need two groups that are as similar as possible. (Obviously we would not want the control group—which takes the placebo—to be college students and the experimental group—which takes the medication—to be members of a retirement community; in such a case, we would not know whether the cause of any difference was age, differences in health, or the powers of the medication.) Furthermore, it is essential that the only difference in the test situation is that one group gets the medication and the other does not. That’s why the control group receives a placebo: We don’t want one group taking a pill and the other group doing nothing. It is also important that the placebo and the medication are identical in size, weight, color, and taste. And it is essential that the two groups be treated the same by the people administering the medication and the placebo (the placebo should be dosed out with the same care as the actual medication); thus, we may use what is called a “double-blind” procedure, in which those giving the pills do not know whether they are giving placebos or medication (it’s double-blind because there is a double layer of “blindness” as to who is getting placebos—both the test subjects and those who deal directly with those subjects are ignorant of which pills are medication and which are placebos).
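
Here is one way the double-blind bookkeeping might be sketched (hypothetical patient IDs and bottle codes, not a real protocol): the treatment assignments are made at random by someone outside the study, and everyone who deals with the patients sees only opaque codes until all the results are in.

```python
import random

random.seed(5)

patients = [f"patient_{i:02d}" for i in range(1, 41)]   # hypothetical patient IDs

# Step 1, done by someone who never meets the patients or the evaluators:
# assign half to medication and half to placebo, completely at random.
assignments = ["medication"] * 20 + ["placebo"] * 20
random.shuffle(assignments)
secret_key = dict(zip(patients, assignments))

# Step 2: everyone who deals with the patients sees only an opaque bottle code;
# the pills inside the bottles look, weigh, and taste the same either way.
bottle_labels = {pid: f"bottle_{i:03d}" for i, pid in enumerate(patients)}

print("What nurses, patients, and evaluators see:", list(bottle_labels.items())[:2], "...")
print("Locked away until the outcomes are recorded:", list(secret_key.items())[:2], "...")
# Neither the subjects nor the people administering and scoring the treatment know
# who received medication, so nobody's expectations can tilt the results; the key
# is opened only after every outcome has been written down.
```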

It is a difficult task to isolate a single difference between groups that are otherwise essentially similar. For example, some of the earliest studies of the effects of smoking on development of lung cancer were almost useless because the smokers included a higher percentage of city dwellers than did the nonsmokers, and people who live in cities are somewhat more likely to develop cancer than are those who live in rural areas. Only when studies matched smokers and nonsmokers for every relevant characteristic (diet, work environment, age, gender, residence) could it be reliably determined that the significant difference in lung cancer rates between smokers and nonsmokers was caused by smoking.

The case of the causal relation between smoking and cancer offers an opportunity to repeat an oft-repeated but never stale maxim of logical thinking: Be careful to note the exact conclusion. In the above case, the conclusion of the lung cancer–smoking researchers was not that smoking is the cause (or the only cause) of lung cancer. Rather, the conclusion is that smoking is a cause of lung cancer, that if one smokes one is more likely to develop lung cancer than if one does not, that if everyone smokes there will be more cases of lung cancer than if no one smokes. It’s important that the exact conclusion be noted; otherwise, one might think that the causal conclusion could be refuted by an argument like this: “Well, Uncle Joe smoked two packs a day for 60 years, and he died last year at age 79 when a tree fell on him. He certainly never had lung cancer. So don’t tell me that smoking causes lung cancer.” That argument would work if the smoking studies concluded that smoking always causes lung cancer; but that is not the claim. Rather, the claim is that smoking is the cause of some cases of lung cancer: that there are people who develop lung cancer because of their smoking, people who would not have developed lung cancer had they not smoked. So pointing out one individual who smoked but did not develop lung cancer will not refute that argument.

Randomized Studies and Prospective Studies

In 1846, Ignaz Semmelweis was a physician at Vienna General Hospital. He noticed that between 1844 and 1846, in the First Maternity Division (where the mothers were attended by physicians), 10% of the women giving birth died from childbed fever. During the same period, the mortality rate from childbed fever in the Second Maternity Division (in which midwives attended the mothers) was only 2%.

Semmelweis wondered why there was such a large difference. He realized that the physicians attending women in the First Maternity Division were also working in the autopsy room, studying cadavers, just prior to assisting in deliveries. The midwives, of course, had not been in the autopsy room. Semmelweis hypothesized that there might be some source of infection being spread by the hands of the doctors from the cadavers to the mothers. Semmelweis tested his hypothesis by requiring that all physicians entering the maternity ward first wash their hands in a solution of chlorinated lime to remove whatever it was that was causing childbed fever, and then comparing the number of cases of childbed fever now occurring with the previous rate.

In Semmelweis’s experiment, the control group consists of the women who gave birth earlier while under the care of physicians who had not washed their hands. The experimental group consists of the women who are now giving birth under the care of physicians who had washed their hands. This was one of the great experiments in medical history, but it had some problems. It might have been better if he had randomly selected a group of doctors to use the lime solution, and others to wash with something much weaker, or not wash at all; but if he reasonably thought that washing might save lives, then it might be unethical to use that sort of control group.

However, there might have been some important differences between the earlier circumstances when doctors were not washing their hands and the later situation when doctors were. For example, suppose that during the earlier period there had been an epidemic of fever, but it had subsided by the time Semmelweis tried his experiment. Or there might have been a large turnover among the doctors, and some very bad doctors had retired and been replaced by newer, more effective doctors. Or some new delivery technique had been developed that was now widely in use. Or maybe the doctors were simply older and more experienced. We wouldn’t know whether the reduction in childbed fever was the result of washing hands or of some other cause. That’s why the best way to run the experiment would be by randomly dividing the doctors into wash and no-wash groups, and then recording the difference in childbed fever rates. (It might be even better if each doctor alternated washing and not washing, and then the results were compared.) That way any differences are controlled for through randomization, and any factors that might have influenced the number of childbed fever cases would be cancelled out, with the single exception of washing and not washing.

Suppose that in the past year many women of childbearing age had begun to eat a new food, perhaps a new type of vegetable, that (unknown to anyone) gave limited immunity from childbed fever to those who ate it. In that case, Semmelweis would have gotten a lower rate of childbed fever, but the cause might not have been the washing. If the nonwashing and washing doctors had been divided randomly, then the patients of both the control and the experimental groups would be equally likely to be eating the new food, and any difference in the rate of childbed fever could safely be attributed to the new routine of hand washing.
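
To dramatize why the randomized design matters, here is a deliberately rigged simulation sketch (all numbers invented) in which hand washing does nothing at all and the only protective factor is the hypothetical new food. The before/after comparison is fooled; the randomized comparison is not.

```python
import random

random.seed(6)

N = 5000   # births per group; every number here is invented for illustration

def fever_rate(new_food_available):
    # In this rigged scenario, hand washing has NO effect at all. The only thing
    # that lowers risk is a protective new food, eaten by about half the mothers
    # once it becomes available.
    cases = 0
    for _ in range(N):
        risk = 0.10                                   # baseline childbed-fever risk
        if new_food_available and random.random() < 0.5:
            risk *= 0.3
        cases += random.random() < risk
    return cases / N

# Before/after design: the "before" mothers lack the food, the "after" mothers have it,
# so the rate falls even though washing (the thing being tested) did nothing.
print(f"before hand washing was introduced: {fever_rate(new_food_available=False):.1%}")
print(f"after hand washing was introduced:  {fever_rate(new_food_available=True):.1%}")

# Randomized design: wash and no-wash groups deliver babies during the same period,
# so both are equally exposed to the new food, and the spurious drop disappears.
print(f"randomized no-wash group:           {fever_rate(new_food_available=True):.1%}")
print(f"randomized wash group:              {fever_rate(new_food_available=True):.1%}")
```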

So randomized experimental design is the scientific ideal. But it’s not always possible to use that design. For example, suppose we want to know the effects of smoking two packs of cigarettes a day, starting at age 16, over a period of 10 years. The best method for determining that would be by taking a group of several thousand 16-year-olds, dividing them randomly into two groups, and having the experimental group smoke two packs a day for 10 years while keeping the control group from smoking at all. But that would be a morally appalling experiment. After all, several years ago we suspected that cigarette smoking was profoundly harmful to health; and so we certainly can’t run an experiment in which we have people do something that we strongly suspect will cause them great harm.

Instead, we might run a prospective (sometimes called an epidemiological) test: We will search out a thousand kids who started smoking two packs a day at age 16, and then find a thousand nonsmoking kids (for our control group) whom we can match with the smokers in our experimental group. The problem is, what features should we match for? It won’t be enough just to match a 16-year-old male smoker with a 16-year-old male nonsmoker. Smokers and nonsmokers may differ in many other ways that are very important to health. For example, it may be that the smokers generally don’t get as much exercise, or perhaps they drink more on average, or maybe they don’t get as much sleep, or maybe they tend to eat more junk food, or perhaps they are more likely to work around pesticides or industrial chemicals, or maybe they tend to live in urban areas and are more likely to breathe polluted air. And even if we can find a match on all those characteristics, there might still be some characteristic that we hadn’t considered. It turns out that prolonged exposure to asbestos increases one’s risk for developing cancer. But we didn’t know that until recently, and so we would not have thought to control for asbestos exposure. That’s not to say that prospective studies are useless. If they are carefully conducted, they can reveal very important causal results. But they are difficult to run, and it is very difficult to be sure that we have controlled for all the potential causal factors.

Making Predictions

Semmelweis used a controlled experiment to test his theory. The test turned out as he had hoped, and indicated that washing hands with chlorinated lime reduces the spread of childbed fever. But in this case the test was also designed to prove a larger hypothesis: roughly, the hypothesis that disease is spread by something (germs, though Semmelweis didn’t call it that) that is transmitted from others with the disease. That is, many diseases, including childbed fever, are caused by infection from an external source, rather than by some internal process (such as an excess of “black bile”). Semmelweis used a controlled experiment as part of his scientific work, but important as controlled experiments are, they are not the whole of the scientific process. Exactly how does the “scientific method” work?

A popular but false notion of scientific method is that scientists gather lots of data, collect many observations, and finally devise a hypothesis to account for all their observations. It’s a pleasant image, the scientist working in her lab or looking through her telescope or rambling around the rain forest gathering more and more samples and data and information, then finally squeezing it all together into a true scientific theory. But it’s not really the way the scientific method works. One problem with that popular image is that there is just too much to look for. A scientist might make an enormous number of observations, but they would be a terrible jumble, and it would be all but impossible to winnow out the important and relevant data from the distractions. Instead, in applying the scientific method, scientists devise a hypothesis; then they make a prediction based on that hypothesis; and the success of that prediction confirms the truth of their theory. That’s a bit oversimplified, but it’s at least a rough outline of what scientific method involves.

None of this is to deny the importance of scientific observation and data collection, of course. After all, if Semmelweis hadn’t noticed that the mortality rate was much higher in the First Maternity Division than in the Second Division, he would not have formulated his hypothesis. But in order to test and prove his hypothesis, Semmelweis could not just go on collecting data. Instead, he had to follow the basic three steps of scientific method. First, devise a theory or hypothesis that would account for the phenomena (in the popular model, that is the final step; in the actual scientific method, it is the first step). Second, make a prediction based on that theory (Semmelweis predicted a dramatic drop in childbed fever). Third, test to see if that prediction is accurate.

Consider one of the most famous of scientific theories: Newton’s laws of motion. In 1687, Isaac Newton proposed that everything—balls that are hit in Fenway Park, spacecraft that orbit the Earth, the planets that orbit the Sun—is subject to a set of simple laws: In the absence of any force, the momentum of a body remains constant; if there is a force acting on a body, that body will accelerate by an amount directly proportional to the strength of the force and inversely proportional to the mass of the body; if a body exerts a force on a second body, then the second exerts an equal and opposite force on the first; and finally, any two bodies exert forces on each other that are proportional to the product of their masses divided by the square of the distance between them. This is an exciting and extraordinary theory, building on the work of Copernicus, Kepler, and Galileo, and bringing together both celestial and terrestrial motions under one set of simple and precise laws. But was it true? Some scientists remained skeptical: After all, these gravitational forces Newton was talking about, forces acting over enormous distances and requiring no physical contact, sounded a bit mysterious. But in 1705, Edmond Halley published a remarkable prediction, based on Newton’s theory and on astronomical reports of earlier comets. Halley predicted that a comet would reappear on Christmas, in 1758. Now that is a prediction! It’s not like predicting that Bruce will drink coffee tomorrow morning, or that undergraduates will show up in Florida in the spring, or that Buffalo will have a blizzard in February. Comets were strange and mysterious: they seemed to follow no set pattern, they showed up out of nowhere, and they were often seen as divine warnings of doom and disaster. Halley’s prediction was not only dramatic, it was also remarkably accurate, with the comet appearing right on schedule. Newtonian theory had already established its worth by 1758, but the success of Halley’s prediction was the crowning proof.
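
For readers who like to see the laws in symbols, here is a compact modern statement (standard physics notation summarizing the prose above; it is not Newton's own wording or the text's):

```latex
% Newton's three laws of motion, plus universal gravitation, in modern notation
\begin{align*}
\text{1. (inertia)} \quad & \vec{F}_{\text{net}} = \vec{0} \;\Longrightarrow\; \vec{p} = m\vec{v}\ \text{remains constant} \\
\text{2.} \quad & \vec{F} = m\vec{a} \qquad \text{(acceleration proportional to force, inversely proportional to mass)} \\
\text{3.} \quad & \vec{F}_{1\to 2} = -\vec{F}_{2\to 1} \qquad \text{(equal and opposite forces)} \\
\text{gravitation:} \quad & F = \frac{G\,m_1 m_2}{r^2}
\end{align*}
```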

When Predictions Go Wrong

But not every prediction is so successful. In the sixteenth century, Copernicus proposed the Copernican theory, in opposition to the reigning Ptolemaic theory. According to the Ptolemaic theory, the Earth is stationary and unmoving, and the Sun, the planets, and the realm of fixed stars (the sphere on which all the stars are fixed) all circle around the stationary Earth. Copernicus proposed that the Sun was the center, and the Earth and the other planets were all in orbit around the Sun, while the Earth rotated daily on its axis. According to Copernicus, it took 1 day for the Earth to turn and 1 year for the Earth to travel around the Sun (rather than, as in the Ptolemaic theory, the Sun taking 1 day to circle the Earth). Which theory was right, the Copernican or the Ptolemaic? How would you test them? (Remember, it’s the sixteenth century: there are no telescopes, no space shuttles, no space stations.)

Think about what each theory predicts. According to the Ptolemaic theory, the Earth is stationary, fixed. But the Copernican theory claims that the Earth travels all the way around the Sun, every year. Suppose today is October 1. The Ptolemaic theory says that 6 months from now, on April 1, the Earth will be exactly where it is today; it will not have moved an inch. But the Copernican theory asserts that on April 1 the Earth will be an enormous distance from here, way over on the other side of the Sun. So if the Copernican theory is true, then when we take a sighting on the stars today, and calculate the angles between them, and then take another sighting from our new position on April 1, we should get a different result because of the differences in our locations (just as when you look from a distance at two skyscrapers and measure the angle between them, and then move a few hundred yards and measure again, you will get a different angle). That observed difference is called the stellar parallax. The Copernican theory predicts you will observe the stellar parallax; the Ptolemaic theory predicts you will not.

A very ingenious experiment. So they ran the test, and the result: No stellar parallax. So what do we conclude? The Copernican theory is false. We have to give it up. The prediction failed.

Well, no, not exactly. Some people rejected the Copernican theory; but many who believed the Copernican theory were disappointed, but they did not reject the theory. Instead, they questioned some of the background assumptions that had been made in testing the theory. In particular, they questioned one crucial assumption: that the stars are close enough that moving from one side of the Sun to the other should make a measurable difference in the angles of the stars. Instead of rejecting the Copernican theory, they rejected that assumption, and concluded the stars are so far away from us (compared to the distance the Earth travels around the Sun) that we are unable to detect any difference in angle (like if you moved only an inch and looked again at the skyscrapers).

It turns out, of course, that the Copernicans were right: The stars are much farther away than the people of the sixteenth century had imagined (in fact, at distances so enormous that most of us have difficulty getting any sense of it). The Copernican prediction, incidentally, was finally confirmed: The stellar parallax was observed, but not until centuries later with much better observational instruments.
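
A rough back-of-the-envelope calculation, using modern values that sixteenth-century astronomers of course did not have, shows just how small the effect they were looking for really is:

```python
import math

AU_KM = 1.496e8                          # Earth-Sun distance in kilometers
LIGHT_YEAR_KM = 9.461e12
nearest_star_km = 4.25 * LIGHT_YEAR_KM   # roughly the distance to the nearest star

# Apparent shift in a star's direction when the Earth moves from one side of its
# orbit to the other (a baseline of 2 AU), converted to seconds of arc.
shift_radians = (2 * AU_KM) / nearest_star_km
shift_arcseconds = math.degrees(shift_radians) * 3600

print(f"apparent shift for the nearest star: about {shift_arcseconds:.1f} arcseconds")
# Roughly 1.5 arcseconds -- about the apparent width of a small coin a few kilometers
# away. Naked-eye instruments were good to about an arcminute (60 arcseconds) at best,
# so the "stars are very far away" reply explains the null result; the parallax was
# finally measured in the 1830s, with telescopes.
```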

The moral of the story is this: The scientific method involves making predictions, and the success of those predictions supports the truth of the theory. But the falsification of a theory is not quite as simple and straightforward as it might first appear. If our prediction proves false, it may mean that the theory is just wrong, and should be rejected. Or it may mean that we need to reexamine some of the background assumptions that the theory makes, or question some of the data involved in our prediction.

Suppose you run an experiment: Dip a piece of litmus paper in acid. Your chemistry professor tells you about acids and bases, and your prediction is that the litmus paper will turn red when you dip it in the acid. You grasp the paper, dip it in the liquid, and it turns green, not red. Don’t get your hopes up. You didn’t just make a great scientific breakthrough, and refute the entire theory of acids and bases. You probably dipped it in your coffee. Or perhaps your acid solution got contaminated when you spilled your beer in it. Or maybe you got the wrong paper: your chemistry professor said litmus paper, and you thought she said Christmas paper, so you used bright green Christmas wrapping paper. Before we count your litmus experiment as a refutation of our theories of acids and bases, we will check very carefully for any errors you made in procedures and calculations. And you would have to duplicate your experiment, while we observe closely.

So confirming and refuting hypotheses is a complicated process, involving drawing predictions and testing them; and failure of the predictions does not always entail rejection of the hypothesis. We may instead question the background assumptions (Are the stars really so close?) or the specific data (Was that really litmus paper that you dipped in the acid?). Still, there are some important points to note concerning the legitimate testing of hypotheses. First, there must be a prediction. Merely finding a hypothesis that will account for the data is not enough. After all, there are always plenty of hypotheses that would account for any phenomena. How do we account for the spectacular “northern lights,” the aurora borealis that spreads curtains of shimmering color across northern skies? They are caused by Jill Frost: She is the magical sister of Jack Frost, who paints all the trees in their autumn colors. Jack paints leaves, Jill paints skies. It “explains” the existence of the northern lights; but unfortunately for my Jill Frost hypothesis, it makes no predictions and gives us no guidance for our further investigations into the aurora borealis phenomenon. In contrast, consider the hypothesis that the northern lights are caused by electrically charged particles from the Sun that are trapped by the Earth’s magnetic field. That hypothesis allows us to predict that solar eruptions will be followed by displays of the northern lights, that as the Sun moves toward a period of sunspot maximum there will be more displays of the northern lights, and as the sunspot cycle approaches sunspot minimum there will be fewer.

Faulty “Scientific” Claims

So it is not enough that our hypothesis “account for” the phenomena. That’s the easy part. Consider Erich von Daniken’s “ancient astronauts” hypothesis in Chariots of the Gods. Why did ancient peoples build a set of gigantic lines on the plain of Nazca? They were, according to von Daniken, aircraft runways built according to instructions from ancient astronauts who visited Earth from other stars. (Why did these extraordinary visitors, whose super-advanced technology had mastered the ability to travel the almost unimaginable distances between stars, need the help of ancient peoples to construct a set of runways? That’s not altogether clear, but let that pass.) Well, that would explain it. But so would many other hypotheses: An ancient earthly civilization learned to fly, and built the runways themselves; or an ancient civilization built the lines as astronomical calendars, or perhaps for religious ceremonies, or maybe just for esthetic enjoyment; or Jill Frost built the lines as a break from painting the northern lights. Coming up with a hypothesis to explain the phenomena is not the hard part; the challenge is to develop a hypothesis that is testable (that makes testable predictions) and that passes the test.

Second, it is not enough that the hypothesis involve predictions. The predictions must be of a special sort: They must be predictions that could not be made without the hypothesis, and they must be predictions that are sufficiently detailed and specific that they could be wrong. Suppose you predict, based on my astrological sign or the lines in my palm, that I will suffer disappointments. That prediction will surely be accurate, but the success of that prediction won’t provide any support for the hypothesis that astrological forces shape my destiny. After all, the prediction is hardly surprising: It’s a prediction anyone could have made, and without any use of astrological theory. Also, the predictions must be specific enough that they could possibly be wrong. In the summer of 1993, Washington, D.C., was the scene of a great experiment in the powers of transcendental meditation. More than 5,000 trained meditators from around the world converged on the capital. Their purpose was to meditate together, thus producing a “coherent consciousness field” that would have a calming effect on the entire city. They predicted that the intense meditation would reduce crime in the city by 20%. Unfortunately, the summer of 1993 was the most violent in the history of the district, with the murder rate reaching record highs—records that still remain. One might imagine that this would refute the hypothesis that group transcendental meditation can reduce violence. No, not at all. It turns out, according to the organizers of the meditation “experiment,” that violent crime in the district had been reduced by 18%! But how could that be? The murder rate had hit record highs, hadn’t it? True; but the violent crime rate had been reduced 18% from what it would have been without the transcendental meditation. How did they arrive at that figure? By a rigorous analysis that included considerations of fluctuations in the Earth’s magnetic field. But of course a prediction that there will be a decrease from the unknowable data of what the murder rate “would have been” is the sort of prediction that cannot be wrong.8

Another way to make “safe” predictions is by making them vague, or ambiguous, so that lots of things can be said to fit them. The “prophecies” of Nostradamus are a good example. You can also make your predictions safe from falsification by using a scattergun approach: Make lots of predictions, and surely at least one will be a hit (and most people will forget the failures). This is a favorite technique of “psychics” who see into the future. Jeane Dixon was one of the most famous. She successfully predicted that John F. Kennedy would die in office (since he would likely spend 8 years in office, that was not too amazing, but still, not bad). She also predicted (shortly before his unpredicted death) that Elvis Presley would give several benefit performances; she predicted that Princess Diana would have a daughter, that during the 1970s the two-party system would vanish, that Richard M. Nixon had “good vibrations” and would serve the country well, that Vice President Agnew would rise in stature (he resigned in disgrace after pleading no contest to tax evasion amid bribery allegations), that the pollution problem would be solved in 1971, and that Fidel Castro would be removed from office in 1970. But she did get some right: She predicted that there would be earthquakes “in the eastern part of the world.”

How to Be a Successful Psychic

Tamara Rand is a “psychic to the stars” who provides (for a generous fee) psychic advice to Hollywood stars. She offered a videotape (shown on the Today show, Good Morning America, and CNN) of a local television show from January 6, 1981. On the videotape, Tamara Rand predicted that President Reagan would be shot in the chest by a sandy-haired young man with the initials J. H. The young man would be from a wealthy family, and the assassination attempt would occur during the last week of March or the first week of April. Sure enough, on March 30, 1981, John Hinckley—the sandy-haired son of a wealthy family—shot Reagan in the chest. That’s indeed an amazing and specific prediction. However, it turned out that the supposed local television show had never actually aired, and the tape was made by the show’s host with Tamara Rand on March 31, the day after the assassination attempt.9

As noted before, if a prediction fails, it may yet be possible to salvage the hypothesis: perhaps a background assumption was wrong, or the experiment wasn’t carried out correctly, or the observations were in error. But there is a limit to this “ad hoc rescue” of favored hypotheses. Consider the famous “Shroud of Turin” (the shroud kept in the cathedral of Turin, which shows an image of a man and which some claimed to have been the burial shroud of Jesus). In the late 1980s, small samples of the shroud were, with the permission of the Roman Catholic Church, tested by radiocarbon dating. The test was quite elaborate: samples were given to independent labs in Oxford, Zurich, and Tucson. In addition to the samples from the shroud, each lab was given control substances for which the age was already known: threads from an 800-year-old garment, linen from a 900-year-old tomb, and linen from a second-century mummy of Cleopatra. None of the samples were identified for any of the labs. All the control samples were correctly dated by all three labs, and the three labs also agreed in dating the material from the shroud at approximately 1350. Does that disprove the claim that the Shroud of Turin was the burial shroud of Jesus? By no means, suggests one defender of the shroud: The reason the carbon-dating gave the origin of the shroud as fourteenth century is because a burst of neutrons from the body of Christ as He arose from the dead created additional carbon-14 nuclei, making the cloth appear centuries younger than it actually is. But that is merely the ad hoc rescue of a cherished hypothesis: there is no reason whatsoever to believe that such a burst occurred (it was not mentioned in Scripture), except to save the theory.10 In short, when a prediction does not work out, it is possible that some of the background assumptions or procedures were flawed; but then the burden of proof rests on those who claim such errors, and ad hoc rescues cannot carry that burden.

Occam’s Razor

Legitimate scientific claims must be subject to test, and in particular they must be subject to tests that expose them to the possibility of falsification. There is a second standard for scientific reasoning, and it has a catchy name: the principle of parsimony, which is better known as Occam’s Razor. The principle was formulated by William of Occam, a fourteenth-century philosopher and theologian. Occam (sometimes spelled “Ockham”) formulated the principle as “What can be done with fewer assumptions is done in vain with more.” It is sometimes stated thus: “Entities are not to be multiplied without necessity.” Or in other words, be parsimonious, be thrifty, in formulating scientific explanations. That may sound strange, but it’s an idea you will likely recognize as one of your own.

“’Tis the gift to be simple,” says the opening line of the famous folk song, and scientists agree. If we favored more complicated explanations over simpler ones, then we could “explain” anything; but the explanations would be rather weird, and they wouldn’t offer much help. My watch has stopped. “Must need a new battery,” you say. “No,” I reply, “it’s not the battery; the problem is that my little watch elf is taking a nap, and so he’s stopped moving the gears.” “There’s no watch elf! It’s just an old battery.” “No, I’ll show you.” I bang the watch briskly on a table, and the hands start to move again. “You see, I awakened the elf, and he started moving the gears again.” When the watch soon stops, it’s because the elf returned to slumber. You open the watch, and find no elf. “You see, there’s no elf there.” “He’s there alright,” I reply. “But he’s very shy, and he hides. Besides, he can make himself invisible.” You place a new battery in the watch, and now it runs fine, without stopping. “You see, it’s the battery that drives the watch; there’s no elf.” “There is too an elf,” I insist; “but he’s a battery-powered elf, and he was getting really run down. That’s why he was sleeping so much. Now that he has a new battery, he’s back on the job driving the gears and running the watch.”

This is a silly dialogue, of course. But it’s silly only because of our commitment to the principle of parsimony, our commitment to Occam’s Razor. When giving explanations, don’t make assumptions that are not required. Or another way of stating it: When there are competing explanations, both of which give adequate accounts of what is to be explained, the simpler or more parsimonious explanation is better. I can “explain” the workings of my watch quite effectively by means of my elf hypothesis: when the watch stops, the elf is asleep; when the watch slows down, the elf is tired; when the watch keeps accurate time, the elf is on top of his game; when the watch is inaccurate, the elf has been drinking. The problem is that this explanation is not as parsimonious as the competing explanation: it adds to the explanatory story a very special additional entity, an elf (and not just any elf: an elf that can make itself invisible). If you let me add elves, ghosts, and miracles to my explanatory scheme, then I can “explain” anything; but the explanations violate the principle of Occam’s Razor, and are not as efficient and effective as the simpler explanations in terms of rundown batteries and rust.

Think back to von Daniken’s “explanation” of the lines on the plain of Nazca. The hypothesis that they were constructed as landing strips for ancient astronauts would certainly account for them; but as already noted, that hypothesis is not open to falsification. As it turns out, it also violates Occam’s Razor: it is hardly the simplest hypothesis that would account for the phenomena. After all, many ancient peoples constructed elaborate astronomical devices (Stonehenge was apparently one), since they spent many nights observing the skies, and keeping track of seasons would be valuable to them (berries ripen in different areas at different times, game migrates at specific times of the year, crops must be planted within a relatively narrow opportunity range in order to avoid the last frost and be harvested before the first). Such an explanation not only involves predictions (some arrangement of the lines should mark the vernal equinox, for example), but is also simpler: it appeals to practices we already recognize, and does not require the positing of very special extraterrestrials who could somehow swiftly travel trillions of miles between stars and yet had not learned to land without long runways, and who disappeared with little trace and have never returned.

It often happens that the extra entities that violate Occam’s Razor also make theories unfalsifiable. The watch elf, for example, is not only a superfluous complication, but also a complication that makes the watch elf theory unfalsifiable: no matter what my watch does, an invisible watch elf is always a ready explanation. Ancient astronauts are blessed with similar powers.

Confirmation Bias

There is one more problem with the elf hypothesis. The “predictions” it makes are so general and vague that the hypothesis is subject to “confirmation bias.” This is a problem that plagues attempts to determine causation, and that provides false support for dubious scientific hypotheses. Suppose I have devised a new treatment for the common cold: light-intensive therapeutic energy, or LITE. My treatment procedure consists in giving patients very brief sessions under a sun lamp, at regular intervals, for 2 days. Following the sun lamp sessions, I check on my patients: sure enough, many are showing significant improvement. LITE therapy is a success!

Well, maybe; but that certainly is not proved by my tests. First, as we noted earlier, this “test” is useless because there is no control group for comparison. It may be that patients would start to feel better after a couple of days with or without LITE therapy (their cold symptoms eventually go away with or without treatment). Or it may be that the patients benefit from a placebo effect: They expect that this wonderful new treatment will cure them, so they start to feel better. Or maybe it’s not LITE therapy that helps, but just the fact that these patients spend more time lying down and resting during “LITE therapy,” and it is the additional rest that causes improvement. Or maybe patients who come in for special treatment for their colds also drink more orange juice and get more exercise, and so they would recover more swiftly anyway. Without a control group, this “test” is useless. But there’s another problem if I am running the tests and reporting the results. I have great confidence in and high hopes for my new LITE therapy. So I fully expect to see significant improvements in my LITE therapy patients; and sure enough, I see what I expected. After all, what will count as “improvement”? The standards are vague: fewer sniffles, less coughing, maybe fewer aches and pains, or just a little more energy. Any of those might count as improvements, and if I expect to see such improvements, then it is likely that I’ll find what I’m looking for. Not because I’m being deceitful, but because “confirmation bias” is influencing what I notice (and what I fail to notice: this patient’s sniffles and coughs have increased, but she seems to have a bit more energy; looks like therapeutic improvement to me). Confirmation bias comes into play in lots of situations: the vague predictions of your horoscope seem accurate, because you are inclined to look for ways they are confirmed and ignore the ways they are wrong.

Eliminating confirmation bias is not easy, but it is hardly impossible. Before we run the test of LITE therapy, we agree on the standards for improvement in patient condition. Then we have someone else (who does not know whether the patient being examined is in the control or the experimental group) evaluate the patient’s condition. That is, the experiment must be double-blind: The patient does not know if she is in the control or the experimental group, and the evaluator does not know what group the patient is in. If any confirmation bias remains (and by having an independent evaluator, it should be minimized), then it will apply to both the control and the experimental groups, and thus the effects of the bias will be canceled out.
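And here, continuing the same invented illustration, is a sketch of why blinding the evaluator matters: an evaluator who knows which patients received the treatment nudges their “improvement” scores upward, while a blinded evaluator applies whatever optimism she has to both groups equally, so the remaining bias cancels out when the groups are compared.

```python
import random
from statistics import mean

random.seed(1)

n = 1000
# The hypothetical therapy does nothing, so both groups have the same
# true improvement (invented scale: higher = more improvement).
treated_true = [random.gauss(1.0, 1.0) for _ in range(n)]
control_true = [random.gauss(1.0, 1.0) for _ in range(n)]

def unblinded_score(change, knows_treated):
    """An evaluator who knows the assignment scores treated patients
    a bit more generously (confirmation bias)."""
    return change + (0.5 if knows_treated else 0.0)

def blinded_score(change):
    """A blinded evaluator may still be optimistic, but the optimism
    lands on both groups equally."""
    return change + 0.2

unblinded_gap = (mean(unblinded_score(c, True) for c in treated_true)
                 - mean(unblinded_score(c, False) for c in control_true))
blinded_gap = (mean(blinded_score(c) for c in treated_true)
               - mean(blinded_score(c) for c in control_true))

print(f"Apparent benefit, unblinded evaluator: {unblinded_gap:+.2f}")  # about +0.50
print(f"Apparent benefit, blinded evaluator:   {blinded_gap:+.2f}")    # about  0.00
```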

So if you’re taking a new “natural delight herbal energy boost and serenity enhancer,” and sure enough you notice that you have more energy and you are becoming more serene and less bothered by everyday annoyances; well, maybe this new natural herbal remedy really works. Or maybe it’s the placebo effect. Or maybe you are subject to confirmation bias: You are simply more attuned to the times when you have greater energy and greater serenity, and pay less heed to periods of exhaustion and anxiety.

The moral of this story is a simple one: Determining causality is a very difficult process. There are many ways that careful, honest experimenters can go wrong, and there are lots more ways that frauds and charlatans can fabricate what appear to be plausible causal claims. Merely observing that one event followed another, or that there is a correlation between two sets of phenomena, indicates that there is a possible causal link; but such observations fall far short of establishing causality.

Scientific Integrity, Scientific Cooperation, and Research Manipulation

The flourishing of science requires trust, trustworthiness, freedom, and openness. When science is at its best, the scientific enterprise is a special combination of both cooperative and adversarial critical thinking. Good scientific inquiry builds on a foundation of trust and openness. Jane’s research—whether in sociology, psychology, biology, physics, or pharmacology—builds upon and draws from the research of many others. Isaac Newton, one of the greatest scientists of all time, expressed this idea modestly and accurately: “If I have seen farther than others, it is because I stood on the shoulders of giants.” Without the work of Copernicus, Kepler, and Galileo, Newton could never have made his great discoveries. If their work had not been made public, or if their research had been falsified, then Newton could not have developed his famous laws of motion. And what is true of Newton’s work in physics is true for all scientists in all disciplines.

Effective scientific research is a cooperative process that requires trust and openness. Research typically builds on the work of other researchers: when research findings are hidden (because, for example, someone believes the results can be turned to a profit-making enterprise, and so keeps them secret in order to gain an economic advantage), scientific research is slowed; when research results are flawed, or even deliberately distorted (as when a pharmaceutical company manipulates a study to make a drug seem more effective than it actually is), other researchers who rely on the honesty and accuracy of that research are led in the wrong direction. When flawed or manipulated published research indicates that a new type of drug is a significant improvement in the treatment of a disease, not only are those who use the drug deceived, but the researchers who rely on that research to push their own work in this new and promising direction will have been led badly astray. Thus, effective scientific research requires a basic level of trust and cooperation.

Though cooperation is basic to successful scientific research, there is also a vital role for adversarial argument. In the sciences, all ideas and theories and hypotheses are open to challenge. If you start from the principle that humans were created by God (and that humans did not evolve through natural processes), and you hold that principle as an article of faith and refuse to consider challenges to it, then you may call your position “creation science,” but it is faith rather than science. The best scientific theories make bold claims and original, testable predictions, and they open themselves to strong challenge and possible refutation. A theory that makes only very safe “predictions”—claims so careful or so hedged that the theory is protected from challenge—is not a useful scientific hypothesis; in fact, it may not be a scientific hypothesis at all. Science grows and develops through bold theories that invite challenge and criticism. A vital element of good science is the adversarial attempt to refute theories. Sometimes that effort succeeds, and a theory is rejected; indeed, most scientific theories are ultimately challenged, rejected, and replaced, but we learn a great deal through the process of developing a theory and testing its implications (the phlogiston theory of combustion—burning releases the phlogiston a material contains; paper is rich in phlogiston, while iron has little—was a valuable framework for the research that led to the discovery of oxygen and ultimately to our theory of gases). Often the effort to refute a theory leads to major discoveries and theoretical improvements: For example, when the failure to detect a stellar parallax challenged the Copernican theory, astronomers began to take seriously the idea that the distance between the Earth and the stars is much greater than we had imagined. Only when a theory deals successfully with the most rigorous challenges scientists can devise do we accept it as true. Ideally, this essential adversarial process should remain cordial and cooperative; in fact, some of the most rigorous challenges to a theory may be devised by those who hope it can meet them. Above all, the process must remain open: when theories are treated as immune to challenge—as the Catholic Church treated Aristotelian physics and Ptolemaic astronomy from the medieval period until well into the seventeenth century—scientific inquiry is stifled and scientific development is slowed or stopped.

Sadly, the blocking—rather than welcoming—of challenges to scientific claims can still happen, and the results are still damaging. When a pharmaceutical company seeks FDA approval for a drug, it must offer scientific evidence that the drug is reasonably safe and effective; and when the profits from sale of that drug may amount to billions of dollars, the company may not be eager to consider scientific challenges to the efficacy or the safety of its major moneymaker. Vioxx—a drug developed by Merck to reduce pain, especially for arthritis sufferers, with annual sales in excess of $2 billion—was withdrawn from the market because it was found to increase the danger of hypertension, blood clotting, strokes, and heart attacks. Evidence of the danger from Vioxx became available soon after the drug was released (one study found that patients on Vioxx were five times as likely to suffer heart attacks as patients taking an alternative painkiller).
But rather than encourage scientific inquiry that might threaten its profits, Merck exerted pressure to silence its critics: Dr. Gurkipal Singh, a Stanford University researcher, was told by Merck representatives that his career would be ruined if he continued challenging the safety of Vioxx. When an in-house study of aprotinin—a drug marketed by Bayer to control bleeding in surgical patients—indicated that the drug could cause kidney failure and congestive heart failure, the company suppressed the study. Eli Lilly hid a study showing that Prozac could increase the risk of suicide; GlaxoSmithKline did the same with its study showing that its antidepressant best-seller, Paxil, might increase the risk of suicide among children and adolescents. Another GlaxoSmithKline drug, Avandia, is used in the treatment of diabetes; in 2006 its sales exceeded $3 billion. When research indicated that Avandia increased the danger of heart attacks (in comparison with a competing drug that produced the same positive results without the increased risk), GlaxoSmithKline responded by attacking its scientific critics. Dr. John Buse, Professor of Medicine at the University of North Carolina, was threatened with lawsuits if he continued to raise questions about the safety of Avandia (and a threat of legal action by a multibillion-dollar corporation with unlimited legal resources is frightening indeed). A report by a bipartisan Senate committee—led by Montana Democrat Max Baucus and Iowa Republican Charles Grassley—found that, in response to the research challenging the safety of Avandia, executives at GlaxoSmithKline:

Attempted to intimidate independent physicians, focused on strategies to minimize or misrepresent findings that Avandia may increase cardiovascular risk, and sought ways to downplay findings that a competing drug might reduce cardiovascular risk.

Perhaps the most notorious contemporary case of attacking—rather than welcoming—scientific criticism is that of Dr. Nancy Olivieri, a University of Toronto researcher who studied a drug produced by Apotex (used to treat a rare blood disorder). Dr. Olivieri’s study indicated that the drug was not very effective and that it might have lethal side effects. Though warned by Apotex not to publish her results, Dr. Olivieri went ahead with publication. She was fired by the university amidst charges of wrongdoing (it was later revealed that Apotex had been discussing a $12.7 million donation to the university with the university’s president); she was later cleared of all charges and returned to her university position. Whatever the motive—whether protecting religious doctrine or protecting profits—the suppression of scientific challenges has a chilling effect on scientific inquiry.