The Seven Deadly Sins of Psychology


  Bem himself realized that his results defied explanation and stressed the need for independent researchers to replicate his findings. Yet doing so proved more challenging than you might imagine. One replication attempt by Chris French and Stuart Ritchie showed no evidence whatsoever of precognition but was rejected by the same journal that published Bem’s paper. In this case the journal didn’t even bother to peer review French and Ritchie’s paper before rejecting it, explaining that it “does not publish replication studies, whether successful or unsuccessful.”3 This decision may sound bizarre, but, as we will see, contempt for replication is common in psychology compared with more established sciences. The most prominent psychology journals selectively publish findings that they consider to be original, novel, neat, and above all positive. This publication bias, also known as the “file-drawer effect,” means that studies that fail to show statistically significant effects, or that reproduce the work of others, have such low priority that they are effectively censored from the scientific record. They either end up in the file drawer or are never conducted in the first place.

  Publication bias is one form of what is arguably the most powerful fallacy in human reasoning: confirmation bias. When we fall prey to confirmation bias, we seek out and favor evidence that agrees with our existing beliefs, while at the same time ignoring or devaluing evidence that doesn’t. Confirmation bias corrupts psychological science in several ways. In its simplest form, it favors the publication of positive results—that is, hypothesis tests that reveal statistically significant differences or associations between conditions (e.g., A is greater than B; A is related to B, vs. A is the same as B; A is unrelated to B). More insidiously, it contrives a measure of scientific reproducibility in which it is possible to replicate but never falsify previous findings, and it encourages altering the hypotheses of experiments after the fact to “predict” unexpected outcomes. One of the most troubling aspects of psychology is that the academic community has refused to unanimously condemn such behavior. On the contrary, many psychologists acquiesce to these practices and even embrace them as survival skills in a culture where researchers must publish or perish.

  Within months of appearing in a top academic journal, Bem’s claims about precognition were having a powerful, albeit unintended, effect on the psychological community. Established methods and accepted publishing practices fell under renewed scrutiny for producing results that appear convincing but are almost certainly false. As psychologist Eric-Jan Wagenmakers and colleagues noted in a statistical demolition of Bem’s paper: “Our assessment suggests that something is deeply wrong with the way experimental psychologists design their studies and report their statistical results.”4 With these words, the storm had broken.

  A Brief History of the “Yes Man”

  To understand the different ways that bias influences psychological science, we need to take a step back and consider the historical origins and basic research on confirmation bias. Philosophers and scholars have long recognized the “yes man” of human reasoning. As early as the fifth century BC, the historian Thucydides noted words to the effect that “[w]hen a man finds a conclusion agreeable, he accepts it without argument, but when he finds it disagreeable, he will bring against it all the forces of logic and reason.” Similar sentiments were echoed by Dante, Bacon, and Tolstoy. By the mid-twentieth century, the question had evolved from one of philosophy to one of science, as psychologists devised ways to measure confirmation bias in controlled laboratory experiments.

  Since the mid-1950s, a convergence of studies has suggested that when people are faced with a set of observations (data) and a possible explanation (hypothesis), they favor tests of the hypothesis that seek to confirm it rather than falsify it. Formally, what this means is that people are biased toward estimating the probability of the data if a particular hypothesis is true, p(data | hypothesis), rather than the probability of the data if that hypothesis is false, p(data | ~hypothesis). In other words, people prefer to ask questions to which the answer is “yes,” ignoring the maxim of philosopher Georg Henrik von Wright that “no confirming instance of a law is a verifying instance, but … any disconfirming instance is a falsifying instance.”5

  Psychologist Peter Wason was one of the first researchers to provide laboratory evidence of confirmation bias. In one of several innovative experiments conducted in the 1960s and 1970s, he gave participants a sequence of numbers, such as 2-4-6, and asked them to figure out the rule that produced it (in this case: three numbers in increasing order of magnitude).6 Having formed a hypothesis, participants were then allowed to write down their own sequence, after which they were told whether their sequence was consistent or inconsistent with the actual rule. Wason found that participants showed a strong bias to test various hypotheses by confirming them, even when the outcome of doing so failed to eliminate plausible alternatives (such as three even numbers). Wason’s participants used this strategy despite being told in advance that “your aim is not simply to find numbers which conform to the rule, but to discover the rule itself.”
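
  To see why confirming tests are so uninformative in the 2-4-6 task, here is a minimal sketch in Python. It assumes a participant whose working hypothesis is “even numbers increasing by two”; that guess, the function names, and the range of numbers considered are all illustrative assumptions rather than details of Wason’s procedure.

```python
from itertools import product

def true_rule(triple):
    """Wason's actual rule: three numbers in increasing order of magnitude."""
    a, b, c = triple
    return a < b < c

def guessed_rule(triple):
    """Hypothetical participant guess: even numbers increasing by two."""
    a, b, c = triple
    return a % 2 == 0 and b - a == 2 and c - b == 2

# Every triple with entries from 1 to 12 (an arbitrary illustrative range).
triples = list(product(range(1, 13), repeat=3))

# Positive tests: triples the participant expects to conform to the rule.
positive_tests = [t for t in triples if guessed_rule(t)]
# Negative tests: triples the participant expects to violate the rule.
negative_tests = [t for t in triples if not guessed_rule(t)]

# A test is informative only when the experimenter's answer contradicts the guess.
surprises_from_positive = [t for t in positive_tests if not true_rule(t)]
surprises_from_negative = [t for t in negative_tests if true_rule(t)]

print("positive tests:", len(positive_tests),
      "contradicting the guess:", len(surprises_from_positive))
print("negative tests:", len(negative_tests),
      "contradicting the guess:", len(surprises_from_negative))
```

  Under these assumptions, no positive test can ever contradict the guess, because every sequence that fits the narrower guess also fits the broader true rule. Only sequences the participant expects to fail, such as 1-2-3, can reveal that the guess is wrong.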

  Since then, many studies have explored the basis of confirmation bias in a range of laboratory-controlled situations. Perhaps the most famous of these is the ingenious Selection Task, which was also developed by Wason in 1968.7 The Selection Task works like this. Suppose I were to show you four cards on a table, labeled D, B, 3, and 7 (see figure 1.1). I tell you that if the card shows a letter on one side then it will have a number on the other side, and I provide you with a more specific rule (hypothesis) that may be true or false: “If there is a D on one side of any card, then there is a 3 on its other side.” Finally, I ask you to tell me which cards you would need to turn over in order to determine whether this rule is true or false. Leaving an informative card unturned or turning over an uninformative card (i.e., one that doesn’t test the rule) would be considered an incorrect response. Before reading further, take a moment and ask yourself, which cards would you choose and which would you avoid?

  FIGURE 1.1. Peter Wason’s Selection Task for measuring confirmation bias. Four cards are placed face down on a table. You’re told that if there is a letter on one side then there will always be a number on the other side. Then you are given a specific hypothesis: If there is a D on one side then there is a 3 on its other side. Which cards would you turn over to test whether this hypothesis is true or false?

  If you chose D and avoided B then you’re in good company. Both responses are correct and are made by the majority of participants. Selecting D seeks to test the rule by confirming it, whereas avoiding B is correct because the flip side would be uninformative regardless of the outcome.

  Did you choose 3? Wason found that most participants did, even though 3 should be avoided. This is because if the flip side isn’t a D, we learn nothing: the rule states that cards with D on one side are paired with a 3 on the other, not that D is the only letter to be paired with a 3 (drawing such a conclusion would be a logical fallacy known as “affirming the consequent”). And even if the flip side is a D then the outcome would be consistent with the rule but wouldn’t confirm it, for exactly the same reason.

  Finally, did you choose 7 or avoid it? Interestingly, Wason found that few participants selected 7, even though doing so is correct—in fact, it is just as correct as selecting D. If the flip side to 7 were discovered to be a D then the rule would be categorically disproven—a logical test of what’s known as the “contrapositive.” And herein lies the key result: the fact that most participants correctly select D but fail to select 7 provides evidence that people seek to test rules or hypotheses by confirming them rather than by falsifying them.
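
  The logic of the Selection Task can be checked mechanically. The following is a minimal sketch in Python, assuming for illustration that the only possible hidden letters are D and B and the only possible hidden numbers are 3 and 7; the function names are my own. A card is worth turning over only if some hidden face could break the rule.

```python
def rule_holds(letter, number):
    """The rule under test: if a card has D on one side, it has 3 on the other."""
    return letter != "D" or number == 3

# Hypothetical sets of possible hidden faces, for illustration only.
LETTERS = ["D", "B"]
NUMBERS = [3, 7]

def can_falsify(visible):
    """Return True if some hidden face would make this card break the rule."""
    if isinstance(visible, str):
        # A letter is showing, so the hidden face is a number.
        return any(not rule_holds(visible, number) for number in NUMBERS)
    # A number is showing, so the hidden face is a letter.
    return any(not rule_holds(letter, visible) for letter in LETTERS)

cards = ["D", "B", 3, 7]
informative = [card for card in cards if can_falsify(card)]
print(informative)  # the cards that are capable of disproving the rule
```

  Under these assumptions the list contains only D and 7. Neither B nor 3 can ever produce a violation, which is why turning them over tells us nothing about whether the rule is true.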

  Wason’s findings provided the first laboratory-controlled evidence of confirmation bias, but centuries of informal observations already pointed strongly to its existence. In a landmark review, psychologist Raymond Nickerson noted how confirmation bias dominated in the witchcraft trials of the Middle Ages.8 Many of these proceedings were a foregone conclusion, seeking only to obtain evidence that confirmed the guilt of the accused. For instance, to test whether a person was a witch, the suspect would often be plunged into water with stones tied to her feet. If she rose then she would be proven a witch and burned at the stake. If she drowned then she was usually considered innocent or a witch of lesser power. Either way, being suspected of witchcraft was tantamount to a death sentence within a legal framework that sought only to confirm accusations. Similar biases are apparent in many aspects of modern life. Popular TV programs such as CSI fuel the impression that forensic science is bias-free and infallible, but in reality the field is plagued by confirmation bias.9 Even at the most highly regarded agencies in the world, forensic examiners can be biased toward interpreting evidence that confirms existing suspicions. Doing so can lead to wrongful convictions, even when evidence is based on harder data such as fingerprints and DNA tests.

  Confirmation bias also crops up in the world of science communication. For many years it was assumed that the key to more effective public communication of science was to fill the public’s lack of knowledge with facts, the so-called deficit model.10 More recently, however, this idea has been discredited because it fails to take into account the prior beliefs of the audience. The extent to which we assimilate new information about popular issues such as climate change, vaccines, or genetically modified foods is susceptible to a confirmation bias in which evidence that is consistent with our preconceptions is favored, while evidence that flies in the face of them is ignored or attacked. Because of this bias, simply handing people more facts doesn’t lead to more rational beliefs. The same problem is reflected in politics. In his landmark 2012 book, The Geek Manifesto, Mark Henderson laments the cherry-picking of evidence by politicians in order to reinforce a predetermined agenda. The resulting “policy-based evidence” is a perfect example of confirmation bias in practice and represents the antithesis of how science should be used in the formulation of evidence-based policy.

  If confirmation bias is so irrational and counterproductive, then why does it exist? Many different explanations have been suggested based on cognitive or motivational factors. Some researchers have argued that it reflects a fundamental limit of human cognition. According to this view, the fact that we have incomplete information about the world forces us to rely on the memories that are most easily retrieved (the so-called availability heuristic), and this reliance could fuel a bias toward what we think we already know. On the other hand, others have argued that confirmation bias is the consequence of an innate “positive-test strategy”—a term coined in 1987 by psychologists Joshua Klayman and Young-Won Ha.11 We already know that people find it easier to judge whether a positive statement is true or false (e.g., “there are apples in the basket”) compared to a negative one (“there are no apples in the basket”). Because judgments of presence are easier than judgments of absence, it could be that we prefer positive tests of reality over negative ones. By taking the easy road, this bias toward positive thoughts could lead us to wrongly accept evidence that agrees positively with our prior beliefs.

  Against this backdrop of explanations for why an irrational bias is so pervasive, psychologists Hugo Mercier and Dan Sperber have suggested that confirmation bias is in fact perfectly rational in a society where winning arguments is more important than establishing truths.12 Throughout our upbringing, we are taught to defend and justify the beliefs we hold, and less so to challenge them. By interpreting new information according to our existing preconceptions we boost our self-confidence and can argue more convincingly, which in turn increases our chances of being regarded as powerful and socially persuasive. This observation leads us to an obvious proposition: If human society is constructed so as to reward the act of winning rather than being correct, who would be surprised to find such incentives mirrored in scientific practices?

  Neophilia: When the Positive and New Trumps the Negative but True

  The core of any research psychologist’s career, and indeed that of many scientists in general, is the rate at which they publish empirical articles in high-quality peer-reviewed journals. Since the peer-review process is competitive (and sometimes extremely so), publishing in the most prominent journals equates to a form of “winning” in the academic game of life.

  Journal editors and reviewers assess submitted manuscripts on many grounds. They look for flaws in the experimental logic, the research methodology, and the analyses. They study the introduction to determine whether the hypotheses are appropriately grounded in previous research. They scrutinize the discussion to decide whether the paper’s conclusions are justified by the evidence. But reviewers do more than merely critique the rationale, methodology, and interpretation of a paper. They also study the results themselves. How important are they? How exciting? How much have we learned from this study? Is it a breakthrough? One of the central (and as we will see, lamentable) truths in psychology is that exciting positive results are a key factor in publishing—and often a requirement. The message to researchers is simple: if you want to win in academia, publish as many papers as possible in which you provide positive, novel results.

  What does it mean to find “positive” results? Positivity in this context doesn’t mean that the results are uplifting or good news—it refers to whether the researchers found a reliable difference in measurements, or a reliable relationship, between two or more study variables. For example, suppose you wanted to test the effect of a cognitive training intervention on the success of dieting in people trying to lose weight. First you conduct a literature review, and, based on previous studies, you decide that boosting people’s self-control might help. Armed with a good understanding of existing work, you design a study that includes two groups. The experimental group perform a computer task in which they are trained to respond to images of foods, but crucially, to refrain from responding to images of particular junk foods. They perform this task every day for six weeks, and you measure how much weight they lose by the end of the experiment. The control group does a similar task with the same images but responds to all of them—and you measure weight loss in that group as well.

  The null hypothesis (called “H0”) in this case is that there should be no difference in weight loss—your training intervention has no effect on whether people gain or lose weight. The alternative hypothesis (called “H1”) is that the training intervention should boost people’s ability to refrain from eating junk foods, and so the amount of weight loss should be greater in the treatment group compared with the control group. A positive result would be finding a statistically significant difference in weight loss between the groups (or in technical terms, “rejecting H0”), and a negative result would be failing to show any significant difference (or in other words, “failing to reject H0”). Note how I use the term “failing.” This language is key because, in our current academic culture, journals indeed regard such outcomes as scientific failures. Regardless of the fact that the rationale and methods are identical in each outcome, psychologists find negative results much harder to publish than positive results. This is because positive results are regarded by journals as reflecting a greater degree of scientific advance and interest to readers. As one journal editor said to me, “Some results are just more interesting and important than others. If I do a randomized trial on a novel intervention based on a long-shot and find no effect that is not a great leap forward. However, if the same study shows a huge benefit that is a more important finding.”

  This publication bias toward positive results also arises because of the nature of conventional statistical analyses in psychology. Using standard methods developed by Neyman and Pearson, positive results reject H0 in favor of the alternative hypothesis (H1). This statistical approach, called null hypothesis significance testing, estimates the probability (p) of an effect of the same or greater size being obtained if the null hypothesis were true. Crucially, it doesn’t estimate the probability of the null hypothesis itself being true: p values estimate the probability of a given effect or more extreme arising given the hypothesis, rather than the probability of a particular hypothesis given the effect. This means that while a statistically significant result (by convention, p < .05) allows the researcher to reject H0, a statistically nonsignificant result (p > .05) doesn’t allow the researcher to accept H0. All the researcher can conclude from a statistically nonsignificant outcome is that H0 might be true, or that the data might be insensitive. The interpretation of statistically nonsignificant effects is therefore inherently inconclusive.
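
  To make the meaning of a p value concrete, here is a minimal sketch in Python based on the dieting example. The weight-loss figures are invented purely for illustration, and the permutation test used here is a simple stand-in that I have chosen for clarity, not the specific procedure any journal or researcher would necessarily use. It estimates how often a difference at least as large as the observed one would arise if group labels were arbitrary, which is what H0 implies.

```python
import random
from statistics import mean

def permutation_p_value(treatment, control, n_shuffles=5000, seed=0):
    """One-sided p value: the estimated probability, if H0 were true, of a
    mean difference at least as large as the one actually observed."""
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        # Under H0 the group labels are interchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        if diff >= observed:
            at_least_as_large += 1
    # The add-one correction keeps the estimate from being exactly zero.
    return (at_least_as_large + 1) / (n_shuffles + 1)

# Invented weight loss in kilograms after six weeks (illustrative only).
treatment_loss = [3.1, 2.4, 4.0, 1.8, 2.9, 3.5]
control_loss = [1.2, 2.0, 0.5, 1.9, 2.6, 1.1]

p_value = permutation_p_value(treatment_loss, control_loss)
print(f"p = {p_value:.4f}")
```

  Note what the function does and does not return. It estimates the probability of data this extreme given H0, not the probability of H0 given the data, so a large value cannot be read as evidence that the intervention has no effect.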

  Consider the thought process this creates in the minds of researchers. If we can’t test directly whether there is no difference between experimental conditions, then it makes little sense to design an experiment in which the null hypothesis would ever be the focus of interest. Instead, psychologists are trained to design experiments in which findings of interest would always be positive. This bias in experimental design, in turn, means that students in psychology enter their research careers reciting the mantra “Never predict the null hypothesis.” If researchers can never predict the null hypothesis, and if positive results are considered more interesting to journals than negative results, then the inevitable outcome is a bias in which the peer-reviewed literature is dominated by positive findings that reject H0 in favor of H1, and in which most of the negative or nonsignificant results remain unpublished. To ensure that they keep winning in the academic game, researchers are thus pushed into finding positive results that agree with their expectations—a mechanism that incentivizes and rewards confirmation bias.
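
  The effect of this filter on the literature can be sketched with a small simulation in Python. It rests on strong simplifying assumptions that are mine rather than a description of any real field: the intervention truly has no effect, scores in both groups have a known variance of one so that a simple z test applies, and journals accept only results with p < .05. All names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def simulate_literature(n_studies=2000, n_per_group=20, true_effect=0.0, seed=1):
    """Run many two-group studies and sort them by the p < .05 filter."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of a difference in means when each group has unit variance.
    standard_error = (2 / n_per_group) ** 0.5
    published = 0
    file_drawer = 0
    for _ in range(n_studies):
        treatment = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(treatment) - fmean(control)) / standard_error
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p value
        if p < 0.05:
            published += 1      # a "positive" result reaches the journals
        else:
            file_drawer += 1    # a "negative" result goes unreported
    return published, file_drawer

published, file_drawer = simulate_literature()
print("published:", published, "file drawer:", file_drawer)
```

  With no true effect, roughly one study in twenty should cross the significance threshold by chance. Under these assumptions every one of those published findings is a false positive, while the far larger number of accurate negative results never appears in print.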