  Standardization of research practices. One reason p-hacking is so common is that arbitrary decisions are easy to justify. Even within a simple experiment, there are dozens of different analytic pathways that a researcher can take, all of which may have a precedent in the published literature and all of which may be considered defensible. This ambiguity enables researchers to pick and choose the most desirable outcome from a smorgasbord of statistical tests, reporting the one that “worked” as though it were the only analysis that was attempted. One solution to this problem is to constrain the range of acceptable approaches by applying institutional standards to practices such as outlier exclusion or the use of covariates.

  Scientists are generally resistant to such changes, particularly when no clear best practice stands out from the crowd. Such standards can also be difficult to apply in emerging fields, such as functional brain imaging, owing to the rapid developments in methodology and analysis strategies. However, where standardization isn’t feasible and any one of several arbitrary decisions could be defended, there is a strong argument that researchers should report all of them and then summarize the robustness of the outcomes across all contingencies.
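  To see why this flexibility matters, consider a minimal simulation, which is not from the book and is purely illustrative. Two groups are drawn from the same distribution, so any “significant” difference is a false positive; yet trying several equally defensible outlier-exclusion rules and reporting only the analysis that “worked” pushes the false-positive rate well above the nominal 5 percent. The group sizes, cutoffs, and number of simulated experiments are arbitrary choices for illustration, and reporting all of the p values for each experiment, rather than only the smallest, is the analogue of the robustness summary advocated above.

    # Illustrative sketch (not from the book): how analytic flexibility
    # inflates false positives under the null hypothesis.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_experiments, n_per_group = 5000, 30
    outlier_cutoffs = [None, 2.0, 2.5, 3.0]  # arbitrary z-score exclusion rules

    def p_value(a, b, cutoff):
        # Apply one of several defensible outlier-exclusion rules, then test.
        if cutoff is not None:
            a = a[np.abs(stats.zscore(a)) < cutoff]
            b = b[np.abs(stats.zscore(b)) < cutoff]
        return stats.ttest_ind(a, b).pvalue

    cherry_picked = 0
    for _ in range(n_experiments):
        a = rng.normal(size=n_per_group)  # both groups come from the same
        b = rng.normal(size=n_per_group)  # distribution: any effect is spurious
        if min(p_value(a, b, c) for c in outlier_cutoffs) < 0.05:
            cherry_picked += 1  # at least one pathway "worked"

    print(f"False-positive rate when picking the best pathway: "
          f"{cherry_picked / n_experiments:.3f}")  # noticeably above 0.05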

  Moving beyond the moral argument. Is p-hacking a form of fraud? Whether questionable research practices such as p-hacking and HARKing are fraudulent—or even on the same continuum as fraud—is controversial. Psychologist Dave Nussbaum from the University of Chicago has argued that clearly fraudulent behavior, such as data fabrication, is categorically different from questionable practices such as p-hacking because, with p-hacking, the intent of the researcher cannot be known.28 Nussbaum is right. As we have seen in this chapter, many cases of p-hacking and HARKing are likely to be unconscious acts of self-deception. Yet in cases where researchers deliberately p-hack in order to achieve statistical significance, Nussbaum agrees that such behavior is on the same continuum as extreme fraud.

  We will return to the issue of fraud, but for now we can ask: does it matter if p-hacking is fraudulent? Whether the sin of hidden flexibility is deliberate or an act of self-deception is a distraction from the goal of reforming science. Regardless of what lies in the minds of researchers, the effects of p-hacking are clear, populating the literature with post hoc hypotheses, false discoveries, and blind alleys. The solutions are also the same.

  CHAPTER 3

  The Sin of Unreliability

  And it’s this type of integrity, this kind of care not to fool yourself, that is missing to a large extent in much of the research in cargo cult science.

  —Richard Feynman, 1974

  “Particles break light-speed limit,” announced Nature News.1 “Faster than light particles threaten Einstein,” declared Reuters.2 “Was Einstein wrong?” asked Time.3 So read the headlines in September 2011 when a team of physicists published evidence suggesting that subatomic particles called neutrinos could travel faster than the speed of light. If true, this discovery would revolutionize modern physics. Teams of scientists immediately began the task of repeating the experiment. By June 2012, three independent groups had failed to replicate the original result: the neutrinos in their experiments traveled at approximately the speed of light, just as predicted by special relativity. One month later, the original team reported that their findings were caused by a loose fiber-optic cable.

  The case of faster-than-light neutrinos may sound like an example of science going awry, but, in fact, it is just the opposite. After one team of scientists made what looked like an extraordinary discovery, the scientific community responded by attempting to reproduce the result. When those attempts failed, the original team scrutinized their experiment more closely, discovering a technical error that explained the anomaly. Science can never escape the risk of human error, but it can and must ensure that it self-corrects. Imagine for a moment what kind of physics we would have if faster-than-light neutrinos had simply been believed, their status as a discovery left unreplicated and unchallenged.

  Replication is the immune system of science, identifying false discoveries by testing whether other scientists can repeat them. Without replication, we have no way of knowing which discoveries are genuine and which are caused by technical error, researcher bias, fraud, or the play of chance. And if we don’t know which results are reliable, how can we generate meaningful theories?

  Unfortunately, as we saw earlier, the process of replication—so intrinsic to the scientific method—is largely ignored or distorted in psychology. Recall from chapter 1 that the claims of psychic precognition by Daryl Bem soared into print in one of the most prestigious psychology journals in the world. Yet the crucial nonreplication by Stuart Ritchie and Chris French took much longer to appear and was initially rejected, on principle alone, by the same journal that published Bem’s findings. Instead of valuing the reproducibility of results, psychology has embraced a tabloid culture where novelty and interest-value are paramount and the truth is left begging. Psychology thus succumbs to our third major transgression: the sin of unreliability.

  Sources of Unreliability in Psychology

  Science generates an understanding of the natural world by using empirical evidence to reduce uncertainty. The hypothetico-deductive model of the scientific method, introduced in chapter 1, achieves this by systematically testing hypotheses borne from theory. Once the results of high-quality experiments are verified through direct (close) replication, the evidence can refine the theory in question, generating further hypotheses and honing our ability to understand and predict reality (see figure 3.1).

  Unfortunately, psychology fails to adhere to this philosophy, and nowhere is this culture expressed more blatantly than by the indifference—and, in many cases, hostility—toward direct replication. Yet the lack of replication isn’t the only reason psychological science faces a crisis of reliability. As we will see, other concerns include low statistical power, failure to disclose full methodological details, statistical fallacies, and the refusal to retract irreproducible findings from the literature. Together these problems not only threaten the truth-value of psychological evidence; they threaten the status of psychology as a science.

  Reason 1: Disregard for Direct Replication

  Direct replication is intrinsic to all sciences. During a direct replication, a researcher seeks to test the repeatability of a previous finding by duplicating the methodology as exactly as possible.4 The importance of direct replication is reflected by the simple fact that empirical journal articles in all sciences include method sections. A method section is supposed to provide researchers with all the information they would need to replicate the experimental “recipe.”

  Despite the clear importance of replication, we saw in chapter 1 how the academic culture in psychology places little emphasis on repeating the experimental methods of other psychologists. Such work is seen to lack innovation within a system that instead seeks to validate previous findings by conducting novel experiments that test a related (but different) idea using a different method: the approach referred to as “conceptual replication.” Although used widely in psychology, the term conceptual replication does not feature in the scientific method of other disciplines. In fact, the term itself is misleading because conceptual replications don’t actually replicate previous experiments; they instead assume (rather than test) the truth of the findings revealed in those experiments, infer the underlying cause, and then seek converging evidence for that inferred cause using an entirely different experimental procedure. Viewed within the framework of the H-D scientific method, this process can be thought of as extrapolating from a body of findings to refine theory and generate new hypotheses (see figure 3.1). As important as this step is, it depends first and foremost on the reliability of the underlying evidence base. To rely solely on extrapolation at the expense of direct replication is to build a house on sand.

  FIGURE 3.1. Science usually advances in steady increments rather than leaps. According to the deductive scientific method, a hypothesis is generated from current theory, and an experiment is designed to test the hypothesis against one or more competitors. If the outcome of the experiment can be directly replicated to the community’s satisfaction, then the theory is refined, and a new hypothesis is formulated. The endpoint of this process of knowledge accumulation is the generation of scientific laws. Note that the refining of theoretical precision in no way implies that the original theoretical model remains intact or must be in any way sustained: the increase of knowledge can, and does, lead to theories being discarded altogether. Regardless, the theoretical framework always becomes more precise with the addition of new evidence.

  If direct replications are considered trivial and uninteresting, then we would expect them to be rare in the published literature. In his famous 1974 lecture “Cargo Cult Science,” Richard Feynman recounts his experience of suggesting to a psychology student that she perform a direct replication of a previous experiment before attempting a novel one:

  She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time. This was in about 1947 or so, and it seems to have been the general policy then to not try to repeat psychological experiments, but only to change the conditions and see what happened.5

  Remarkably, since 1947 there has been little analysis of how frequently psychologists replicate each other’s work even partially, let alone directly. In 2012, Matthew Makel, Jonathan Plucker, and Boyd Hegarty conducted the first systematic investigation of replication rates in psychology.6 They searched the top 100 psychology journals from 1900 to 2012 and found that only 1.57 percent of 321,411 articles mentioned a word beginning with “replicat*.” This figure may already sound low, but it turned out to be an optimistic overestimate. Within a randomly selected subsample of 500 articles from that 1.57 percent, only 342 actually reported some form of replication—and, of these, just 62 articles reported a direct replication of a previous experiment. On top of that, only 47 percent of replications within the subsample were produced by independent researchers. The implications of the Makel study are sobering: for every 1,000 papers published in psychology, only two will seek to directly replicate a previous experiment, and just one of those will come from a team independent of the original study.
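  The arithmetic behind that “two per 1,000” estimate can be reconstructed from the figures above, on the assumption that the proportions in the 500-article subsample generalize to the full set of articles mentioning replication. A minimal sketch:

    # Sketch of the Makel, Plucker, and Hegarty (2012) arithmetic, assuming the
    # subsample proportions generalize to all articles mentioning "replicat*".
    total_articles = 321_411        # top 100 psychology journals, 1900-2012
    mention_rate = 0.0157           # articles containing a word beginning "replicat"
    subsample = 500                 # randomly selected from that 1.57 percent
    direct_in_subsample = 62        # of which reported a direct replication
    independent_rate = 0.47         # replications run by independent researchers

    direct_fraction = direct_in_subsample / subsample            # 0.124
    direct_per_1000 = 1000 * mention_rate * direct_fraction      # ~1.9 per 1,000 papers
    independent_per_1000 = direct_per_1000 * independent_rate    # ~0.9 per 1,000 papers

    print(f"Direct replications per 1,000 papers: ~{direct_per_1000:.1f}")
    print(f"Of which by independent teams: ~{independent_per_1000:.1f}")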

  What happens to a scientific discipline when it abandons direct replication, as psychology has done? The immediate consequence, estimated by medical researcher John Ioannidis, is that up to 98 percent of published findings may be either unconfirmed genuine discoveries or unchallenged fallacies.7 Failure to attempt replications thus sabotages the ability of psychological science to self-correct: where physicists succeeded (quickly) at refuting and explaining the observation of faster-than-light neutrinos, psychologists would fail by not trying in the first place. This contempt for verification, in turn, has a profound effect on theory development, condemning the field to the publication of theoretical frameworks that can be neither confirmed nor falsified. In the words of psychologists Chris Ferguson and Moritz Heene, the outcome is a discipline that plays host to a “graveyard of undead theories,” each as immovable as it is impotent.8

  Why is psychology so implacably opposed to direct replication? The short answer is that we don’t know for certain—the aversion is so entrenched that its causes and effects have become obscured. The longer answer, as discussed in chapter 1, is that together with other life sciences, psychology has evolved an incentive structure that rewards empiricists who can produce novel, positive, eye-catching results that confirm the hypothesis and offer pithy interpretations. Pressure not to replicate is applied from all directions. At the supply end, funding agencies are loath to award money for merely repeating previous research—in the UK, even grant applications that describe original research (let alone replications) are often rejected for lack of novelty and innovation. Meanwhile, at the demand end, it is a struggle to publish direct replications in respected journals, whether successful or not. When such attempts succeed, they are generally seen as boring and contributing little of value (“we already knew this”; “what does this add?”), despite the fact that replications determine how much certainty we can justifiably place in prior discoveries. And when replications fail, defenders of the original work are liable to try to block publication or respond aggressively when such work is published.

  The negative attitude toward replication in psychology is epitomized by an incident that became known rather infamously as “Repligate.” In May 2014, the journal Social Psychology bucked the academic trend and reported an ambitious initiative to reproduce a series of influential psychological discoveries claimed since the 1950s.9 Many of the findings could not be replicated, and in most cases these nonreplications were met with cordial interactions between researchers. For instance, confronted with the news that one of his prior findings in social priming could not be reproduced, Dr. Eugene Caruso from the University of Chicago said, “This was certainly disappointing at a personal level. But when I take a broader perspective, it’s apparent that we can always learn something from a carefully designed and executed study.”

  Not all researchers were so gracious. Dr. Simone Schnall from the University of Cambridge argued that her work on social priming was treated unfairly and “defamed.” In a remarkable public statement, Schnall claimed that she was bullied by the researchers who sought (unsuccessfully) to replicate her findings and that the journal editors who agreed to publish the failed replications of her work behaved unethically.10 She wrote, “I feel like a criminal suspect who has no right to a defence and there is no way to win: The accusations that come with a ‘failed’ replication can do great damage to my reputation, but if I challenge the findings I come across as a ‘sore loser.’”

  Schnall’s strong reaction to the failed replication of her own work provoked a mixed reaction from the psychological community. While many psychologists were bewildered by her response, a number of prominent US psychologists voiced support for her position. Dan Gilbert from Harvard University likened Schnall’s battle to the plight of Rosa Parks,11 and he referred to some psychologists who conducted or supported replications as “bullies,” “replication police,” “second stringers,” McCarthyists, and “god’s chosen soldiers in a great jihad.”12 Others accused the so-called replicators of being “Nazis,” “fascists,” and “mafia.” Rather than viewing replication as an intrinsic part of best scientific practice, Gilbert and his supporters framed it as a threat to the reputation of the (presumably brilliant) researchers who publish irreproducible findings, stifling their creativity and innovation.13

  For some psychologists, the reputational damage in such cases is grave—so grave that they believe we should limit the freedom of researchers to pursue replications. In the wake of Repligate, Nobel laureate Daniel Kahneman called for a new rule in which replication attempts should be “prohibited” unless the researchers conducting the replication consult beforehand with the authors of the original work.14 Kahneman said, “Authors, whose work and reputation are at stake, should have the right to participate as advisers in the replication of their research.” Why? Because the method sections published by psychology journals are too vague to provide a recipe that can be repeated by others. Kahneman argued that successfully reproducing original effects could depend on seemingly irrelevant factors—hidden secrets that only the original authors would know. “For example, experimental instructions are commonly paraphrased in the methods section, although their wording and even the font in which they are printed are known to be significant.”

  For many psychologists, Kahneman’s cure is worse than the disease. Andrew Wilson from Leeds Metropolitan University immediately rejected the suggestion of new regulations, writing: “If you can’t stand the replication heat, get out of the empirical kitchen because publishing your work means you think it’s ready for prime time, and if other people can’t make it work based on your published methods then that’s your problem and not theirs.”15

  Are replications in other sciences ever regarded as an act of aggression, as Schnall and others suggest? Jon Butterworth, head of the Department of Physics and Astronomy at University College London, finds this view of replication completely alien. “Thinking someone’s result is interesting and important enough to be checked is more like flattery,” he told me. For Butterworth there is no question that the published methods of a scientific paper should be sufficient for trained specialists in the field to repeat experiments, without the need to learn unpublished secrets from the original authors. “Certainly no physicist I know would dare claim their result depended on hidden ‘craft.’”16

  Helen Czerski, broadcaster, physicist and oceanographer at University College London, offers a similar perspective. In her field, contacting the authors of a paper to find out how to replicate their results would be seen as odd. “You might have a chat with them along the way about the difficulties encountered and the issues associated with that research,” she said to me, “but you certainly wouldn’t ask them what secret method they used to come up with the results.” Czerski questions whether Kahneman’s proposed rules may breach research ethics. “My gut response is that asking that question is close to scientific misconduct, if you were asking solely for the purpose of increasing your chances of replication, rather than to learn more about the experimental issues associated with that test. The attitude in the fields I’ve worked in is definitely that you should be able to conduct your own test of a hypothesis, and that you shouldn’t need any guidance from the original authors.”17