P(D) = P(D|H) × P(H) + P(D|~H) × P(~H)
In this equation, P(D|H) and P(H) are defined as above, P(D|~H) is the probability of a positive result if the patient doesn’t have Alzheimer’s disease (i.e., the false positive rate of the test), and P(~H) is the overall probability that the patient doesn’t have Alzheimer’s disease (calculated as 1 – P(H) = 0.99). Let’s assume that the test in this case has a false positive rate of 5 percent, so that P(D|~H) = 0.05. The overall probability of a positive test result, P(D), can then be calculated as:
P(D) = 0.80 × 0.01 + 0.05 × 0.99 = 0.0575
We can now substitute this value of P(D) into Bayes’ theorem to estimate the probability that the patient truly has Alzheimer’s disease:

P(H|D) = P(D|H) × P(H) / P(D) = (0.80 × 0.01) / 0.0575 ≈ 0.139
Are you surprised by this outcome? Despite the fact that the test has 80 percent sensitivity and just a 5 percent false positive rate, Bayes’ theorem tells us that there is still only a 13.9 percent chance that a patient with a positive test result has Alzheimer’s disease.51 This outcome violates our common intuitions; indeed, when posed with similar scenarios, even medical practitioners routinely overestimate the accuracy of such diagnostic tests.52
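For readers who like to verify the arithmetic, the same calculation can be sketched in a few lines of Python (an illustrative sketch of my own; the function name and structure are not drawn from any particular statistics package):

```python
# A minimal sketch of the screening example above (my own code, not from the text).
# sensitivity = P(D|H), false_positive_rate = P(D|~H), prevalence = P(H).
def posterior_probability(sensitivity, false_positive_rate, prevalence):
    """Return P(H|D): the probability of the disease given a positive test result."""
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)  # P(D)
    return sensitivity * prevalence / p_positive  # Bayes' theorem

print(posterior_probability(0.80, 0.05, 0.01))  # ~0.139, i.e., a 13.9 percent chance
```

Changing the prevalence argument shows how strongly the base rate drives the result: at a prevalence of 10 percent, the same test yields a posterior probability of roughly 64 percent.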
Bayes’ theorem is a valuable tool in medical screening because it counters our erroneous intuitions about probabilities, but calculating P(H|D) on its own isn’t particularly useful in psychological science. In most experimental settings we need to know more than the absolute probability of a single hypothesis; rather, we need to know how likely the experimental hypothesis (H1) is, given the data, relative to a comparator (baseline) hypothesis. This baseline can be anything we choose, but in general it takes the form of the null hypothesis (H0) that the effect of interest is nonexistent in the population being studied. By calculating the ratio of P(D|H) for our two hypotheses, H1 and H0, Bayes’ theorem allows us to decide which hypothesis is better supported by the evidence. The resulting value is known as a Bayes factor, B, and tells us the weight of evidence in favor of H1 over H0.
To calculate B we need to know the relative posterior probabilities of each hypothesis, P(H|D), and their relative prior probabilities, P(H). To calculate the relative posterior probability of H1 vs. H0, we simply divide one Bayes’ theorem by the other:

P(H1|D) / P(H0|D) = [P(D|H1) × P(H1) / P(D)] / [P(D|H0) × P(H0) / P(D)]

This rather cumbersome equation can be restructured as:

P(H1|D) / P(H0|D) = [P(D|H1) / P(D|H0)] × [P(H1) / P(H0)] × [P(D) / P(D)]

Note that the values of P(D) here cancel out to 1, leaving us with the simpler form:

P(H1|D) / P(H0|D) = [P(D|H1) / P(D|H0)] × [P(H1) / P(H0)]

This equation, in turn, can be restructured to calculate the Bayes factor, B:

B = P(D|H1) / P(D|H0) = [P(H1|D) / P(H0|D)] / [P(H1) / P(H0)]
In other words, the Bayes factor (B) can be thought of as the ratio of the posterior probabilities, P(H|D), divided by the ratio of the prior probabilities, P(H). This provides a simple estimate of how likely one hypothesis is, given the evidence, relative to the other, and therefore how much a rational observer should update his or her prior beliefs based on that data. A value of B = 1 is perfectly ambiguous and should lead to no change in prior beliefs, while B > 1 indicates evidence in favor of H1 and B < 1 indicates evidence in favor of H0. By convention, B > 3 is judged to be “substantial or moderate evidence,” B > 10 is “strong evidence,” B > 30 is “very strong evidence,” and B > 100 is “decisive evidence.” These ratios can be reversed to provide the same standards of evidence in favor of H0 over H1, with B < 1/3, B < 1/10, B < 1/30, and B < 1/100 marking the corresponding thresholds.
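To make the definition concrete, here is a minimal sketch in Python (my own illustration; the function names and the example numbers are hypothetical). It computes B as the ratio of posterior odds to prior odds and attaches the conventional verbal labels described above:

```python
# Illustrative sketch: the Bayes factor B as posterior odds divided by prior odds,
# with the conventional verbal labels described in the text.
def bayes_factor(posterior_h1, posterior_h0, prior_h1, prior_h0):
    posterior_odds = posterior_h1 / posterior_h0
    prior_odds = prior_h1 / prior_h0
    return posterior_odds / prior_odds

def evidence_label(b):
    """Labels for evidence favoring H1; reverse the ratio (1/B) to grade evidence for H0."""
    if b > 100:
        return "decisive evidence"
    if b > 30:
        return "very strong evidence"
    if b > 10:
        return "strong evidence"
    if b > 3:
        return "substantial or moderate evidence"
    if b > 1:
        return "weak evidence for H1"
    return "no evidence for H1 over H0"

# Equal prior odds shifted to 4:1 posterior odds by the data gives B = 4.
b = bayes_factor(posterior_h1=0.8, posterior_h0=0.2, prior_h1=0.5, prior_h0=0.5)
print(b, evidence_label(b))  # 4.0 substantial or moderate evidence
```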
The rationale of Bayesian hypothesis testing is intuitive, but how exactly does it stand to improve the reliability of psychological science? Recall from chapter 2 that a key source of p-hacking in NHST lies in the violation of stopping rules: if we simply add participants to a study until p<.05, then we increase the odds that the p value will dip below .05 by chance and thus lead to a type I error. Bayesian analysis, however, is protected against this possibility: according to the likelihood principle, the more data we consider, the more accurate our estimation of reality becomes. Researchers can therefore continuously add participants to their experiments until the evidence clearly supports H1 or H0. This approach not only eliminates the violation of stopping rules as a source of misleading information, it also allows the most efficient allocation of limited scientific resources.
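The following sketch (again my own illustration, not an analysis from the book) simulates this kind of optional stopping: a simple binomial Bayes factor is recomputed as each new participant is added, and data collection halts only once the evidence strongly favors H1 or H0. The true effect size, the uniform prior under H1, and the stopping threshold of 10 (and its reciprocal) are all assumptions chosen for illustration.

```python
# My own illustration of Bayesian optional stopping: keep adding observations
# until the Bayes factor clearly favors H1 or H0. H0 fixes the success rate at 0.5;
# H1 places a uniform prior on the rate. Thresholds of 10 and 1/10 are illustrative.
import random
from math import exp, lgamma, log

def bf10_binomial(successes, trials):
    """Bayes factor for H1 (uniform prior on the rate) over H0 (rate = 0.5)."""
    # Marginal likelihood under H1 (binomial coefficient omitted; it cancels in the ratio).
    log_m1 = lgamma(successes + 1) + lgamma(trials - successes + 1) - lgamma(trials + 2)
    log_m0 = trials * log(0.5)  # likelihood under H0 (same coefficient omitted)
    return exp(log_m1 - log_m0)

random.seed(1)
true_rate = 0.7  # hypothetical population effect, unknown to the "researcher"
successes = trials = 0
while True:
    trials += 1
    successes += random.random() < true_rate  # one new participant's response
    bf = bf10_binomial(successes, trials)
    if bf > 10 or bf < 1 / 10:  # strong evidence either way: stop collecting data
        break

print(f"Stopped after {trials} observations: BF10 = {bf:.1f}")
```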
A second major advantage of Bayesian hypothesis testing is that, in contrast to NHST, it allows direct estimation of the probability of H0 relative to H1. Recall that a nonsignificant p value allows the researcher only to fail to reject H0, and never to accept it or estimate its chances of being true. The biased nature of this inference is one major cause of publication bias because any negative finding (p>.05) using NHST is by definition inconclusive, which in turn allows authors and journals to feel justified in reporting only positive results. Bayesian inference, however, affords no special status to H0 and permits inferences about the relative probability of any prespecified hypothesis given the evidence: where NHST is limited to concluding an absence of evidence, a Bayes factor provides evidence of absence. Third, Bayesian tests require authors to transparently specify and justify their prior hypothesis, allowing the scientific community to see just how well grounded it is, and challenge it as necessary. Finally, Bayes factors can be combined and updated in light of new evidence.53 Bayesian testing thus encourages scientists to engage in programmatic research that is geared toward replications and accumulated truths rather than the “snapshot science” of NHST where a single study showing p<.05 can be impossible to refute.
It must be emphasized that Bayesian hypothesis testing isn’t a cure-all for reliability. A Bayesian analysis can still be “B-hacked” by manipulating other researcher degrees of freedom such as the selective rejection of outliers or selection of dependent variables. No system of statistical inference is immune to cherry-picking, bias, or fraud. But combined with other initiatives, including study preregistration, Bayesian analysis is a major part of the solution.54
Adversarial collaborations. Too often in psychology, as in other sciences, ego overcomes reason, reducing the reliability and credibility of the results we produce. In the competitive world of reputation management, it is often more important to be seen winning an argument than to discover the truth. One proposed solution to this problem is the widespread implementation of so-called adversarial collaborations. An adversarial collaboration is one in which two or more researchers (or teams of researchers) with opposing beliefs jointly formulate a research design that will determine a “winner.” A typical formulation involves a team of researchers (proponents) who believe a certain hypothesis to be true, and a counter team (skeptics) who believe that no such effect exists. Having reached a consensus on the optimal design, both sides then implement the experiment and the results are combined at the end to adjudicate between the competing hypotheses. The advantage of this approach is that by agreeing to the rules of engagement in advance (and thus preregistering their hypotheses and analysis plans), both sides are forced to accept the outcome: there is no scope to claim, after the fact, that an undesirable result was the outcome of a suboptimal or flawed methodology used by the opponent.55
Improving reporting standards. One obvious way to improve the reliability of psychological science is to ensure that published methods and analyses are more easily reproducible by independent researchers. Recently a number of journals have moved toward more stringent reporting standards. In 2013, the journal Nature Neuroscience introduced a new methods checklist to increase research transparency, while at the same time eliminating word limits on method sections.56 In 2014, Psychological Science introduced a requirement for authors to complete a simple disclosure statement upon the submission of manuscripts:
1) Confirm that (a) the total number of excluded observations and (b) the reasons for making these exclusions have been reported in the Method section(s). [ ] If no observations were excluded, check here [ ].
2) Confirm that all independent variables or manipulations, whether successful or failed, have been reported in the Method section(s). [ ] If there were no independent variables or manipulations, as in the case of correlational research, check here [ ].
3) Confirm that all dependent variables or measures that were analyzed for this article’s target research question have been reported in the Methods section(s). [ ]
4) Confirm that (a) how sample size was determined and (b) your data-collection stopping rule have been reported in the Method section(s) [ ] and provide the page number(s) on which this information appears in your manuscript.57
Meanwhile, the Center for Open Science, led by psychologist Brian Nosek from the University of Virginia, has championed an initiative to offer badges for adherence to transparent research practices. These include the sharing of primary research data, the sharing of experimental materials, and study preregistration.58 Whether these new standards achieve broad uptake and adherence remains to be seen, but they are positive and vital moves that emphasize the pressing need to improve the reliability and credibility of psychology.
Nudging career incentives. How can we motivate researchers to care more about reproducibility? One approach would be to develop a quantitative metric of reproducibility that counts in career advancement—a system where your score is improved by how often your work is independently replicated. In 2013, Adam Marcus and Ivan Oransky proposed such an index for journals.59 We will return to this possibility in chapter 8 and consider a concrete mechanism for how it could work in psychological science.
CHAPTER 4
The Sin of Data Hoarding
Code and data or it didn’t happen.
—Anon.
If passing aliens were to glance at terrestrial science they might conclude that data sharing is a standard global practice. Scanning the policy of the National Institutes of Health, one of the major US funding agencies, they would see that it “requires resource producers to release primary data along with an initial interpretation … to the appropriate public databases as soon as the data is verified. All data will be deposited to public databases … and these pre-publication data will be available for all to use.”1 Turning to the journal Science, one of the most prestigious peer-reviewed outlets on the planet, they would see that data sets “must be deposited in an approved database, and an accession number or a specific access address must be included in the published paper.”2 They would nod approvingly and move on.
But our visitors should have dug deeper. Had they studied the policies of the NIH and Science magazine more closely, they would have discovered that such rules apply only to specific kinds of scientific information, such as DNA and protein sequences, molecular structures, and climate data. Psychology isn’t subject to such requirements; on the contrary it is standard practice for psychology researchers to withhold data unless motivated to share out of self-interest or (rarely) when required to share by a higher authority. Unlike fields such as genomics or climatology, many psychologists effectively claim personal ownership over data—data that in the majority of cases is provided by volunteers through the use of public funding and resources.
Many psychologists consider sharing data only where doing so brings professional gains, such as working partnerships that lead to joint authorship of papers. As MIT neuroscientist Earl Miller has said: “I’m for sharing raw data. But as collaborations. Data is not a public water fountain.”3 Many psychologists and neuroscientists share Miller’s views. They fear that unfettered access to data would surrender intellectual property to rivals, allow undeserving competitors to benefit from their hard work, and invite unwelcome attention from critics. The prevalence of questionable research practices further reduces the incentive to share. After all, in the world of major league science, who would want their mistakes or dubious decision making exposed to the scrutiny of a rival researcher or (worse) a professional statistician? By treating data as personal property and adopting a defensive agenda, psychology is guilty of our fourth major transgression: the sin of data hoarding.
The Untold Benefits of Data Sharing
Why do some areas of science enjoy a culture of transparency while others, like psychology, are so closed? Are psychologists more selfish than other scientists? Are they more sensitive to competition or criticism? Could it simply be that they are less aware of the gains that data sharing can bring both to wider science and individual scientists?
It is no exaggeration to say that data sharing carries enormous benefits for science. The most immediate plus is that it allows independent scientists to repeat the analyses reported by the authors, verifying that they were performed and reported appropriately. An unbiased perspective can be tremendously helpful in correcting honest mistakes and ensuring the robustness of the scientific record. Sharing study materials (e.g., computer code and stimuli) as well as data can help other scientists repeat experiments, thus aiding in the vital task of direct replication.
Data sharing also makes it easier to detect questionable research practices—were I to conduct 100 statistical tests on a data set and report only the one analysis that was statistically significant (an egregious form of p-hacking) then this would become obvious to anyone who tries the 99 analyses on the same data that “didn’t work.” Beyond the gray area of questionable practices, data sharing also enables the detection of fraud. In 2013, psychologist Uri Simonsohn published a simple and elegant statistical method for detecting fraudulent manipulation of data.4 As we will see, Simonsohn used this tool on the raw data associated with several published papers and in so doing unmasked multiple cases of data fabrication. His efforts in turn led to university investigations and resignations by established academics.
In the longer term, sharing data and materials at the point of publication prevents the permanent loss of information—over time, computers crash; scientists change institutions, change contact details, and eventually die; and unshared data inevitably dies with them.5 This anecdote by psychologists Jelte Wicherts and Marjan Bakker from the University of Amsterdam paints an all-too-familiar picture in psychology: “One of us once requested data from a close colleague, who responded: ‘Sure I will send you those data, but it’s like seven computers ago, and so please allow me some time to hunt them down.’”6
Finally, publishing data allows independent scientists, now and in the future, to perform analyses that the original authors never imagined or had any interest in pursuing. Complex data sets can hold hidden gems, and aggregated data across many studies can allow researchers to perform vital meta-analysis.7
To all these benefits a critic might respond, “Sure, anyone can see how data sharing benefits science, but what about me as an individual researcher? What do I stand to gain?” Looking beyond the fact that all publicly funded scientists have an ethical duty to be as open and transparent as possible, one of the most selfish advantages is in citation rates. An analysis of over 10,000 gene expression microarray studies found that those that shared data attracted up to 30 percent more citations than those that did not, after controlling for a range of extraneous factors.8 Within the field of cancer research this relationship is even stronger, with open data associated with a 69 percent rise in citations.9 The potential career benefits of data sharing are therefore substantial.
Failure to Share
With so many good reasons to embrace transparency, how many psychologists actually lift the lid on their own data? Public data deposition in psychology is unquestionably rare—very few researchers make their data available to anyone beyond their immediate research team. Nevertheless, all major psychology journals require authors to retain data for several years and share it with readers on request. This raises the question of how many psychologists meet these requirements and share data when asked.
In 2006, Jelte Wicherts and colleagues decided to put this policy to the test by requesting data from 249 studies that appeared in four major journals published by the American Psychological Association (APA). Their experience was unsettling:
Unfortunately, 6 months later, after writing more than 400 e-mails—and sending some corresponding authors detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes—we ended up with a meager 38 positive reactions and the actual data sets from 64 studies (25.7% of the total number of 249 data sets). This means that 73% of the authors did not share their data.10