Statcheck

{{short description|Software tool designed to detect statistical reporting errors}}

Statcheck is an R package designed to detect statistical errors in peer-reviewed psychology articles{{Cite journal |url=https://www.psychologicalscience.org/observer/bayesmed-and-statcheck |title=BayesMed and statcheck |issue=3 |last=Nuijten |first=Michèle B. |date=2017-02-28 |journal=Aps Observer |volume=30 |language=en-US |access-date=2018-10-18}} by searching papers for statistical results, redoing the calculations described in each paper, and comparing the two values to see if they match.{{Cite journal |last=Baker |first=Monya |date=2016-11-25 |title=Stat-checking software stirs up psychology |journal=Nature |language=en |volume=540 |issue=7631 |pages=151–152 |doi=10.1038/540151a |pmid=27905454 |issn=0028-0836|bibcode=2016Natur.540..151B |doi-access=free }} It takes advantage of the fact that psychological research papers tend to report their results in accordance with the guidelines published by the American Psychological Association (APA).{{Cite journal |last=Wren |first=Jonathan D. |date=2018-06-15 |title=Algorithmically outsourcing the detection of statistical errors and other problems |journal=The EMBO Journal |language=en |volume=37 |issue=12 |pages=e99651 |doi=10.15252/embj.201899651 |issn=0261-4189 |pmc=6003655 |pmid=29794111}} This leads to several disadvantages: it can only detect results reported completely and in exact accordance with the APA's guidelines,{{Cite journal |last1=Colombo |first1=Matteo |last2=Duev |first2=Georgi |last3=Nuijten |first3=Michèle B. |last4=Sprenger |first4=Jan |date=2018-04-12 |title=Statistical reporting inconsistencies in experimental philosophy |journal=PLOS ONE |volume=13 |issue=4 |pages=e0194360 |doi=10.1371/journal.pone.0194360 |issn=1932-6203 |pmc=5896892 |pmid=29649220|bibcode=2018PLoSO..1394360C |doi-access=free }} and it cannot detect statistics that are only included in tables in the paper.{{Cite journal |last1=van der Zee |first1=Tim |last2=Anaya |first2=Jordan |last3=Brown |first3=Nicholas J. L. 
|date=2017-07-10 |title=Statistical heartburn: an attempt to digest four pizza publications from the Cornell Food and Brand Lab |journal=BMC Nutrition |language=en |volume=3 |issue=1 |pages=54 |doi=10.1186/s40795-017-0167-x |pmid=32153834 |pmc=7050813 |issn=2055-0928 |doi-access=free }} Another limitation is that Statcheck cannot account for corrections applied to test statistics, such as Greenhouse–Geisser or Bonferroni corrections, which make tests more conservative.{{cite arXiv |eprint= 1610.01010|title=Sources of false positives and false negatives in the Statcheck algorithm |last=Schmidt |first=Thomas | language=en-US|class=q-bio.QM |year=2016 }} Some journals have begun piloting Statcheck as part of their peer review process. Statcheck is free software published under the GNU GPL v3.{{Cite web|url=https://github.com/MicheleNuijten/statcheck/blob/master/DESCRIPTION|title = Statcheck/DESCRIPTION at master · MicheleNuijten/Statcheck|website = GitHub}}
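The consistency check at Statcheck's core can be sketched in a few lines. The following is a minimal, hypothetical sketch in Python (Statcheck itself is an R package); the function name, regular expression, and rounding rule are illustrative assumptions rather than Statcheck's actual implementation. It parses an APA-style z-test report, recomputes the two-sided p-value from the test statistic, and compares it with the reported value at the reported precision:

```python
import math
import re

def check_z_result(apa_string):
    """Recompute the p-value of an APA-style z-test report and compare
    it with the reported value (simplified, illustrative sketch)."""
    # Match APA-style reporting, e.g. "z = 2.20, p = .028"; anything else
    # is skipped, mirroring Statcheck's restriction to exact APA format
    m = re.search(r"z\s*=\s*(-?[\d.]+),\s*p\s*=\s*(\.\d+)", apa_string)
    if m is None:
        return None
    z, p_reported = float(m.group(1)), float(m.group(2))
    # Two-sided p-value recomputed from the test statistic alone
    p_recomputed = math.erfc(abs(z) / math.sqrt(2))
    decimals = len(m.group(2)) - 1  # precision of the reported p-value
    consistent = round(p_recomputed, decimals) == p_reported
    return p_recomputed, consistent
```

A report such as t(28) = 2.20, p = .036 would additionally require the degrees of freedom and a t-distribution tail function, and results that appear only in tables, or statistics altered by a correction, never reach the pattern matcher at all, which is the mechanism behind the limitations described above.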

Validity

In 2017, Statcheck's developers published a preprint paper concluding that the program accurately identified statistical errors over 95% of the time. This validity study comprised more than 1,000 hand-checked tests, of which 5.00% turned out to be inconsistent.{{Cite journal |url= https://psyarxiv.com/tcxaj/ |title=The validity of the tool "Statcheck" in discovering statistical reporting inconsistencies |last=Nuijten |first=Michèle B. |journal=PsyArXiv | language=en-US}} The study found that Statcheck recognized 60% of all statistical tests. A reanalysis of these data found that if the program flagged a test as inconsistent, the decision was correct in 60.4% of cases. Conversely, if a test was truly inconsistent, Statcheck flagged it in an estimated 51.8% of cases (this estimate included the undetected tests and assumed that they had the same rate of inconsistencies as the detected tests). Overall, Statcheck's accuracy was 95.9%, half a percentage point higher than the chance level of 95.4% expected when all tests are simply taken at face value. Statcheck was conservatively biased (by about one standard deviation) against flagging tests.{{Cite journal |url= https://psyarxiv.com/hr6qy/ |title=Statcheck does not work: All the numbers |last=Schmidt |first=Thomas |journal=PsyArXiv | language=en-US}}
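The half-point margin over chance follows from the base rate alone: when only a small fraction of tests are inconsistent, a trivial "checker" that flags nothing is already correct most of the time, so raw accuracy is a weak benchmark. An illustrative calculation, using rates implied by the reanalysis:

```python
# Base-rate illustration: if 4.6% of the tests Statcheck reads are
# inconsistent, declaring every test consistent ("taking all results
# at face value") is already correct 95.4% of the time.
prevalence = 0.046                  # assumed inconsistency rate among read tests
baseline_accuracy = 1 - prevalence  # accuracy of never flagging anything
statcheck_accuracy = 0.959          # accuracy reported in the reanalysis
gain = statcheck_accuracy - baseline_accuracy
print(f"baseline {baseline_accuracy:.1%}, gain {gain:.1%}")
```

This is why the reanalysis emphasized flag-level measures (the 60.4% and 51.8% figures) over overall accuracy.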

More recent research applied Statcheck to a 30-year sample of papers published in Canadian psychology journals and found rates of statistical reporting errors similar to those reported by Statcheck's original authors. The same study also found many typographical errors in the online versions of relatively old papers; correcting for these reduced the estimated percentage of tests that were erroneously reported.{{Cite journal |last1=Green |first1=Christopher D. |last2=Abbas |first2=Sahir |last3=Belliveau |first3=Arlie |last4=Beribisky |first4=Nataly |last5=Davidson |first5=Ian J. |last6=DiGiovanni |first6=Julian |last7=Heidari |first7=Crystal |last8=Martin |first8=Shane M. |last9=Oosenbrug |first9=Eric |date=August 2018|title=Statcheck in Canada: What proportion of CPA journal articles contain errors in the reporting of p-values? |journal=Canadian Psychology |language=en |volume=59 |issue=3 |pages=203–210 |doi=10.1037/cap0000139 |s2cid=149813772 |issn=1878-7304|url=http://psyarxiv.com/39h47/ }}

History

Statcheck was first developed in 2015 by Michèle Nuijten of Tilburg University and Sacha Epskamp of the University of Amsterdam.{{Cite web |url=https://www.vox.com/science-and-health/2016/9/30/13077658/statcheck-psychology-replication |title=A bot crawled thousands of studies looking for simple math errors. The results are concerning. |last=Resnick |first=Brian |date=2016-09-30 |website=Vox |access-date=2018-10-18}}{{Cite web |url=https://www.science.org/content/article/controversial-software-proving-surprisingly-accurate-spotting-errors-psychology-papers |title=Controversial software is proving surprisingly accurate at spotting errors in psychology papers |last=Chawla |first=Dalmeet Singh |date=2017-11-28 |website=Science |language=en |access-date=2018-10-18}} Later that year, Nuijten and her colleagues published a paper using Statcheck on over 30,000 psychology papers and reported that "half of all published psychology papers [...] contained at least one p-value that was inconsistent with its test".{{Cite journal |last1=Nuijten |first1=Michèle B. |last2=Hartgerink |first2=Chris H. J. |last3=van Assen |first3=Marcel A. L. M. |last4=Epskamp |first4=Sacha |last5=Wicherts |first5=Jelte M. 
|date=2015-10-23 |title=The prevalence of statistical reporting errors in psychology (1985–2013) |journal=Behavior Research Methods |language=en |volume=48 |issue=4 |pages=1205–1226 |doi=10.3758/s13428-015-0664-2 |issn=1554-3528 |pmc=5101263 |pmid=26497820}} The study was subsequently written up favorably in Nature.{{Cite journal |last=Baker |first=Monya |date=2015-10-28 |title=Smart software spots statistical errors in psychology papers |url=http://www.nature.com/news/smart-software-spots-statistical-errors-in-psychology-papers-1.18657 |journal=Nature |language=en |doi=10.1038/nature.2015.18657 |s2cid=187878096 |issn=1476-4687 |access-date=2018-10-19}} In 2016, Nuijten and Epskamp both received the Leamer-Rosenthal Prize for Open Social Science from the Berkeley Initiative for Transparency in the Social Sciences for creating Statcheck.{{Cite web |url=https://www.bitss.org/people/michele-nuijten/ |title=Michèle Nuijten |date=2016-12-16 |website=Berkeley Initiative for Transparency in the Social Sciences |language=en-US |access-date=2018-10-19}}

In 2016, Tilburg University researcher Chris Hartgerink used Statcheck to scan over 50,000 psychology papers and posted the results to PubPeer; they subsequently published the data they extracted from these papers in an article in the journal Data.{{Cite journal |last=Hartgerink |first=Chris |date=2016-09-23 |title=688,112 Statistical Results: Content Mining Psychology Articles for Statistical Test Results |journal=Data |language=en |volume=1 |issue=3 |pages=14 |doi=10.3390/data1030014|doi-access=free }} Hartgerink told Motherboard that "We're checking how reliable is the actual science being presented by science".{{Cite web |url=https://www.vice.com/en/article/scientists-are-worried-about-peer-review-by-algorithm-statcheck/ |title=Scientists Are Worried About 'Peer Review by Algorithm' |last=Buranyi |first=Stephen |date=2016-09-05 |website=Motherboard |language=en-us |access-date=2018-10-18}} They also told Vox that they intended Statcheck to perform a function similar to that of a spell checker. The PubPeer posting also triggered email alerts to every researcher who had authored or co-authored a flagged paper. These flags, and their posting on a public forum, proved controversial, prompting the German Psychological Society to issue a statement condemning this use of Statcheck.{{Cite news |url=https://www.theguardian.com/science/2017/feb/01/high-tech-war-on-science |title=The high-tech war on science fraud |last=Buranyi |first=Stephen |date=2017-02-01 |newspaper=The Guardian |language=en-GB |access-date=2018-10-18}} Psychologist Dorothy V. M. 
Bishop, who had two of her own papers flagged by Statcheck, criticized the program for publicly flagging papers, including one of her own, in which it had found no statistical errors.{{Cite web |url=https://retractionwatch.com/2016/09/02/heres-why-more-than-50000-psychology-studies-are-about-to-have-pubpeer-entries/ |title=Here's why more than 50,000 psychology studies are about to have PubPeer entries |date=2016-09-02 |website=Retraction Watch |language=en-US |access-date=2018-10-18}} Other critics alleged that Statcheck had reported errors in papers that did not actually contain them because the tool had failed to read the statistics in those papers correctly.{{Cite journal |last=Stokstad |first=Erik |date=2018-09-21 |title=The truth squad |journal=Science |language=en |volume=361 |issue=6408 |pages=1189–1191 |doi=10.1126/science.361.6408.1189 |issn=0036-8075 |pmid=30237339|bibcode=2018Sci...361.1189S |s2cid=52309610 }}

Journals that have begun piloting the use of Statcheck as part of their peer review process include Psychological Science,{{Cite journal |last1=Freedman |first1=Leonard P. |last2=Venugopalan |first2=Gautham |last3=Wisman |first3=Rosann |date=2017-05-02 |title=Reproducibility2020: Progress and priorities |journal=F1000Research |volume=6 |pages=604 |doi=10.12688/f1000research.11334.1 |issn=2046-1402 |pmc=5461896 |pmid=28620458 |doi-access=free }} the Canadian Journal of Human Sexuality,{{Cite journal |last1=Sakaluk |first1=John K. |last2=Graham |first2=Cynthia A. |date=2017-11-17 |title=Promoting Transparent Reporting of Conflicts of Interests and Statistical Analyses at The Journal of Sex Research |journal=The Journal of Sex Research |language=en |volume=55 |issue=1 |pages=1–6 |doi=10.1080/00224499.2017.1395387 |pmid=29148841 |issn=0022-4499|doi-access=free }} and the Journal of Experimental Social Psychology.{{Cite book |url=https://www.journals.elsevier.com/journal-of-experimental-social-psychology/news/jesp-piloting-the-use-of-statcheck |title=JESP piloting the use of statcheck |website=Journal of Experimental Social Psychology |access-date=2018-10-19}} The open access publisher PsychOpen has also used it on all papers accepted for publication in their journals since 2017.{{Cite web |url=https://www.psychopen.eu/news/article/psychopen-uses-statcheck-tool-for-quality-check/ |title=PsychOpen uses Statcheck tool for quality check |date=2017-04-10 |website=PsychOpen |language=en |access-date=2018-10-23}}

References

{{Reflist}}