November 19, 2018

Too Harsh An Assessment

Ed Yong’s take on the ManyLabs2 study. We think it’s too harsh an assessment. Certainly, this massive replication attempt provides a roadmap for identifying robust effects. Even in psychology. Another good write-up can be found here.

November 14, 2018

Good Advice on How to Conduct a Replication Study

Annette L. Brown provides useful and sensible advice about what not to do when you try to replicate someone else’s study.

For example: Don’t present, post or publish replication results without first sharing them with the original authors. Word.

November 8, 2018

Replicability Ranking of Eminent Social Psychologists

Uli Schimmack, of Replicability Index fame (and also a featured speaker at the BIBaP – BizLab 2018 workshop on questionable research practices), has computed a replicability ranking of eminent social psychologists. And why not? Given the prevalence of questionable research practices in social psychology in particular, some such ranking seems more useful than citation indices. Some surprises (positive and negative) there but see for yourself.

November 8, 2018

Pre-Registration is Not For the Birds

In Mother Jones, Kevin Drum identifies what he calls — sensationalistically — the Chart of the Decade.

It is actually a study that’s three years old. The authors collected every significant clinical study of drugs and dietary supplements for the treatment or prevention of cardiovascular disease between 1974 and 2012.

Prior to 2000, questionable research practices were fair game. 13 out of 22 studies (59 percent) showed significant benefits. Once pre-registration was required, only 2 out of 21 did. Surprise! Not!
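To put the two proportions quoted above side by side, here is a minimal sketch in Python. It uses only the counts given in this post (13 of 22 before, 2 of 21 after) and assumes, purely for illustration, that the two periods can be treated as independent samples in a 2x2 table; the function name is ours, not anything from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "benefit" studies in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Counts as quoted in the post (illustrative use only)
rate_before = 13 / 22   # share of studies showing a benefit before 2000
rate_after = 2 / 21     # share once pre-registration was required
p_value = fisher_exact_two_sided(13, 9, 2, 19)

print(f"before: {rate_before:.0%}, after: {rate_after:.0%}, p = {p_value:.4f}")
```

The exact test is used here rather than a chi-square approximation only because the counts are small; it says nothing about why the share dropped.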

October 9, 2018

James Heathers On The Wansink Story – How It Began And How It Has Ended, For Now

After the JAMA family of journals retracted six of Wansink’s articles in one fell swoop, and after Cornell announced his resignation, there was a flurry of comments and reflections. One of the most interesting ones was this. James Heathers was one of the four musketeers to investigate selected pieces of Wansink’s oeuvre. Here James describes how he got involved initially. Chock-full of interesting links. A very readable background story about the motives of the people who engineered Wansink’s downfall.

September 12, 2018

What Authorship Means

Ioannidis, himself a very prolific author, and his co-authors identify 9,000 authors who published more than 72 papers (the equivalent of one paper every 5 days) in any one calendar year between 2000 and 2016, a figure that many would consider implausibly prolific. Lots of interesting details about where these authors live and what disciplines they work in. A fabulous illustration.

August 27, 2018

Another Day, More Replication Failures

An international team of researchers, mostly from economics and psychology and known as the Social Sciences Replication Project, published the results of their attempt to replicate 21 studies published in Science and Nature between 2010 and 2015. The results were in our view somewhat sobering: The researchers succeeded in only 13 of 21 cases and, where they succeeded, the effect size was about half that of the original study. Arguably the most interesting finding was that the prediction market the researchers conducted in parallel predicted amazingly well which studies would fail and which would succeed. Good write-ups on this latter finding can be found here (Ed Yong in The Atlantic) and here (Gidi Nave in The Neuroeconomist).

You might want to play this rather interesting guessing game yourself.

June 14, 2018

Retracted and Replaced for No Good Reason?

The Washington Post reported on a major study on the Mediterranean diet having been retracted and replaced. Was there a good reason for it?

The lead author on the study told The Washington Post that the causal link is just as strong as in the original report. Which poses the interesting question: If the original study was so problematic that the authors chose to withdraw it entirely, can the new one be trusted?

June 7, 2018

Deconstruction and Re-evaluation of (In)famous Experiments

Both Zimbardo’s (in)famous Stanford Prison Experiment and the equally (in)famous Milgram Experiment have recently been deconstructed and re-evaluated. A good write-up about the developments pertaining to the former can be found here; a good write-up about the developments pertaining to the latter can be found here.

May 1, 2018

Thinking Clearly About Correlations and Causations

Julia Rohrer, who was one of our featured speakers at last year’s BIBaP – BizLab workshop, has two excellent recent papers worth a read. The first is on correlation and causation and the second illustrates the need for specification curves through a study of birth-order effects. Recommended.

April 17, 2018

An Upbeat Mood May Boost Your Paper’s Publicity

Well, maybe not.

April 7, 2018

Four Misconceptions About Statistical Power

Explained by Zad here

Misconception 1: Statistical power can only be increased with larger sample sizes

Misconception 2: You’ve reached enough statistical power, or your study is underpowered.

Misconception 3: The problem with low power is that you’ll miss true effects.

Misconception 4: Effect sizes and standard deviations from pilot studies should be used to calculate sample sizes for larger studies.
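As a rough illustration of the first misconception, the sketch below computes approximate power for a two-sided, two-sample comparison of means using a normal approximation. The function name, effect size, standard deviations, and group sizes are all made-up values for illustration, not taken from Zad's post; the point is only that reducing measurement noise raises power at a fixed sample size, much as adding participants does.

```python
from statistics import NormalDist


def power_two_sample(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Normal approximation; assumes equal group sizes and a common,
    known standard deviation (both simplifying assumptions).
    """
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5      # standard error of the mean difference
    z_crit = z.inv_cdf(1 - alpha / 2)       # two-sided critical value
    shift = effect / se                     # true effect in standard-error units
    # Probability that the test statistic lands in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical scenarios: same true effect, different designs
baseline = power_two_sample(effect=0.5, sd=1.0, n_per_group=30)
more_people = power_two_sample(effect=0.5, sd=1.0, n_per_group=60)
less_noise = power_two_sample(effect=0.5, sd=0.7, n_per_group=30)

print(f"baseline:    {baseline:.2f}")
print(f"more people: {more_people:.2f}")
print(f"less noise:  {less_noise:.2f}")
```

Because power comes out as a continuous number here, the sketch also hints at why a hard "powered" versus "underpowered" split (the second misconception) is a simplification.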

A related good read: Statistical tests, P values, confidence intervals, and power: a guide to misinterpretation

March 30, 2018

Keeping Science Honest, One Whistle Blown at a Time

Wherein whistleblowers Josefin Sundin and Fredrik Jutfelt explain how their blowing the whistle on the authors of a widely publicised, sensationalist (Fish prefer microplastics to live prey!) but fraudulent research article published in Science in June 2016 ultimately led to their vindication and the retraction of the paper, at huge personal and professional cost to them.

They conclude: “Ideally, whistle-blowing should not be necessary. The scientific community must enforce a culture of honesty. Sometimes that takes courage.”

True fact. But it should not have to.

March 28, 2018

How to Publish Statistically Insignificant Results in Economics

MIT economist Alberto Abadie makes the case that statistically insignificant results are at least as interesting as significant ones. One of Abadie’s key points (in a deeply reductive nutshell) is that results are interesting if they change what we believe (or “update our priors”). With most public policy interventions, there is no reason that the expected impact would be zero. So there is no reason that the only finding that should change our beliefs is a non-zero finding. This is a very readable write-up about the Abadie paper (less readable) by the Development Impact bloggers at the World Bank.

March 20, 2018

How (and Whether) to Teach Undergraduates About the Replication Crisis in Psychological Science

“We developed and validated a 1-hr lecture communicating issues surrounding the replication crisis and current recommendations to increase reproducibility. Pre- and post-lecture surveys suggest that the lecture serves as an excellent pedagogical tool. Following the lecture, students trusted psychological studies slightly less but saw greater similarities between psychology and natural science fields. We discuss challenges for instructors taking the initiative to communicate these issues to undergraduates in an evenhanded way.” (from the abstract).

March 19, 2018

Big Booze, like Big Banks and Other Big Business, is Relentless in Its Pursuit of Profit

HealthNewsReview.Org (now also featured on our useful links page) summarizes a report by the New York Times that suggests that lead researchers on a $100 million NIH study of the effects of moderate alcohol consumption had extensive discussions with the alcohol industry prior to securing the sponsorship and related reports. Apparently the NIH’s standards are slipping. The alcohol industry has also tried to buy favourable reporting by journalists. Which makes sense: “After all, if you’re going to invest $100 million in a study, wouldn’t it make sense to cultivate journalists to help put a nice shine on the results?”

March 9, 2018

How to Make Replication The Norm

The enquiring minds of Paul Gertler, Sebastian Galiani, and Mauricio Romero wanted to know. Focusing on economics, political science, sociology and psychology, in which ready access to raw data and software code is crucial to replication efforts, they survey deficiencies in the current system and propose reforms that can both encourage and reinforce better behaviour: a system in which authors feel that replication of software code is both probable and fair, and in which less time and effort is required for replication. Food for thought.

Here you can find the World Development Report 2015: Mind, Society, and Behavior.

And here is a critical review of it by one of our own.

March 8, 2018

Is There or Is There Not (A Reproducibility Crisis)?

Somewhat surprisingly (given his own contributions over the last ten years or so), Daniele Fanelli suggests in a recent piece in PNAS that the “narrative of crisis” is mistaken, and that “a narrative of epochal changes and empowerment of scientists would be more accurate, inspiring, and compelling.”

We have our doubts. So do many others, as the relevant discussion on the Facebook Methodological Discussion group shows.

February 24, 2018

The New Lancet Study about Antidepressants is Not Exactly News

A new Lancet study about antidepressants got lots of play in the press. One of our favorite curmudgeons, Neuroskeptic, points out that it tells us very little that we didn’t already know, and it has a number of limitations: “The media reaction to the paper is frankly bananas.”

February 20, 2018

Brembs on Prestigious Science Journals: Do They Live up to Their Reputations?

Brembs, well-known to the readers of this site through his blog, has just published a review of the evidence that speaks to the issue of the reliability (trustworthiness) of prestigious science journals. He comes to the somewhat distressing conclusion that these journals can’t be trusted any more than lower-ranked journals. In fact, the evidence seems to suggest that science published in lower-ranked journals is more reliable. You read that right.

February 19, 2018

Introducing Advances in Methods and Practices in Psychological Science

Daniel Simons introduces the new journal Advances in Methods and Practices in Psychological Science (AMPPS), which is designed to foster discussions of, and advances in, practices, research design, and statistical methods.

February 18, 2018

Gelman on the Replication Crisis and Underpowered Studies: What Have We Learned Since 2004 (or 1984, or 1964)?

The man behind the Statistical Modeling, Causal Inference, and Social Science website, Andrew Gelman, takes a look back to consider what we have learned from critiques way back when that sound very modern. Apparently a little bit. 

February 10, 2018

RCTs are Not Always the Gold Standard: How We Figured Out that Smoking Causes Cancer

A very good discussion of the limits of RCTs and what it takes to establish causality here.

(We have added the source of this article to our Useful Links page because it has lots to offer.)

February 1, 2018

Replication is Not Enough: The Case for “Triangulation”

Munafò and Davey Smith on why replication is not enough and why a problem has to be attacked in several ways.

January 31, 2018

More on the Cornell Food and Brand Lab Situation

Wansink’s former students are not amused. And who can blame them?

Meanwhile… Cornell is still belaboring its internal investigation “in compliance with our internal policies and any external regulations that may apply”. Some scholars are not impressed.

January 18, 2018

Replication Studies: A Report from the Royal Netherlands Academy of Arts and Sciences

The Dutch Academy of Arts and Sciences just published the report of a committee that was charged to identify, and address, the problems that seem to beset in particular Dutch psychologists (cue Stapel and others). Eric-Jan Wagenmakers was a member of that committee and provides a useful summary of the results of its findings.

January 15, 2018

The Ultimate Reading List for a Graduate Course on Reproducibility and Replicability, For Now

Brent Roberts and Dan Simons have compiled, from the hard labors of many other people, an up-to-date and fairly complete reading list for their current 2018 graduate course on Reproducibility and Replicability, focusing on readings that 1) identify the reasons for the current crisis, and 2) provide ways to fix the problems.

Subsections include:

Definitions of reproducibility & replicability

Houston, we have a problem (and by “we” we mean everyone, not just psychologists or social psychologists)

The problems that plague us have been plaguing us for a very long time

The problems that plague us: low power

The problems that plague us: selective publication; bias against the null

The problems that plague us: procedural overfitting

The problems that plague us: quality control  

NHST, P-values, and the like

Preregistration

Power and power analysis

On Replication

Open Science

Complexities in data availability

Informational value of existing research

Solutions

January 5, 2018

Progress Assessed

The editor of Science, Jeremy Berg, assesses the progress on reproducibility and is hopeful:

“Over the past year, we have retracted three papers previously published in Science. The circumstances of these retractions highlight some of the challenges connected to reproducibility policies. In one case, the authors failed to comply with an agreement to post the data underlying their study. Subsequent investigations concluded that one of the authors did not conduct the experiments as described and fabricated data. Here, the lack of compliance with the data-posting policy was associated with a much deeper issue and highlights one of the benefits of policies regarding data transparency. In a second case, some of the authors of a paper requested retraction after they could not reproduce the previously published results. Because all authors of the original paper did not agree with this conclusion, they decided to attempt additional experiments to try to resolve the issues. These reproducibility experiments did not conclusively confirm the original results, and the editors agreed that the paper should be retracted. This case again reveals some of the subtlety associated with reproducibility. In the final case, the authors retracted a paper over extensive and incompletely described variations in image processing. This emphasizes the importance of accurately presented primary data.”

January 5, 2018

Should the Bem Feeling-the-Future Article (JPSP 2011) be Retracted?

Uli Schimmack has done some superb data sleuthing and, based on it, makes a fairly persuasive case for this article (which in a sense involuntarily started the replicability revolution in psychological science) to be retracted.

A very good read that will teach you, by example, more about questionable research practices than pretty much anything else. That someone like Bem would make the comments attributed to him is hard to believe.

December 18, 2017

The Next Stapel?

Nick Brown, one of the researchers who was behind the questions raised about the Cornell Food and Brand Lab (see here and here), has now raised questions about another matter. After having tried for two years to get answers from Nicolas Guéguen, of the Université Bretagne-Sud in France, on numerous papers of that researcher that had obvious problems, he posted on his blog a summary of his findings and questions. Good questions all, as far as we can see.

One also wonders how any respectable journal could publish some of these papers:

 “As well as the articles we have blogged about, he has published research on such vital topics as whether women with larger breasts get more invitations to dance in nightclubs (they do), whether women are more likely to give their phone number to a man if asked while walking near a flower shop (they are), and whether a male bus driver is more likely to let a woman (but not a man) ride the bus for free if she touches him (we’ll let you guess the answer to this one).”

Why the next Stapel? See here.

December 1, 2017

The Health Benefits of Volunteering May Not Be What They Have Been Made Out to Be

Leif Nelson, one of the three musketeers behind Data Colada, has a good piece on three alleged scientific findings about the benefits that come with volunteering, other than the warm glow. He concludes:

“I see three findings, all of which are intriguing to consider, but none of which are particularly persuasive. The journalist, who presumably has been unable to read all of the original sources, is reduced to reporting their claims. The readers, who are even more removed, take the journalist’s claims at face value: ‘if I volunteer then I will walk around better, lower my blood pressure, and live longer. Sweet.'”

November 30, 2017

Failed

Uli Schimmack has produced an impressive quantitative review of the “evidence” in Bargh’s book Before you know it: The unconscious reasons we do what we do.

Concludes he: “There is no clear criterion for inadequate replicability, but Tversky and Kahneman (1971) suggested a minimum of 50%.  Professors are also used to give students who scored below 50% on a test an F.  So, I decided to use the grading scheme at my university as a grading scheme for replicability scores.  So, the overall score for the replicability of studies cited by Bargh to support the ideas in his book is F.”

November 29, 2017

Five (Out of a Zillion) Ways to Fix Statistics

Nature asked influential statisticians to recommend one change to improve science.

Relatedly, one of the authors (Gelman) has really had it with all that talk about false positives, false negatives, false discoveries, etc.

Update: December 5, 2017

Ed Hagen comments on these five comments and stresses that, before better statistics can be effective, it is imperative to change the incentives for researchers to use them. Specifically, “changing the incentives to reward high quality studies rather than sexy results would have enormously positive effects for science.” He makes the case for pre-registration of hypotheses and statistical tests.

November 28, 2017

d=2.44? Too Good to be True?

.. so men might not be less likely to help a woman who has her hair tied up in a ponytail or a bun after all. Which right there restores our faith in mankind.

November 23, 2017

The Cornell Food and Brand Lab Saga Continues

No, the magic number is not 42; apparently it is 770.

November 12, 2017

Changing the Default p-Value? Why Stop There?

In the wake of recent proposals to (not) change the p-value threshold, McShane, Gal, Gelman, Robert, and Tackett recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making.

That’s of course not a particularly new idea: Gigerenzer, Leamer, Ziliak & McCloskey, and Hubbard have promoted it for decades (something the present authors seem to either not know or prefer not to acknowledge) but it is worth recalling.

November 10, 2017

The Cuddy Saga Continues

Following developments here, here, here, here, here, here and here, Cuddy now claims it was all a great misunderstanding and that power posing effects are about felt power (i.e. mere thinking and feeling) rather than subsequent choices. An interesting refocusing of the narrative…

P-curving A More Comprehensive Body of Research on Postural Feedback Reveals Clear Evidential Value For “Power Posing” Effects: Reply to Simmons and Simonsohn

Update: December 6, 2017

In Data Colada,  Joe Simmons, Leif Nelson, and Uri Simonsohn take on this analysis. You will not be surprised that they come to a different conclusion.

October 25, 2017

Replicability and Reproducibility Discussion in Economics

The Economic Journal (one of the leading economics journals) has a Feature (a collection of articles) on the replicability of economic science. Particularly noteworthy is the article by Ioannidis, Doucouliagos, and Stanley titled The Power of Bias in Economics Research.

If you have any illusions about the sorry state of economics research, the abstract will disabuse you of them:

“We investigate two critical dimensions of the credibility of empirical economics research: statistical power and bias. We survey 159 empirical economics literatures that draw upon 64,076 estimates of economic parameters reported in more than 6,700 empirical studies. Half of the research areas have nearly 90% of their results under-powered. The median statistical power is 18%, or less. A simple weighted average of those reported results that are adequately powered (power ≥ 80%) reveals that nearly 80% of the reported effects in these empirical economics literatures are exaggerated; typically, by a factor of two and with one-third inflated by a factor of four or more.”

Chris Doucouliagos is one of the key speakers at our 2017 workshop.

October 25, 2017

Making Replication Mainstream

Behavioural and Brain Sciences (the journal) has accepted a new target article by Rolf Zwaan and colleagues on an important issue. Comments are currently being solicited.

Rolf was one of the speakers at our 2015 workshop.

October 24, 2017

Criticising a Scientist’s Work Isn’t (Sexist) Bullying. It’s Science.

Simine Vazire reflects on the accusations that followed the New York Times piece on Amy Cuddy. Says she …

“Cuddy’s story is an important story to tell: It is a story of a woman living in a misogynistic society, having to put up with internet bullies … . But it is also a story of a woman experiencing completely appropriate scientific criticism of a finding she published. Conflating those issues, and the people delivering the “attacks,” does a disservice to the fight for gender equality, and it does a disservice to science.”

October 19, 2017

Did Power-Posing Guru Amy Cuddy Deserve her Public Shaming?

In the wake of The New York Times piece on Amy Cuddy, Daniel Engber investigates the accusation that she was dragged through the mud because she is a woman.

October 19, 2017

Wansink’s Cornell Food and Brand Lab Saga, Continued

Last month yet another flawed study by Wansink’s lab was retracted (one of about 50 currently questioned, under review, or already retracted for good), only to be replaced by a corrected study. Unfortunately, this corrected study itself apparently also needs correction. How many rounds will it go? We wonder, too.

Relatedly, BuzzFeed has provided email excerpts from Wansink’s attempts to save his reputation; predictably they do not really help his case. You can read about it here.

October 18, 2017

Power Poses, Revisited

The New York Times’s Susan Dominus has written a long piece on When the Revolution Came for Amy Cuddy. The piece was actually as much about her as it was about her collaborator (Carney) on the original power poses paper, Simmons and Simonsohn (who critiqued it), and the culture that led to it and which is currently being swept away. A good, if overly sympathetic, look at the travails in which Cuddy now finds herself.

Relatedly, Gelman (who previously has commented repeatedly on the power poses controversy) published on the same day a reflection on the NYT piece which is worth a read, as is the extensive commentary that follows it (as of October 21 more than 100 pages if you were to print it out).

October 9, 2017

You Always Wanted to do This, So Here is Your Chance

Daniel Lakens now has a course on offer on Coursera on Improving your statistical inferences 

Recommended.

October 8, 2017

(Actually an article from last year which we have become aware of only now)

Science in the Age of Selfies

Lots of eminently citable quotes in there which capture the essence of what is becoming, or arguably has become, a serious problem: “Albert Einstein remarked that ‘an academic career, in which a person is forced to produce scientific writings in great amounts, creates a danger of intellectual superficiality’.” True fact.

See also: Simine Vazire’s blog entry below.

October 1, 2017

Can We Really Not Avoid Being Blinded By Shiny Results?

Simine Vazire has some food for thought.

September 28, 2017

Cornell Food and Brand Lab Scandal: Update

BuzzFeed News has a very useful update on the Wansink food lab story; no words are minced. See here.

September 27, 2017

Gelman Has Sympathy for Big-Shot Psychologist That He Skewered Earlier

The beginning of a beautiful new friendship? Find out here.

September 26, 2017

James Coyne Doubles Down on His Two Earlier Power Pose Blog Entries, Taking No Prisoners in Typical Coyne Style

The titles of his two blog entries are, as usual, descriptive. Part 1 is titled Demonstrating that replication initiatives won’t salvage the trustworthiness of psychology and Part 2 is titled Could early career investigators participating in replication initiatives hurt their advancement?.

See whether you agree.

July 31, 2017

Changing the Default p-Value?

A group of mostly fairly well-known economists, psychologists, and others interested in the replication crisis of the social sciences have proposed to change the default p-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005. Much hilarity ensued.

See, for example, here, here and here.
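One practical consequence of the proposed threshold is easy to work out. The sketch below is a back-of-the-envelope calculation, not anything taken from the proposal itself: it uses the standard normal-approximation formula for a two-sided, two-sample comparison, and the effect size (d = 0.5) and power target (80%) are assumed values chosen for illustration.

```python
from statistics import NormalDist


def n_per_group(effect_size_d, alpha, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, a normal approximation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the threshold
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    return 2 * ((z_alpha + z_beta) / effect_size_d) ** 2


# Assumed standardized effect size of 0.5, for illustration only
n_05 = n_per_group(0.5, alpha=0.05)
n_005 = n_per_group(0.5, alpha=0.005)
ratio = n_005 / n_05   # the ratio does not depend on the assumed effect size

print(f"alpha=0.05:  about {n_05:.0f} per group")
print(f"alpha=0.005: about {n_005:.0f} per group")
print(f"ratio: {ratio:.2f}")
```

Under these assumptions, holding power at 80% while moving from 0.05 to 0.005 requires roughly 70 percent more participants per group.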

July 22, 2017

Should Journals Be Responsible for Reproducibility?

The American Journal of Political Science, one of the top journals in the field, wants to know. Very seriously. A good write-up from Inside Higher Ed here.

July 21, 2017

Another Day, Another Challenge.

Our friends from the Open Science Framework have issued a pre-registration challenge:

“Preregistration increases the credibility of hypothesis testing by confirming in advance what will be analyzed and reported. For the Preregistration Challenge, one thousand researchers will win $1,000 each for publishing results of preregistered research.”

How tempting is that? We will soon find out.

The Leaderboard for the Preregistration Challenge can be found here: Take a look.

The project is closely related to the Replicability Index which we recommend on our resources page.

July 8, 2017

Psychologist Disputes His Findings Won’t Replicate

Andrew Gelman recently posted a comment on a new controversy involving findings by German psychologist Fritz Strack who found quite a while ago that holding a pen in your mouth forces you to smile. This very famous result has recently repeatedly failed to replicate. A comment by Strack on the ensuing controversy led to Gelman’s comment and extended commentary on his blog as well as on various other discussion groups such as the Facebook Psychological Methods discussion group.

Well known philosopher of Science Deborah G. Mayo has also chipped in on this controversy.

June 28, 2017

Power Poses Have No Lasting Power

The relatively new journal Comprehensive Results in Social Psychology just published an issue on power poses.

Going back to work done by Dana Carney, Amy Cuddy, and Andy Yap a few years back, an emergent literature on power poses originally suggested that nonverbal expressions of power affect people’s feelings, behaviors, and hormone levels. In particular, they claimed that adopting body postures associated with dominance and power (“power posing”) can increase testosterone, decrease cortisol, increase appetite for risk, and cause better performance in job interviews. Subsequent attempts to replicate the effects failed and last year Dana Carney made it clear that she did not think there was any such effect and that the original study that started it all was methodologically flawed.

Carney was one of the special-issue editors of Comprehensive Results in Social Psychology’s just published issue on power poses. This issue is of interest both because of its results (no power poses effects) and its methodology (pre-registered replication studies).

Quite some discussion of this issue did also occur on the Facebook Psychological Methods discussion group.

On the PLOS Blogs James Coyne has, in a two part series (see here and here), also usefully reflected on this controversy.

1 June 2017

Quality Uncertainty Erodes Trust in Science

Of interest to those interested in the recent replicability and credibility crisis in the social sciences is this article outlining the inability of consumers of science to adequately evaluate the strength of scientific studies without appropriate transparency.

31 May 2017

Cornell Food and Brand Lab Scandal

A recent scandal that has played out in the blogosphere (and even the popular press) has been the questionable goings-on in the Cornell Food and Brand Lab. Here is a recent summary of the controversy and the important questions about questionable research practices it poses.

Here and here are more relevant discussions.

31 May 2017

Priming Research Train Wreck Debate

In a blog entry dated 2 February 2017, Ulrich Schimmack, Moritz Heene, and Kamini Kesavan wrote that the “priming research” of Bargh and others featured in Kahneman’s book Thinking, Fast and Slow “is a train wreck” and should not be considered “as scientific evidence that subtle cues in their environment can have strong effects on their behavior outside their awareness”. In response, Kahneman conceded that he had placed too much faith in under-powered studies. Here is a good place to start reading up on the debate.

28 May 2017

The Social Cost of Junk Science

In this blog entry, Andrew Gelman discusses what needs to be done to avoid it.

7 May 2017

On Eminence, Junk Science, and Blind Reviewing

In this blog entry, Andrew Gelman discusses these issues with Lee Jussim and Simine Vazire.

30 April 2017

Ujjval Vyas on Evidence-Based Design and Ensuing Discussion

Andrew Gelman recently posted a comment by Ujjval Vyas as a separate entry on his blog, which has drawn, not surprisingly, huge reactions. Vyas pointed out that evidence-based design abounds with the kind of questionable research practices that make Wansink look scrupulous! A fascinating read which you can find here.