Sunday, December 19, 2010

Science in trouble?

Because I like to throw around the latest trends in social science, I read this awesome article in last week's New Yorker with great interest and more than a bit of dismay. Lots of scientific experiments report significant effects that turn out later to be kinda bullshit. What could be more damaging to my worldview?

That was my first line of thinking, at least. The magnitude and pervasiveness of the "decline effect," by which certain "proven" trends decline precipitously over time, is troubling for those of us who put a lot of philosophical stock in the scientific method and see it as the basic, and most trustworthy, gateway to truth.

After thinking back to my own forays into the world of scientific research, I realized that the decline effect isn't terribly shocking after all. What people often fail to realize about science is how pervasive the biases are that can creep into most experiments--even well-designed ones--without ever showing up in the final draft of the scientific paper. Because the popular press, and even scientific papers themselves, never report every detail of an experiment, these biases are easily lost on the reader, and only reveal themselves years later.

I saw the messy details of experimentation at work in three front-line research areas in college: a comparative cognition (monkey) lab, an unconscious cognition lab, and my own senior research project in theoretical astrophysics. Each research setting exploited the potential ambiguities of the scientific process in a different way. Taken together, they are enough to convince me, in retrospect, that most areas of research have serious flaws. That's not to say that these labs and professors weren't doing valuable work, that their results are necessarily invalid, or that anyone was doing anything to rig any results, but that, as I'll try to explain, there were ample opportunities for subtle or unconscious manipulations.

For the monkey lab, I spent four weeks one summer doing an experiment on an island in Puerto Rico populated by hundreds of rhesus macaques. Because of the elaborate nature of the procedure for the experiment, it was rife with subtle but significant methodological problems. For instance, the other person involved in the experiment (the "camera person") who filmed the experimental trials was supposedly blind to the condition--she had to look away for a portion of the trials. Because she was blind to condition, she was responsible for "calling" unsuccessful trials when something went wrong. In reality, though, she wasn't really blind to condition because she had to protect me from onrushing monkeys with verbal warnings, and I had to call off lots of trials due to various logistical problems. Apparatus failure, experimenter error, and general monkey uncooperativeness were so common that we ended up throwing out about 80% of the trials, often already equipped with knowledge of the condition, the monkey being tested, and the likely--or actual--result of the trial. When such a high portion of the trials are discarded, blindness to the result and condition is crucial to keeping the remaining results untarnished, but strict compliance with this standard was simply impossible in our case. Our results were in the direction we expected, but not significant enough for publication.
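It's easy to see how much damage this kind of non-blind culling can do with a toy simulation. The numbers below are entirely made up (they're not from our actual study): assume there is no real effect at all, 80% of trials get thrown out, and trials that contradict the hypothesis are just slightly more likely to get "called off" than trials that confirm it.

```python
import random

random.seed(1)

def run_study(n_trials=20000, p_true=0.5,
              drop_confirming=0.78, drop_disconfirming=0.82):
    """Simulate an experiment with NO real effect (true success rate 0.5)
    in which roughly 80% of trials are discarded, but trials whose outcome
    contradicts the hypothesis are slightly more likely to be 'called off'.
    Returns the apparent success rate among the trials that are kept."""
    kept_successes = 0
    kept_total = 0
    for _ in range(n_trials):
        success = random.random() < p_true
        # The culling is almost, but not quite, blind to the outcome.
        drop_prob = drop_confirming if success else drop_disconfirming
        if random.random() >= drop_prob:  # this trial survives the culling
            kept_total += 1
            kept_successes += int(success)
    return kept_successes / kept_total
```

With those hypothetical drop rates (78% vs. 82%), the kept trials show roughly a 55% success rate even though the true rate is 50%--a 4-point asymmetry in which trials get discarded manufactures an apparent effect out of pure noise. Make the culling truly blind (equal drop rates) and the artifact vanishes.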

In the ACME (Automaticity in Cognition, Motivation and Emotion) lab, I ran an experiment testing the "reverse Macbeth effect." Previous research had shown that people feel morally cleansed when they have an opportunity to physically cleanse themselves. Our experiment was a pilot study that explored the opposite: do people feel physically cleaner if they are permitted to morally cleanse themselves? The problem was that our study involved a lengthy survey, and in our statistical tests, we looked for correlations we weren't initially expecting, hunting for parts of the survey that turned up significant results. In other words, we gave people a lot of questions in the hope that something in the data would turn out to be statistically significant. Not surprisingly, some stuff was, but not so convincingly. I'm not sure what the status of that experiment is now (I left the lab after the pilot study).
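This kind of fishing is easy to quantify: if every survey question is an independent test at the 5% level, the odds of at least one spurious "hit" climb fast with the number of questions. A minimal sketch (the 40-question count is hypothetical, and each null test is modeled as coming up "significant" with probability 0.05):

```python
import random

random.seed(0)

def fishing_expedition(n_questions=40, alpha=0.05, n_sims=2000):
    """Simulate studies scanning many survey questions with NO real effect,
    where each question's test is 'significant' with probability alpha.
    Returns the fraction of studies finding at least one spurious hit."""
    studies_with_hit = 0
    for _ in range(n_sims):
        if any(random.random() < alpha for _ in range(n_questions)):
            studies_with_hit += 1
    return studies_with_hit / n_sims
```

With 40 questions, the chance of at least one false positive is 1 - 0.95^40, or about 87%--so finding "some stuff" that's significant is close to guaranteed, effect or no effect.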

Then there was my astrophysics project. Where do I begin with that one! Two things, above all, left me completely disillusioned with the entire field of research by the time I was done. First, the literature is completely and utterly opaque (I promise I'm not just too dumb to understand it). Second, everyone's research is ridiculously co-dependent on previous models, of which there are usually dozens to choose from, among which there are widely varying results, and none of which is independently tested or verified. In other words, one's own calculations are so dependent on what other papers one chooses to reference that virtually any result could be handpicked by careful selection of previously published models and data sets.

My own project depended critically on empirical data on the mass-to-light ratios of galaxy clusters, and on models that calculated the quantity of metals ejected by supernovae of different types and masses (don't ask). Looking into the current literature, I encountered such widely varying data that it made the whole endeavor seem rather pointless. And what's worse, all the papers written on my topic (or closely related ones) would merely cite previous research, without any explanation, justification, or explication of the math involved. How is anyone supposed to figure out what's going on?!

So in the end, that New Yorker article really just convinces me that scientific research needs some reform. There's so much pressure to publish papers that it seeps into people's work and turns it into a search for results, rather than a search for truth.


  1. I thought the coolest thing in the NYer article was that business about how new theories are often validated lots of times in the first couple years, then suddenly invalidated more and more, then all over the place... totally attributable to the trend in the theory (or as New York Magazine calls it, the backlash... and then the backlash to the backlash). Yikes.

    Here is an even more disturbing article from the Atlantic on this topic, specifically related to medical research. (There's some overlap.)

    And finally, if you ever get your act together to organize your RSS reader, I highly recommend Jonah Lehrer's blog for Wired. (He's the author of the New Yorker article.)

  2. This comment has been removed by the author.

  3. Re: "(I promise I'm not just too dumb to understand it)," this article does a good job explaining how incompetent people are often too incompetent to know that they're incompetent:
    So I'm unconvinced.

    Those articles are scary as hell though. Without the time to go through studies individually, I feel, after reading these, that I can't trust any study outside my areas of expertise (math, some physics, being a douche). I'd love if there were some place that aggregates studies, which are then reviewed by statisticians and given quality rankings of some kind. It could go with the NYer article's idea of building a place for researchers to pre-publish their method, expected outcome, and requirements for confirmation.

    One bone to pick-- I guess the authors were trying to show that all branches of science are susceptible to these flaws, but I feel like the physicists who got gravity off by 2.5% just fracked up. That seems different from the more systematic problems that they're describing with, say, published medical research.

  4. couldn't not post this even though it's like 24 hrs outdated: GOOD GOD.

  5. Sam: Nice post. In applied economics there is a new norm, namely to publish articles with the null result. But I wonder if the new norm actually affects what gets published. Also in development econ, the new norm for randomized controlled trials of interventions (does school feeding affect learning; does paying bonuses to health workers who complete more antenatal exams reduce infant mortality, etc.) is to register all the details of the experiment in advance... None of these, of course, deals with the decline effect, which is weird and worrisome to science in the first place.
    The second example in your post, however, just looked like fishing to me... In economics, the phenomenon of grad students and published scholars running millions of regressions to get the "right" results was exposed some years ago; now published articles have to include all sorts of robustness tests.
    So what about all the experiments Paul Bloom talks about (on babies) in Descartes' Baby -- such a good book but should he be reporting those results so merrily? Nancy B aka Mom

  6. If you liked Sam's post, try this on why "science in trouble" is the point of Science with a capital S.