The primary reason is that we have observed a significant increase in developers choosing not to participate in the study because they do not wish to work without AI, which likely biases downwards our estimate of AI-assisted speedup. We additionally believe there have been selection effects due to a lower pay rate (we reduced the pay from $150/hr to $50/hr), and that our measurements of time-spent on each task are unreliable for the fraction of developers who use multiple AI agents concurrently.
It was a controversial study, a prime example of one that served as a Rorschach test for how you felt about AI. Skeptics of AI pointed to it as conclusive evidence that productivity gains were all smoke and mirrors, while proponents took issue with the methodology and pointed out plenty of room for interpretation.
Now, METR is saying that they no longer trust their own methodology, for the reasons quoted above, and are estimating that the productivity speed-up for software developers is probably somewhere around 20%. Even then, they caution that their data on the size of the increase is weak.
I think it’s really admirable that they are updating their methodology and making these claims, because that study was one of the things that rocketed them to the forefront of AI evaluation outside of the labs. I myself have linked to their graphs before, and those graphs have gone viral for lots of reasons.
More of this, please.