this might be the greatest postscript of all time
Thank you very much! Much of the credit should go to Will Wang, the twitter user who uncovered that WIPO complaint!
oh but also incredible work ben. huge props to you for this excellent post.
Great post! As an applied economist I routinely look at the abstracts of NBER working papers, and I recall that I read with some attention the one by Toner-Rodgers.
I also skimmed through the paper itself, and I had three short thoughts (two of them red flags):
1) too many researchers at this company;
2) too much work for one author (and I did not know he was that early-stage in his career);
3) I was envious of stuff done by MIT students (I did my PhD at LSE, which is not bad at all, but still...). Now I am less envious.
To be fair, I do know some absolutely cracked MIT students who probably could have done this work if the data were actually real. But ya, the number of researchers in the study stood out to me. If there were a company that hired >1k researchers on materials discovery alone, I feel like everyone I know would have been clamoring to get hired there.
I call humblebrag.
MIT has now deleted Toner-Rodgers from their PhD student listing
https://economics.mit.edu/people/phd-students/aidan-toner-rodgers
I disagree. Toner-Rodgers didn't fabricate this data. The AI that wrote the paper for him fabricated the data. It seems to fall well within the type of hallucination AIs generate when asked to produce data, especially when given the hypothesis the prompter wants confirmed, up front.
I suspect he had virtually nothing to do with the entire paper. He told an AI (probably ChatGPT) what he wanted and then tried to publish the result as if it was his own work.
Hmmm, I mean, it's quite likely that he used AI assistance extensively, but I think that the AI capabilities in mid-2024, when he would have written this, were not nearly capable enough to generate the entire paper without substantial assistance on his part (and they probably still aren't). It looks like he is quite adept at utilizing generative AI, which ironically supports his fake findings on the most competent researchers having more to gain from AI use.
I think it depends a little bit on what we mean by "entirely" and "substantial assistance." You're right that you can't just tell an AI "write me a paper about X" and expect to get anything decent. But you could do it in chunks.
A while back (a long while ago; sorry, I don't remember enough to find a link) I read a paper that described a method for producing longer works. It was something like: ask the AI to do an outline for a paper, then ask it to develop a prompt for each section of the outline, then feed it each of those prompts one at a time and merge the results.
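Something like this sketch, I'd guess (a minimal, hypothetical reconstruction assuming the OpenAI Python client; the model name, topic, and prompts are all placeholders I made up, not anything from the actual paper):

```python
# Hypothetical sketch of the outline-then-expand method described above.
# Assumes the OpenAI Python client; model, topic, and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

topic = "AI and scientific discovery"  # invented topic, for illustration only

# Step 1: ask for an outline, one section title per line.
outline = ask(f"Write an outline for a paper about {topic}, one section title per line.")
sections = [line.strip() for line in outline.splitlines() if line.strip()]

# Step 2: ask the model to write a prompt for each section of the outline.
prompts = [ask(f"Write a detailed writing prompt for a paper section titled: {s}")
           for s in sections]

# Step 3: feed each prompt back one at a time and merge the results.
draft = "\n\n".join(ask(p) for p in prompts)
print(draft)
```

Note that each call here sees only its own prompt, not the other sections' output.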
I seem to recall that when I tried it out, the results were not usable. The sections were uneven and there was a lot of repetition. So, yes, it would take a lot of work to smooth all that over. But I haven't read the paper under discussion. It might also be uneven and repetitive. Don't know!
This makes sense, because my first reaction was: jeepers, that amount of effort… why not just do real work?
The “no real effort” explanation pairs better with the outcome.
It seems very unlikely that a large company employing ~1,000 materials scientists would allow the release of any data about its employees' productivity or its research results.
The 1996 Sokal Hoax, revisited https://www.journals.uchicago.edu/doi/abs/10.1086/449049
Yeah that leaps to mind for sure. Helps when journal editors slow down and find some outside expertise. But also, MIT doctoral advisors have some ‘splaining to do.
https://econospeak.blogspot.com/2025/05/artificial-intelligence-creates-more.html
Autor was never good at BS detection.
Great photo. The expression on the dog's face speaks for itself...
True about Autor, but you really have no idea why the lump of labor fallacy is a fallacy.
80 citations say otherwise. Not all agree with me, of course.
https://scholar.google.ca/citations?view_op=view_citation&hl=en&user=k4xobtAAAAAJ&citation_for_view=k4xobtAAAAAJ:u5HHmVD_uO8C
Great analysis! Having all 1,018 researchers working on all four areas is a huge red flag. There's very little overlap between polymer scientists and metallurgists.
I gave the preprint to Claude 3.7 Sonnet with extended thinking, and no matter the prompts (including my notes on Stuart Ritchie's book Science Fictions), it didn't recognize any red flags unless I started really steering it (giving contextual info about it being done by a first-year grad student).
It's unlikely to diss its own work.
No, that's not how it works. And as Ben said, AI in mid-2024 wasn't capable of generating a paper of this quality (and I doubt that current AI would do particularly well at it).
r/whooosh called.
That only strengthens the point that the results were based on other results, which are available online. Had they been different, I bet Claude would've called attention to it.
This is like if Icarus, rather than flying too close to the sun upon beeswax wings, launched himself towards the sun in a Saturn V rocket
"Toner-Rodgers submitted his paper to The Quarterly Journal of Economics, the top econ journal in the world."
Argh, AER.
I mean, I hate SCImago rankings as much as the next person (and I'm actually currently working on a side project for a better journal-ranking system), but apparently QJE clears for now:
https://www.scimagojr.com/journalrank.php?category=2002
Fair!
I think it's likely that the author of the paper is, by now, a "former first year Ph.D." student.
It would be truly funny if someone at one of these companies had received an impertinent request from this student, asking for data for a class project, and the person fed falsified data to him just to screw with him.
Was the fraudulent stuff generated with AI too? Or was AI just the hot theme chosen for the fraud?
I think it's almost certain that he extensively used generative AI in writing this manuscript (and seemed to have used it quite adeptly, tbh), but I think this fraud required a lot of creativity on his part, and that the data was probably faked with pretty deliberate instructions to the AI, rather than wholesale invented by the AI.
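For what it's worth, deliberately faking that kind of dataset takes only a few lines once you know what result you want baked in. A purely illustrative sketch (all numbers invented; no connection to the actual files):

```python
# Purely illustrative: simulating researcher-productivity data with a
# built-in "finding" (top researchers gain the most from the AI tool).
# Everything here is invented; it is not the paper's data or method.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1018  # the paper's implausible researcher count

ability = rng.normal(0.0, 1.0, n)    # latent skill
treated = rng.integers(0, 2, n)      # "randomly assigned" the AI tool
# Bigger boost for the top decile, so the desired heterogeneity shows up.
boost = 0.2 + 0.5 * (ability > np.quantile(ability, 0.9))

discoveries = rng.poisson(np.exp(0.5 * ability + boost * treated))

df = pd.DataFrame({"ability": ability, "treated": treated,
                   "discoveries": discoveries})
print(df.groupby("treated")["discoveries"].mean())  # the "effect" appears on cue
```

The hard part of the fraud wouldn't be generating numbers like these; it would be, as you say, the creativity of dressing them up in plausible institutional detail.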
I’ll just point out that Nature’s journalistic coverage of this preprint at the time *did* request comment from at least one materials scientist, Robert Palgrave, who didn’t raise the critiques noted here to the journalist.
But, as noted in the post, that very materials scientist raised other concerns at the time in a Twitter thread:
https://x.com/robert_palgrave/status/1856273403595915397
Right, but his concerns were about the methods. They didn’t lead him to doubt the paper’s veracity.
His thread at the time concludes: “A fascinating paper, and clearly a huge amount of work. Very interesting and impressive how seemingly one student managed to conduct such a wide ranging study at what must be a major company.”
Not sure why you’re trying to die on this hill. The veracity of some of the results and claims is doubted in that thread, including whether anything useful was found. And the author, Ben, specifically points that out above as well:
“And when the piece originally came out, he had an orthogonal, but also very valid set of reasons for being skeptical of the work (mostly due to the difficulty in defining the “novelty” of materials).”
Oh, I'm pointing this out because to my reading, this blog suggests that had a materials scientist read the preprint, they'd have spotted it was likely fraudulent.
("Probably a materials scientist who read the paper realized this was fraudulent but wasn’t able to get that view quickly to the economists who were actually reading and discussing the paper", the blog says).
The implication (to my reading) is that the likeliness of fraud would have been readily apparent to the real subject-matter experts here. And if journalists had only asked those real experts, the mess would have come to light earlier.
I'm responding by noting that this suggestion doesn't really have a basis in fact. Materials scientists *did* read the preprint - and *were* consulted by journalists - and they didn't spot it was likely fraudulent. Sure, they expressed useful concern about the work's methods and conclusions. Subject-matter experts always critique methods in preprints. But there was not a whisper about the possibility of fraud, the idea that literally the whole thing was fake.
In hindsight, red flags are there. But - as the blog says - hindsight is 20/20. It genuinely isn't so clear ahead of time. And I hope MIT is clearer about what happened in this affair, as it's still frustratingly opaque.
(By the way, I'd expect that a journal editor in any subsequent peer review process would have asked the author to confidentially send over details about the firm involved and get confirmation from the firm ... but perhaps economics journals wouldn't have done that).
I assume he wrote that beforehand, given his later statement (& taking him at his word), referring to Palgrave’s more recent post:
“(I promise I read his thread after writing the bulk of this blog post)”
It’s also quite possible Palgrave *did* have such suspicions, as he did initially make explicit note of several of the same observations re: the surprising meta-aspects: the company size, when the study started relative to recent AI developments, and that a student was somehow working alone on the dataset with the company.
However, it’s quite a difficult task to ask someone who’s simply been asked for comment on the science to indicate the paper is outright fraudulent. You would have to be very sure before even *suggesting* such a thing in a public forum, since, if you’re wrong, the damage done can be disastrous and to some degree irreversible. Just look at how Ben here, even after knowing the results were fraudulent, still hedged on the data themselves.
I actually do wonder if this would have been uncovered in peer review at an Econ journal. Without a domain expert, it would be difficult to detect many of these problems, and with emails from “Corning Research,” he could have gotten past concerns re: the company and his access to the data.
My guess is that he ended up being just a little too ambitious for his own good. The results were initially so compelling as to receive national coverage, which opened the work up to many subject-matter experts, at least some of whom likely shared the same concerns as Ben and Palgrave and therefore probably reached out to MIT.
Had he instead made more modest claims that didn’t attract so much attention, this likely would have slid past review without too much difficulty. This suggests there are probably many such fraudulent works out there, made by people who kept their results and claims below the “OMG” level, if you will, and who therefore didn’t receive the same level of acute scrutiny from many independent researchers.
great article
by the way, commenting on arXiv papers is possible through alphaXiv, e.g.
https://www.alphaxiv.org/abs/2412.17866
Very well written. I had seen the results circulating, and a few articles talking about them, when the paper was first released.
I believe part of the reason the paper seemed valid is that Google’s work on antibiotic discovery was showing real benefits, so some carry-over in credibility could be taking place. https://www.thebrighterside.news/post/google-ai-solves-a-decade-long-superbug-mystery-in-just-two-days/
> I also think that if comments were enabled on arxiv preprints, this could have led to a much more rapid conclusion to the fraud.
For machine learning, we actually have comments at hf.co/papers (https://huggingface.co/papers/date/2025-05-16), and since this just mirrors arXiv, it could also be used for economics and other scientific domains :)