The Coming Apocalypse for Scientific Publishing
Price was right
1. Antikythera
Trouble is brewing in the world of science. But first, a bit of history.
Before the mid-20th century, scientific literature was disseminated primarily by scholarly societies. There was a countable number of journals, and each published at some regular rate. If you wanted to stay abreast of every paper published in your field, you needed only to join some society and diligently read its journal. As the venture of science expanded beyond the realm of learned gentlemen, the number of journals grew, and commensurate with this growth came pretty impressive advances in our understanding of the world.
By 1963, it was clear that the growth had gone exponential. Science historian Derek John de Solla Price wrote a book that year on the growth of science, in which he theorized that the volume of scientific literature doubles every 10-15 years. In the late 17th century, there had been 2 journals. By the start of the 19th century, there were 100. At the turn of the 20th century, thousands. Price reckoned that rather than exponential growth, a logistic fit must surely apply. He noted that it would be impossible for science to keep growing at that rate much longer, as there would soon need to be more scientists than there were humans. Lol. Lmao, even.
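(As a sanity check on Price’s numbers, here’s a minimal back-of-envelope sketch in Python. The century endpoints and the figure of ~10,000 journals by 1900, standing in for “thousands,” are my own rough assumptions, not Price’s data.)

```python
import math

def doubling_time(n0: float, n1: float, years: float) -> float:
    """Years per doubling implied by growing from n0 to n1 journals over `years`."""
    return years / math.log2(n1 / n0)

print(doubling_time(2, 100, 100))       # ~17.7 years, 1700 -> 1800
print(doubling_time(100, 10_000, 100))  # ~15.1 years, 1800 -> 1900
```

Both land in or near Price’s 10-15 year doubling band.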
Price, funnily enough, was also known for his work on deciphering the Antikythera mechanism, an analog computer with perhaps the first claim (if you grant me some creative license) to being a non-human scientist. Now, we have a lot of them. Perhaps Price should have taken his exponential fit to its logical conclusion. It’s now quite easy to imagine a world with a few billion AI scientists in some data center.
Of course, until recently we kept up that exponential growth without having to rely on AI. This is because scientific publishing became a target in and of itself. Repeat after me: “When a measure becomes a target, it ceases to be a good measure.” In Price’s time, there were a couple hundred thousand papers published per year. Now, there are several million. Needless to say, one cannot read them all. Even with a very narrow definition of one’s field, it would be futile.
2. Discernment
Luckily, scientific publishing has developed increasingly complex and labor-intensive systems of publication requirements and peer review. Prestige hierarchies for journals have also emerged. This has had many unfortunate side effects, but it’s a necessity if you want to separate the wheat from the chaff.
If you have any doubts about this state of affairs, I urge you to do the following. Take a subject that you have expertise in. Do a comprehensive search, such that you get literally every paper published on that topic over some window of time. Then, become slightly depressed at the sheer volume of wasted person-hours that went into that tranche of literature. Even academics will rarely do this, nor should they. Generally, people look for papers from specific journals they know to be high quality, or use Google Scholar searches that pull the best and most relevant papers to their attention. For my job, I frequently get to do these kinds of exhaustive searches using a literature metadata aggregator. I don’t want to pick on individual papers because I think it’s mean and petty, but I want to give you a sense of what a typical search in a STEM field looks like. I would guess this extrapolates pretty well to most fields. Here’s a framework that will be helpful for later in this article:
C papers (grade inflation, so C is as low as this scale goes): ~80% of the papers in a comprehensive, global search are just complete slop, on an objective level. I don’t mean this in a mean way. Many of these papers must surely exist solely to fulfill paper publishing requirements for, say, technical masters programs around the world. Some fraction are explicitly fraudulent or plagiarized, not that it really matters. They’re published in something like “The Arabian Journal of Atmospheric Chemistry” or “Advances in Biochar Management.” They are often a crude amalgamation of multiple iterative subfields. The papers are essentially unreadable in both style and content, and the modal figure is a bar graph that looks like it was made in Excel in the 1990s and then faxed to the journal.
A typical title would be something like (I’m not even exaggerating here; I’m paraphrasing the third-best article out of ~50 that I just pulled in a particularly demoralizing lit search) “Utilization of tea leaf waste in cobalt nanoparticle catalysts and application for kale farming.” These papers generally don’t come up in your Google Scholar search because no one wants to see them. They are the BuzzFeed articles of science.
By the way, I want to be clear that I don’t think papers that sound funny or useless or cover whimsical or niche topics are bad! That’s not the issue here. I’d love a good paper on goat farming in Malaysia. I promise you, these papers are actually slop. Go and see for yourself!
B papers: Of the remaining 20%, a further 80% will seem to be of acceptable quality but unfortunately provide little-to-no value to the field. These papers probably “deserve” to have been researched and written, but for a litany of reasons, they’re just not really advancing science in a meaningful way. This is fine, of course. Science is a high-risk venture, and most ideas will fail, and it’s probably good to publish regardless. That being said, most of these papers lack a clear purpose. Many are almost completely duplicative of previous research. Some are ideological in ways that would appear cringe to the lay reader. Others seem to be obligatory as a result of some corporate-academic partnership.
The audience for these papers is generally very narrow domain experts, and I doubt they’ll learn anything they didn’t already know by reading them. Moreover, these papers often don’t hold up to more intense scrutiny. If you print one out and bring it to the office of an aging, wizened professor in your department, they will point out several critical methodological issues that make the article’s conclusions suspect.
A papers: The remaining ~4% of the papers are good-ish. For some fields, this could be more like 1%, but for smaller or newer fields (or those with a high barrier to entry, like particle physics), it might be much higher. These are generally published in high-impact journals, or even just mid-tier field-specific journals that screen out complete slop. When you read these papers, you are at least confident that someone wrote them with a basic notion of what their scientific purpose was.
Most papers coming out of American R1 institutions, for a frame of reference, will fall into this category! It’s really not that high of a bar. Within this subset of the literature, there are still serious issues to reckon with: replication, scientific errors, motivated reasoning by researchers who have vested interests in their subfield showing progress, citation-maxxing, etc., etc. But at least these papers are things you would read and go “ah, yes, this here is an acceptable use of 2 years of PhD student labor” (again, a pretty low bar).
The fact that it’s still possible for most academics to differentiate the good papers from the bad is what keeps the whole system of scientific publishing afloat.
If we lived in a world where there was, all of a sudden, no good way to discern the slop from well-researched and well-designed experiments, we’d be buried up to our necks in BS. Unfortunately…
3. Sloptimization
I wrote about the canary in the coal mine a few months ago. If you recall, an incredibly well-made article out of MIT, on AI enabling materials discovery at a fictional version of Corning, was revealed to be fraudulent. The data was fake, but the methodology, figures, and narrative were more than convincing to most. Some of the most famous economists in the world were fooled.
While I don’t think it’s been fully confirmed, an effort like that likely benefited tremendously from ample AI-assisted ideation, data generation, and figure creation. But on top of that, it required a steady-handed auteur, established in a top-tier academic program. And that got him a paper that was, until discovered to be fraudulent, firmly an A+ on my crude and artificial scale. But now, a year later, the barriers to entry for AI tools have dropped dramatically, and those tools are smarter than ever. And while the vast majority of MIT first-years might be wisely unwilling to court the scrutiny that a fraudulent paper in a top-tier journal would bring, I doubt that the entire academic world has that same aversion.
There are two main schools of thought on this. The first is that scientific publishing will drown under a torrent of AI slop. Ross Andersen made this case pretty well in The Atlantic last month, and I think this is indeed a near-term concern. But it’s also easy to imagine systems where AI actually helps parse through higher volumes of papers faster than it produces them, and stable countermeasures emerge. More relevantly, it’s also easy to imagine that AI makes papers less like slop.
The second school of thought is that AI tooling will indeed help better researchers publish more and take on a more managerial role in conducting science. More high-quality research will proliferate, and we’ll all be better off for it. The recent article in Science from researchers at Cornell and Berkeley, “Scientific production in the era of large language models,” did not get nearly enough attention. They found that authors who started using LLM tools became measurably and substantially more productive.

In fact, this paper was so outstanding, using a broad array of methodologies to support several intersecting claims, that I was initially quite worried it was fraudulent! It seemed too good to be true. I spent way too long trying (ineptly) to replicate their findings before becoming less worried about this, and I think it’s just a truly great piece of research. You should read it.
In dialectical fashion, I’m worried about a third thing. I’m concerned that AI will indeed improve most papers, but that this is bad, actually. You want the best papers (the “A” papers) to become better. But is it actually in the interest of science for the following scenario to occur?
All papers that would have fallen into the “C” category of slop can, at this very moment, be trivially passed through an AI tool that raises them at least into the “B” category. In a single pass from an AI model, your text can be transformed to a level of writing higher than that of probably >95% of current academic papers. In a single prompt to an AI model, each of your figures can be beautifully rendered in Python instead of whatever foul, antiquated scientific plotting software is on your 15-year-old lab computer.
With slightly more robust prompting, and a very conservative extrapolation of AI capabilities 1 year into the future, you can raise most papers in the “B” category to the level of “A” papers. Your figures and text will be great already, but now you can also utilize AI tools to brainstorm and execute one key modeling experiment at the bottom of the paper. The AI will identify and highlight any marginal components of novelty within the paper, touch up your tables with formatting flourishes, identify citations to add from every potential reviewer, etc. This will take some effort and skill, but far less effort and skill than what normally goes into “A” papers to get them to that level.
There will also be way more papers, even in STEM fields where the bulk of the work lies in the experimentation. What would formerly have taken a researcher 100 hours to write up and generate figures for will now take 50 hours, or perhaps 5 for someone who is fine entrusting vast amounts of academic, cognitive labor to their chatbot.
Journals are already unprepared to distinguish good papers from bad, and bad papers from fraud. I am now starting to encounter papers in my literature searches that appear to be largely generated by AI, but these papers are quite obviously crap, and are only competing within the slop category.
The people best positioned to take advantage of AI capabilities to dramatically improve their ability to generate convincing academic papers have thus far been tech-forward academics. These people already have a pretty good idea of what a convincing academic paper looks like, and how to use the cutting edge of generative capabilities to massage one. This doesn’t change the dynamics much in the field. If anything, maybe this is fine: you get a bunch of 10x academics who are able to spend more time ideating and less time grinding in R.
Over the next months and years, as AI technology rapidly disperses through the known world, this dynamic will change. It will not require clever prompting, a sophisticated understanding of what makes a journal article novel or good, or eventually, even domain-specific expertise. And word will get out. “Hey, I copy-pasted my draft into Claude Code and told it to optimize my paper for submission to the Journal of the American Chemical Society, and it just got past peer review!”
4. Vivisepulture
Doesn’t this all actually point to some fundamental flaws in academic publishing? If anyone can now trivially generate a paper that is barely distinguishable from some late-career offering of a Nobel laureate, doesn’t that say something about how far academic publishing has strayed from its original purpose of disseminating genuinely novel scientific findings? Even if AI never becomes “superintelligent” or capable of “real” scientific discovery, if it can write a paper that looks and sounds like it’s doing serious science, then don’t we need to reevaluate the form and function of scientific publishing and review so that we can better judge these discoveries?
Yes. Of course.
It’s almost impossible to describe a solution for this slow-building crisis while avoiding a lengthy diatribe on a host of other issues with scientific publishing. Peer review must be overhauled, the incentives around citations need to change, the replication crisis should probably be addressed, and it’s far from ideal to have a system that paywalls the academic literature for the vast majority of readers.
I wish I could surgically describe what could be done about this precise issue, but it’s honestly not possible. Banning or limiting AI utilization is entirely impractical. It’s already trivial to pass your writing through an AI system designed to make it seem more human. AI-checking software is good for detecting student essays copy-pasted from ChatGPT; it’s useless against skilled academics working to fool it.
Journals are also probably unwilling to weight the credibility of the researcher or their institution more heavily than they already do. It’s unsavory and runs contrary to the ethic of science.
Journals could also try to somehow blind reviewers to the quality of an article’s text or visual presentation, rating papers purely on the merit of their findings and/or methodology. But it’s highly unclear whether this would accomplish its goals or just create some other bizarre metric for humans and AI to optimize toward. Plus, AI will rapidly become superhuman at drafting a brief statement summarizing the merit of a paper’s findings, if it isn’t already.
There are some who might believe the only way out is through. AI-researcher collaborations (in the spiritually true sense of the term, not just AI tools like AlphaFold or whatever) are now yielding what appear to be genuine-ish breakthroughs in math and physics. Skepticism is warranted here, and I hope to put out a take on these breakthroughs soon, but you should be just as skeptical of the people confidently announcing that AIs are “stochastic parrots” and can never conduct true science due to some inherent limitation in the paradigm of machine learning.
There are also those who have bones to pick with academia, and are happy for AI slop to litigate these grudges on their behalf. I have a suspicion these folks will not like the academia that emerges on the other side.
Academia is famous for moving slow, certainly slower than the tech companies automating away its profession. It’s quite hard to imagine a world where scientific publishing adapts dramatically, using AI tooling to more rapidly screen, filter, anticipate, and respond to the coming onslaught of submissions, and developing better practices or philosophies for assessing the novelty and value of what it publishes. But despite this, there’s truly nothing stopping Nature or Wiley or Elsevier from getting a move on. Certainly, with Claude Code’s help, the software engineering work required to build out these frameworks is not a barrier. They need only press start.
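If you want a sense of how low that barrier is, here’s a minimal sketch of a first-pass triage layer, in Python. Everything in it is hypothetical: call_llm is a stand-in stub for whatever model API a publisher would actually wire in, and the rubric axes are my own invention, not any journal’s.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever model API a publisher would use."""
    raise NotImplementedError("wire in a real model call here")

# Rubric axes are purely illustrative, not any journal's actual criteria.
RUBRIC = (
    "Rate this manuscript from 1 to 5 on each axis and reply as JSON with "
    "integer fields: novelty, soundness, duplication_risk, fabrication_risk."
)

def triage(manuscript: str, threshold: int = 3) -> dict:
    """Score one submission and flag weak or suspect ones for human desk review."""
    scores = json.loads(call_llm(f"{RUBRIC}\n\n{manuscript}"))
    weak = any(scores[axis] < threshold for axis in ("novelty", "soundness"))
    # Risk axes run the other way: a high score means more suspect.
    risky = any(scores[axis] > threshold for axis in ("duplication_risk", "fabrication_risk"))
    return {**scores, "needs_human_review": weak or risky}
```

The point isn’t this particular rubric, which is surely too crude; it’s that a desk-reject triage loop of this shape is a weekend of software engineering, not a moonshot.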


>The second school of thought is that AI tooling will indeed help better researchers publish more and take on a more managerial role in conducting science. More high-quality research will proliferate, and we’ll all be better off for it.
I was/am hoping that AI will actually raise the bar for publication so high that the best teams will publish less, but more impactfully. With Claude CoLab or whatever, it is no longer necessary to publish on most ideas/experiments because the amount of time and effort necessary to evaluate and discard them essentially goes to zero. One could imagine an AI in the review loop that does exactly this and rejects work that is estimated to take less than, say, 6 weeks of AI time?
Do we lose the great ideas that were low-hanging or inexplicably ignored? Maybe, but surely Claude CoLab could also evaluate papers for novelty much more efficiently than any human.
Of course we could also retreat even more into scientific bubbles where interaction with the outside slop-sea of work is heavily filtered by multiple LLMs and precedence is given to the work of people you've met and trust?
An idea I've had and seen elsewhere is that journals should put limits on the number of papers one can publish in a certain time frame. There may be ways to make this less of a hard limit, like having tiers: in the top tier you can only publish, say, once a year, reserved for only those papers you think are your best work; in the 2nd tier you can publish, say, 3 times a year for work that you think is good but maybe more run-of-the-mill. Then anyone wanting to look at only the best work can filter their search based on tier.