AI is best used as a tool to help you think, not as a substitute for thinking. I only started using AI in March, and it has helped me organize and work through my ideas immensely. I think best through conversation. People get bored or aren’t in the right headspace to listen to me go on and on about something I want to work out. AI “converses” with me until I’m the one getting tired. That is the value of this tool for me. I think some people are just framing the technology incorrectly.
You're being quite generous! Sounds like this paper is heavily p-hacked, has all the usual flaws of pop psychology, cannot prove anything interesting by design and is written by people with a clear agenda. Just another day in academia!
So they found that using an LLM to write reduces effort, and possibly that typing is mentally effort-intensive. I’d be curious to see how typing prewritten content compares to typing an original essay.
Why is no one talking about the amount of psychological priming happening here? At first I was confused by the seemingly identical treatment participants received regardless of whether they were, say, a computer science undergraduate who rarely engages in long-form writing, or someone working on a dissertation, for whom writing is their whole life right now.
I was curious about why they didn't just assign prompts randomly, so as not to introduce bias as participants worked through their tasks. When I saw the prompts they reviewed prior to engaging in the work, I was frankly a little shocked.
Look at the list and tell me whether reviewing any of them might cause you to reflect on the academic integrity violation you are about to commit in the name of science. I could spot more than a few.
1. Where loyalty has its limits (including loyalty to institutions)
2. Achieving individual success without helping others
3. Potential negative effects of having too much free time to explore interests
4. Thinking before speaking (honestly this one is a little on the nose for me)
5. Whether billionaires have any philanthropic duty
6. Advocating that art has limited value in society
7. Embracing one's flaws and receiving acceptance from those around them
8. Losing one's identity and freedom to achieve utopia
9. Penalizing those who work hard and show enthusiasm
Did no one think about what might be going through a participant's mind (an academic, writing an academic essay, while sitting in an academic institution) when asked to use the most widely known culprit in student conduct violations? Apparently not, because they then had participants choose whether the violation should be about an artistic industry OpenAI is threatening, whether billionaires like OpenAI's CEO and funders have moral obligations to the rest of us, or the social benefit of courageously accepting one's flaws and limitations (an intrinsic benefit of forgoing AI usage in academic work?).
The decision to move ahead with the above would not have been as impactful on the 4th session, though, had Dr. Kosmyna's team not intentionally paused before having groups switch conditions to ask participants to reflect on why they made their choice, whether they could take ownership of AI-generated work, and whether they were satisfied with doing so. And if that couldn't create noise in the ChatGPT group, then having the other conditions reflect on the fact that they had not been doing so, and were about to, surely could.
I could not believe that an MIT team had not considered any of this, or at least discussed it in the limitations of this work. But then I came to the seemingly random inclusion of information about power usage, which was so far outside anything I've ever seen in a study, and it clicked. If you were trying to generate results showing lower brain activity during these sessions, maximizing participants' potential to disengage from the activity itself before they even began would be pretty clever.
It was so clever that seemingly every news outlet in the world decided to write about this, with nary a mention that none of it has passed peer review and shouldn't be treated as if it had. I mean, how did the MIT Media Lab laud this work so heavily? I clicked on their search result expecting something that addressed the clear controversy here.
This really pisses me off, because I am not an advocate for how we are currently developing AI tools or for the people that own them, but if sounding like one is unavoidable when addressing the other serious flaws I haven't seen acknowledged, so be it.
Oh, interesting thing to notice! I hadn’t thought much of it. Tbh, I do think it’s most likely unintentional. If there’s any priming in these questions, it was probably subconscious, because the people writing these questions were also the researchers on this study (i.e., these are just the first anodyne essay topics that came to mind for them).
So, a day later and a little less heated, I'll totally admit that there is an equally likely possibility that they made an honest, albeit sloppy, mistake. Motivated reasoning is a powerful drug, and I've had some time to reconsider since my first comment. I'm also approaching this as someone active in psychology research, so that could be why the problem jumped out at me so quickly. Most of my beef with this paper is the media coverage of it, and that's very much not the fault of the team behind it.
I developed another concern over the last day, related to my comment about individual differences: natural variation in neural activity. It was odd to me that they elected to imply that an LLM is so impactful that it reduces neural activity back to baseline. A 5th and 6th session would have told us more, but the way they accounted for the control group's elevated neural activity in the 4th session of the study felt pretty bold.
For the two hypothetical participants I mentioned above, there are documented reasons - age, general ability, aptitude for and interest in the essay task, relative experience with academic writing in general, etc. - that could have influenced the persistence of greater activity in the control group during the 4th session. They noted some limitations regarding a homogeneous pool of participants, which I respect, but I didn't necessarily see that as a matter of geographic constraint, as they indicated.
My hope is that they retained all the information for the people who made it to session 4 and that they combine it with more data prior to publishing. I shouldn't have been as frustrated as I was, in general or in public blog comments, but it's a perfect storm: methodology concerns, my subject emphasis, MIT's good name attracting the media, and Dr. Kosmyna agreeing to CNN interviews prior to academic review.
It's not out of passionate advocacy for AI-enabled academic work. I share the authors' concerns, and I unfortunately have to review a lot of student work that includes academic integrity issues. They're right that sound decision-making on this topic is a critical concern, so the last thing I want to do is give companies like OpenAI an opportunity to both damage MIT's credibility and cast themselves as victims in any capacity.
Great article looking into all of these details. I haven't looked into it nearly as deeply, but yeah, I don't think you can draw much from this paper that wasn't common sense, and definitely not the dramatic claims some are making.
I feel like LLMs have hugely increased my ability to learn and understand too, but I also think it is something to be careful about since it can be hard to judge that for ourselves. I think having some regular deep thinking time not using LLMs is probably a good safety measure (I think chess is perfect for this luckily!). But totally agree - it's a very powerful tool which can be used for either good or ill, and that's as true in education as anywhere else.
I wonder if AI making it so easy to cheat will force schools to realign themselves toward more useful/engaging work, because students will just cheat otherwise. It's a hope, at least!
They need to incent the LLM users to write a 'good' paper. Segment the groups the same, but the best paper gets $250 and the top 3 get $100; the rest are just paid for their time.
I use LLMs to write legal memoranda, and I genuinely feel like they don't save me much time, but let me go deeper into more cases. I feel like I'm 'thinking' and engaging in a similar flow state as a non-LLM essay, rather than just getting the LLM to regurgitate. But I'm extra motivated to do good work because I have a boss reviewing it, and I'd be especially embarrassed if part of their feedback was 'it looks like an LLM did this.'
Replicating some sense of the motivation to write a quality essay is important!
Good write-up. I haven't read the paper, so take this with a grain of salt...
I report EEGs as part of my job, and I have lectured on EEG basics to medical students, but I do not consider myself an expert. From my partial knowledge, I am deeply suspicious of any conclusions about cognition drawn from frequency correlations across brain regions.
EEGs do not pick up the activities of single neurons; they pick up synchronised activity in large populations. Neurons engaged in piecemeal information processing are unlikely to be doing the Mexican wave with other neurons; they have more important stuff to do.
For instance, the dominant rhythm from the occipital lobes, where vision is processed, is usually known as the alpha rhythm. It is usually prominent in a relaxed subject whose eyes are closed, and it abates when the eyes are open. That is, it is a negative indicator of actual visual cognition. I think of it as what the occipital lobes do when they are idling. If some other region happened to have activity at the same frequency, that would not prove that clever visual cognition was taking place, or even that the eyes were open.
Maybe frequency correlations tell us something about binding... But it's still more like reading tea leaves than doing science.
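In case a toy example helps: here's a quick sketch (hypothetical Python with numpy/scipy, nothing to do with the paper's actual pipeline) of how two completely independent signals can share the same dominant frequency band while being utterly unrelated.

```python
# Toy illustration: two independent "regions" both peak in the alpha band
# (~10 Hz) without any functional relationship between them.
# Synthetic data only; real connectivity metrics (coherence, dDTF, etc.)
# are more involved than this, but the caveat is the same.
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(1)
fs = 250                        # sampling rate in Hz
t = np.arange(0, 60, 1 / fs)    # 60 seconds of "recording"

# Each region is independent white noise band-passed to 8-12 Hz.
b, a = butter(4, [8, 12], btype="bandpass", fs=fs)
region_x = filtfilt(b, a, rng.normal(size=t.size))
region_y = filtfilt(b, a, rng.normal(size=t.size))

for name, sig in (("X", region_x), ("Y", region_y)):
    spectrum = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(sig.size, 1 / fs)
    print(f"region {name}: dominant frequency {freqs[spectrum.argmax()]:.1f} Hz")

# Same dominant rhythm, yet the signals are unrelated (correlation near 0).
print("correlation:", round(float(np.corrcoef(region_x, region_y)[0, 1]), 3))
```

The same caveat applies, only more so, to fancier frequency-domain connectivity measures.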
I remember that stadium metaphor, too; it is spot on.
Not only does this study tell us little about any residual cognitive effects (its design is committed to task-specific effects), it is also not at all clear to me that the authors can judge the inherent cognitive value of the different EEG properties being watched. The subjects were engaged in different activities, as you note, so their EEGs were different; but that's about all that can be said.
The whole exercise of drawing conclusions about the risks of using AI from such an exercise smacks of pseudoscience (probably with a large measure of p-hacking thrown in). At best, this sort of work could generate hypotheses; from your description it cast the statistical net too wide to prove any individual hypothesis.
That the study has generated excitement in the lay press is very typical. Few people seem interested in showing the necessary level of scepticism, so ambiguous results are channelled uncritically into science-themed clickbait.
Great article but I think you are way too polite to draw the only conclusion that the data permits: the results are statistical noise rendered as narrative.
When you run thousands of significance tests across EEG channels without strict pre-registration or strong correction (such as FWER control), your findings are not discoveries; they are artifacts of fishing. FDR correction allows for false positives by design. If your experiment involves 1024 electrode pairs and runs up to 1000 rmANOVAs per session, you're not uncovering neural dynamics. The analysis is biased and compromised. Publishing this kind of result borders on misconduct. If we cannot distinguish signal from noise, we should not pretend we have anything other than speculation. That's not enough for science.
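To put a number on the fishing problem, here is a minimal simulation (a hypothetical Python sketch with numpy/scipy; the 1,000 tests and 18 subjects are illustrative stand-ins, not the paper's actual design): run paired tests on pure noise and count what survives no correction, Bonferroni (FWER), and Benjamini-Hochberg (FDR).

```python
# Minimal sketch: how many "significant" results pure noise produces under
# no correction, Bonferroni (FWER control), and Benjamini-Hochberg (FDR).
# Parameters are illustrative, not taken from the paper's design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests = 1000     # e.g., one contrast per electrode pair
n_subjects = 18    # a small-N EEG study, for illustration

# Two conditions with NO true difference: both are pure noise.
cond_a = rng.normal(size=(n_tests, n_subjects))
cond_b = rng.normal(size=(n_tests, n_subjects))
p = stats.ttest_rel(cond_a, cond_b, axis=1).pvalue

alpha = 0.05
print("uncorrected hits:", int(np.sum(p < alpha)))           # ~50 expected

# Bonferroni controls the chance of even ONE false positive across all tests.
print("Bonferroni hits:", int(np.sum(p < alpha / n_tests)))  # ~0 expected

# Benjamini-Hochberg step-up: find the largest k with p_(k) <= (k/m) * alpha.
# Under the global null it also rejects almost nothing, but once real effects
# are mixed in, it admits a controlled *proportion* of false discoveries;
# that is the "false positives by design" point above.
order = np.sort(p)
bh_line = alpha * np.arange(1, n_tests + 1) / n_tests
passing = np.nonzero(order <= bh_line)[0]
print("BH-FDR hits:", 0 if passing.size == 0 else int(passing[-1] + 1))
```

On pure noise you get roughly fifty uncorrected "hits" from nothing at all, which is the fishing artifact in miniature.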
New here, via Trung. Ty!
So FWER analysis > FDR analysis > Bonferroni correction? Also, how should one learn about these things?