We Carried Out the Red/Blue Button Experiment with Life or Death Stakes
Results from an n=1000 study
Motivation
All that anyone wants to talk about these days is the red/blue button thought experiment. Unfortunately, a Twitter poll isn’t a great proxy for life or death stakes. As many observers have noted, it’s easy enough to press the blue button when there’s nothing on the line, but we don’t know what would happen if the experiment were actually performed, in real life, with terrifying outcomes.
It’s easy to dismiss this thought experiment as irrelevant to the largest challenges facing humanity, but with the rise of powerful AI systems1 and drone warfare,2 this scenario could be implemented by a totalitarian government or a rogue actor with devastating consequences. Thus, we received a lightning grant from a private funding agency to quickly carry out this experiment. A peer-reviewed publication is forthcoming, but we received permission to disseminate our findings on preprint servers and here on Substack.
Working with researchers from Johns Hopkins University, Cold Spring Harbor Laboratory, and the National Institutes of Health, we were able to rapidly conduct the Red/Blue Button Experiment (RBBE) under real world conditions.
In mice, of course.
This presented several challenges. First, mice exhibit strong preferences for blue over red, since they lack red-sensitive cones.3 As mice are functionally red-green colorblind, we simply substituted green for red. Henceforth, when “red” is used in this report, you can be sure we are referring to “green” operationally.
Second, there’s evidence that mice are unable to comprehend concepts such as fractions or irreversible decisions.4 Our experimental design, which will be explained in greater detail later in this report, attempts to overcome this, utilizing best practices suggested by leading animal testing researchers from domestic and foreign institutions.5
Third, there are ethical concerns6 with the euthanasia of high volumes of mice; our experimental design utilized a sample size of n=1000, meaning that up to 499 mice could potentially require culling. To alleviate these concerns, we worked with mice taken from recently concluded experiments.
Fourth, there are conspiracy theorists who claim that results in mice don’t always carry over to humans, particularly in the social sciences. We think this is misguided; mice are great proxies for humans and the consensus of the scientific community is strong in this regard.7 Mouse models are essential tools for the modern researcher.
Methods
Animals: Adult C57BL/6J mice (Jackson Laboratory, n = 1000, 500 male / 500 female, ages 2–22 months) were obtained from the surplus animal pool of recently concluded protocols at the participating institutions and were due for euthanasia within 48 hours. Mice were group-housed (4 per cage) under a 12:12 light:dark cycle at 22 ± 1°C and 40–60% relative humidity, with ad libitum access to standard chow and water until 6 hours prior to testing.
Apparatus: An eight-foot-tall, ten-layered plexiglass enclosure, with space for 100 mice in each stratum, was constructed within a temperature- and humidity-controlled warehouse. The mice were arranged in individual, 10-inch-wide chambers, laid out on each layer in a hollow cylinder. The floors and ceiling of each layer were opaque, as were the walls between chambers and the outer walls. The inner walls were transparent and overlooked a large central chamber, visible to all the mice.
Demonstration Phase: Within this central chamber, for six hours, repeated demonstrations were held using a set of twelve mice. These mice were each provided with the option to proceed through a blue or red tunnel by depressing one of two colored levers and remaining in place for several seconds. This would trigger the opening of a small door. Each of the mice would proceed down the tunnel, where they would be separated into the group selecting blue and the group selecting red.
In runs where a majority of the mice selected blue, all the mice were released into a tertiary enclosure with water and food. In runs where a minority of the mice selected blue, the mice that went through the red tunnel were released into the tertiary enclosure, and the mice that selected the blue tunnel underwent a simulated euthanasia by guillotine. In full view of the other mice, researchers injected these mice with a paralyzing anesthetic and operated a rodent guillotine contrived to contain fake blood and plastic mouse-heads.8
This simulated experiment was repeated over the six hours, using a fresh set of twelve mice each time, for a total of eight completed runs. In three of these runs, the blue-selecting mice constituted a minority and underwent a simulated execution; in the other five runs (including one 6-6 tie), all the mice survived. Researchers ensured a variety of outcomes by using mice that had been previously trained, through operant conditioning with cocaine as the reward, to select either the blue or the red lever.9
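The payoff rule from the demonstrations can be written down as a short sketch. The function name `resolve_round` and the returned field names are hypothetical, and the tie handling is an inference from the one 6-6 run in which every mouse survived, not a rule stated anywhere explicitly.

```python
def resolve_round(choices: list[str]) -> dict[str, int]:
    """Apply the red/blue payoff rule to one round of lever choices.

    Assumption: a tie spares everyone, as in the reported 6-6 demonstration run.
    """
    blue = choices.count("blue")
    red = len(choices) - blue
    # Blue choosers are at risk only when they form a strict minority.
    blue_is_minority = blue < red
    return {
        "blue": blue,
        "red": red,
        "culled": blue if blue_is_minority else 0,
    }


# One illustrative demonstration run with twelve mice.
print(resolve_round(["blue"] * 5 + ["red"] * 7))
```

Red choosers are never culled under this rule, which is the whole source of the dilemma: red is individually safe, blue is safe only collectively.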
Mice in the surrounding enclosures were encouraged to observe the simulated experiments through the release of pleasant odors in the vicinity of the transparent inner walls.
Test Phase: Finally, the outer walls behind the mice were lifted simultaneously to reveal an experimental setup identical to the one they had observed: red and blue levers that could be depressed to open a tunnel. To prevent mice from viewing the decisions of mice on the other sides of the enclosure, a powerful light source was inserted in the center to illuminate the outer walls of each enclosure, making it practically impossible for the mice to see the selections of other mice. Unbeknownst to the mice (but likely assumed on the basis of the simulated executions), researchers were standing by with anesthetic and a rodent guillotine to conduct euthanasia on the mice that entered the blue chamber, were they to constitute a minority.
Results and Discussion
Of the 1000 mice, 512 selected blue and 486 selected red. Two mice became ill or died during the six-hour observational window and were unable to depress the levers in a timely manner, excluding them from the experiment. While researchers were unable to determine the cause of illness definitively, heart failure or decision paralysis were deemed likely. Due to the blue majority, all mice were spared for the short term (at least 24 hours following the conclusion of the study) and provisioned with standard chow and water.
These results (see Figure 1) suggest that under life or death conditions, the RBBE results in very close finishes, with this experiment nearly resulting in an outcome that human observers would be likely to describe as “catastrophic.” Indeed, in a binomial test, blue’s share of the vote was not significantly different from 50%, indicating that blue may fail to receive 50% of votes a substantial fraction of the time.
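For readers who want to check the binomial claim themselves, here is a minimal sketch using only the Python standard library. It assumes an exact two-sided test against a null of 50% on the 998 mice that made a choice; the report does not say which variant of the test was run, and the function name is ours.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided binomial test against a null proportion of 0.5."""
    # Under p = 0.5 the distribution is symmetric, so the two-sided
    # p-value is twice the tail at or beyond the more extreme count.
    k = max(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k, trials + 1))
    return min(1.0, 2 * tail / 2**trials)


blue, red = 512, 486  # counts reported above; the two excluded mice are left out
p_value = binomial_two_sided_p(blue, blue + red)
print(f"blue share = {blue / (blue + red):.3f}, two-sided p = {p_value:.3f}")
```

A p-value far above 0.05 is what "not significantly different from 50%" means here: a 512 to 486 split is well within what a fair coin would produce.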
There were several interesting findings in addition to these topline results.
We found no significant differences across sex.10 50.8% of male mice and 51.6% of female mice selected blue (see Figure 2).
Individuals that selected red did so significantly faster than those that selected blue (185 seconds vs 237 seconds). We believe this indicates that individuals who selected blue may have deliberated more intensely over their decision, whereas those who selected red had made up their mind in advance (see Figure 3).
Older mice were more likely to select red. For mice 0-6 months, 7-12 months, 13-18 months, and >18 months, the likelihood of selecting red was, respectively, 42.8%, 49.3%, 52.0%, and 56.1% (see Figure 4).11 Based on online commentary, this could imply that older individuals were more likely to have more cynical views of their fellow mouse.
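The sex comparison above can be sanity-checked with a pooled two-proportion z-test. This is a sketch under loud assumptions: the counts 254 of 500 males and 258 of 500 females are back-calculated from the reported 50.8% and 51.6% (they do sum to the 512 blue choices), the report does not state the denominators or which test was used, and the function name is hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    return z, erfc(abs(z) / sqrt(2))


# Assumed counts, back-calculated from the reported percentages.
z, p = two_proportion_z(254, 500, 258, 500)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

The same approach would not work for the age bins without knowing how many mice fell into each bin, which the report does not give.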
While we stand by our work, this is just a single study and there are several important caveats with which the public should interpret this work.
This study was conducted in mice, and in line with previous findings on the success of replicating mouse studies in humans, there is a possibility that these results would fail to replicate.12 We encourage researchers to attempt these experiments with consenting and informed adult humans, following the guidance of review boards.13
We do not know how mouse perception of color might affect the results. While in humans, red conjures up symbology of blood, clay, love, and Republicans, in mice it’s unknown whether these associations exist, especially given the substitution of green.
Finally, because mice lack the capacity for language, it was necessary to hold repeated demonstrations to illustrate the outcomes of their collective button selections. Humans, in the original RBBE design, would simply be provided with a text explanation. This could result in significant divergence from the mouse model detailed here.
Author Contributions
Conceptualization: [BS]. Methodology: [BS]. Investigation: [BS]. Formal analysis: [BS]. Visualization: [BS]. Writing — original draft: [BS]. Writing — review & editing: [BS]. That is to say, it’s 100% BS.
For legal purposes, I’d like to clearly state that this is a work of fiction. I’d also like to thank Claude for being willing to generate stupid figures from my fake data and suggest contrived citations to lend an air of credibility to… whatever this is.
1. Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
2. Scharre, P. (2018). Army of None: Autonomous Weapons and the Future of War. W. W. Norton & Company.
3. Jacobs, G. H., Williams, G. A., & Fenwick, J. A. (2004). Influence of cone pigment coexpression on spectral sensitivity and color vision in the mouse. Vision Research, 44(14), 1615–1622. https://doi.org/10.1016/j.visres.2004.02.008
4. Brannon, E. M., & Roitman, J. D. (2003). Nonverbal representations of time and number in animals and human infants. In Meck, W. H. (Ed.), Functional and Neural Mechanisms of Interval Timing (pp. 143–182). CRC Press. https://doi.org/10.1201/9780203009574
5. Percie du Sert, N., Hurst, V., Ahluwalia, A., Alam, S., Avey, M. T., Baker, M., et al. (2020). The ARRIVE guidelines 2.0: Updated guidelines for reporting animal research. PLoS Biology, 18(7), e3000410. https://doi.org/10.1371/journal.pbio.3000410
6. Russell, W. M. S., & Burch, R. L. (1959). The Principles of Humane Experimental Technique. Methuen, London. (The 3Rs: Replacement, Reduction, Refinement.)
7. Perlman, R. L. (2016). Mouse models of human disease: An evolutionary perspective. Evolution, Medicine, and Public Health, 2016(1), 170–176. https://doi.org/10.1093/emph/eow014
8. Carlier, P., & Jamon, M. (2006). Observational learning in C57BL/6j mice. Behavioural Brain Research, 174(1), 125–131. https://doi.org/10.1016/j.bbr.2006.07.014
9. Skinner, B. F. (1938). The Behavior of Organisms: An Experimental Analysis. Appleton-Century. (Foundational; period flavour.)
10. Beery, A. K., & Zucker, I. (2011). Sex bias in neuroscience and biomedical research. Neuroscience & Biobehavioral Reviews, 35(3), 565–572. https://doi.org/10.1016/j.neubiorev.2010.07.002
11. Shoji, H., Takao, K., Hattori, S., & Miyakawa, T. (2016). Age-related changes in behavior in C57BL/6J mice from young adulthood to middle age. Molecular Brain, 9, 11. https://doi.org/10.1186/s13041-016-0191-9
12. Bracken, M. B. (2009). Why animal studies are often poor predictors of human reactions to exposure. Journal of the Royal Society of Medicine, 102(3), 120–122. https://doi.org/10.1258/jrsm.2008.08k033
13. Milgram, S. (1963). Behavioral study of obedience. Journal of Abnormal and Social Psychology, 67(4), 371–378. https://doi.org/10.1037/h0040525





