If you know me, it’s no secret that I love competitive forecasting. For the uninitiated, competitive forecasting is exactly what it sounds like: people register their predictions about the future outcomes of events, generally expressed as a probability between 1% and 99%, and are scored on their accuracy at the end of a set period of time. In 2023, I decided to subject the rest of my graduate school department to this hobby of mine, and I organized a prediction contest in which about 30 of my coworkers participated. For our department’s contest, I created a set of 38 questions with a time frame of October 1st, 2023 to September 30th, 2024. About ½ of these questions were on general global and domestic affairs, ¼ related to scientific topics that I anticipated members of my department might be interested in, and ¼ were explicitly about my university or department itself.
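For the curious, here’s a minimal sketch of how accuracy scoring can work, using the Brier score (the squared error between your probability and the 0/1 outcome; lower is better). That’s one common choice, not necessarily the exact rule any particular contest uses, and the question names and numbers below are made up purely for illustration.

```python
# Minimal sketch of scoring probabilistic forecasts with the Brier score
# (lower is better). This is one common scoring rule, not necessarily the
# exact one used in any given contest.

def brier_score(forecasts: dict[str, float], outcomes: dict[str, bool]) -> float:
    """Mean squared error between each probability and the 0/1 outcome."""
    scored = [(forecasts[q] - (1.0 if outcomes[q] else 0.0)) ** 2
              for q in outcomes if q in forecasts]
    return sum(scored) / len(scored)

# Hypothetical forecasts on two illustrative questions, one YES and one NO.
outcomes = {"WHO declares a new global health emergency": True,
            "Film X is released on schedule": False}
alice = {"WHO declares a new global health emergency": 0.25,
         "Film X is released on schedule": 0.10}
print(round(brier_score(alice, outcomes), 3))  # (0.75^2 + 0.10^2) / 2 = 0.286
```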
All in all, I’d recommend the process, and would encourage others to try it in their workplaces or communities. Registering your predictions and seeing how they turned out—even as a brief exercise—is a really useful method for improving how you think about the likelihood of events occurring. It can also help provide a sense of control over the world: if you can’t directly affect the outcome of large events that may shape your life, you can at least control your own ability to anticipate and mentally prepare for them. Also, it’s entertaining in much the same way as a March Madness pool.
The benefit most visible from the outside is that a forecasting contest produces very accurate forecasts through the wisdom of crowds. Our department’s aggregate prediction (the median response on each question) was quite accurate, with the worst misses being a 25% prediction that resolved YES (whether the WHO would declare a new global health emergency) and a 70% prediction that resolved NO (whether Joe Biden would be the favorite to win the election). If anything, that implies the department was slightly under-confident in its predictions. I entered ChatGPT in this contest as well (GPT-3.5, not even one of the more recent models that are better at reasoning or endowed with internet search capabilities), and it placed 6th, outperforming the vast majority of my department despite knowing nothing about the internal affairs of my university that were relevant to many of the questions. If my own personal predictions had counted towards the contest, I would have narrowly placed 2nd (kudos to my coworker who beat me; he should try out competitive forecasting)!
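For concreteness, here’s a sketch of that aggregation in Python: just take the median of everyone’s probability on each question. The individual numbers are invented for illustration; only the resulting 25% and 70% medians mirror the two misses mentioned above.

```python
from statistics import median

# Sketch of the crowd aggregate: the median of all participants' probabilities
# on each question. The individual numbers are made up for illustration.
predictions = {
    "WHO declares a new global health emergency": [0.10, 0.20, 0.25, 0.30, 0.60],
    "Biden is the favorite to win the election":  [0.50, 0.65, 0.70, 0.75, 0.90],
}

crowd_forecast = {q: median(probs) for q, probs in predictions.items()}
print(crowd_forecast)
# {'WHO declares a new global health emergency': 0.25,
#  'Biden is the favorite to win the election': 0.70}
```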
The people who scored towards the bottom of the contest were more likely to be dramatically overconfident in their forecasts, assigning 1% or 99% to a large fraction of the questions. Making a prediction at 99% likelihood is analogous to being willing to stake $99 to win $1 on that outcome. All of these people whiffed on at least a couple of their extreme predictions, and this hurt their scores a lot.
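To see why, here’s the arithmetic under a Brier-style quadratic score (again, one common rule, not necessarily the exact one a given contest uses): the penalty for a miss grows with the square of your confidence, so a 99% whiff costs roughly twice as much as a 70% whiff and nearly four times as much as simply answering 50%.

```python
# Per-question Brier-style penalty when an overconfident forecast misses
# (lower is better; always answering 50% scores 0.25 per question).
for p in (0.60, 0.70, 0.90, 0.99):
    print(f"forecast {p:.0%}, resolves NO -> penalty {p ** 2:.4f}")
# forecast 60%, resolves NO -> penalty 0.3600
# forecast 70%, resolves NO -> penalty 0.4900
# forecast 90%, resolves NO -> penalty 0.8100
# forecast 99%, resolves NO -> penalty 0.9801
```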
As an aside, this is a very common mistake that smart/famous/powerful people make all the time! Many people are constantly overconfident and overprecise in their predictions, which can lead to a disconnect between punditry and reality. When you see people or institutions repeatedly making confidently incorrect predictions, you should adjust how seriously you weigh their perspectives in the future. On the flip side, some pundits are conscious of this and will (1) only predict on questions they have a very high degree of confidence in, or (2) only make easily disavowed or ironic predictions, so that they can move the goalposts after their predictions don’t bear out. I think we should want the pundits and prognosticators we rely on to continue making public and accountable predictions, not hiding behind vagueness and ideological frameworks that can be adjusted post hoc to make them seem accurate. That indeed means they may be wrong sometimes, and that’s okay!
The median outperformed all but two of the individual predictors in the contest, and could have done even better with a larger sample of questions or a more aggressive aggregation methodology (perhaps discounting participants who made consistently extreme or outlier predictions). If you’re interested, here is the list of the questions that were part of the contest:
One issue with most competitive forecasting platforms is that they, by necessity, gravitate toward more boring questions. Because of constant disputes over gray areas in resolution, as well as time constraints, questions need to be defined very explicitly and sometimes have to be annulled. For example, in my contest, I had to annul a question because the release date of one of the films it asked about was delayed until after the contest period ended. Thus, for competitions, it’s often best to ask questions with very explicit data sources that can be used for resolution. Rather than asking “Will the US birthrate go up in 2025?” you might want to ask something like, “Will the number of babies born in Cook County in August 2025 be more than the number born in August 2024 according to the Cook County hospital register’s monthly data?” The tradeoff of having a very explicit resolution source is that rather than forecasting deep questions about the world, you end up forecasting very particular trends in isolated data sources.
In a bespoke prediction contest like the one I ran, you can avoid this dilemma. As the dictator of my own contest, I could adjudicate subjectively on questions like “Was there a substantial breakthrough in quantum computing?” This sort of question would likely not fly in a more serious, competitive format, but it’s a much more interesting topic to forecast on. I hope to go even further down this path in the future, for the kind of people who like to guess the number of jellybeans in a jar.
A few other contests I’d like to run in this vein:
A Keynesian beauty contest where each participant has to guess the most popular response in a variety of categories (animal, number, city, movie, etc) and whoever gets the most correct wins.
Each participant submits 10 names at the beginning of the year. At the end of the year, I pick my 10 “People of the Year”, and whoever has the most overlap wins.
A forecasting tournament where you are forced to operate on incomplete information because each of the questions has an important component redacted. The questions might be “Will a war break out in _____?” or “Will there be more than __ executive orders in 2025?” This may sound impossible, but I think it would be a really useful exercise in predicting both the underlying outcome and the nature of the question itself, which is something we actually do all the time in the real world!
Let me know in the comments if there’s any interest in participating in contests like these!