Carrying out an experiment demanding participation from a crowd over an entire year is not easy. It requires constant monitoring of resonant issues and the ability to dynamically adapt to the ebbs and flows of engagement. Some of the issues are easier to resolve than others. Below, we highlight five of the challenges we encountered and offer some suggestions for overcoming them.

Five lessons from running a crowd forecasting challenge

Sustaining engagement over longer periods is one of the key challenges for collective intelligence projects. It requires a keen awareness of participants’ motivations, which can change over time. While overall engagement with the crowd predictions challenge was very high, we tended to observe peaks of activity in response to media coverage, reminiscent of the activity patterns previously reported for citizen science projects that feature regular data releases. Sustained participation by individual forecasters over longer periods was much rarer and varied considerably between demographic groups and subject areas. For example, older forecasters participated in seven to eight questions on average, with frequent updates to their estimates and comments, while other groups participated in a more piecemeal way. These patterns of activity posed a challenge for subsequent analysis because of the need to correct for high variability. Strong media partnerships, tailored community management with regular personalised feedback, and a mixed incentive approach that adapts to the changing motivations of the crowd can help to mitigate some of these effects.

At the outset of the challenge we had hoped to attract a diverse crowd, and throughout we promoted the experiments as “open to everyone”. Overall, we covered a wide range of ages and locations. However, we were disappointed by the gender balance and the underrepresentation of the youngest age group (18-24 years). We tried to correct this imbalance by including a wider variety of forecasting questions, including those with a health focus, which had previously been shown to resonate well with BBC Future’s female audience. Despite this, participation rates among women did not appear to follow a consistent pattern by topic area [1], and some of the Brexit-related questions actually attracted the highest proportions of female forecasters. However, overall both men and women participated more on non-Brexit questions than on Brexit questions. More targeted media campaigns, or varying the tone of communications about the challenge away from jargon such as “forecasting”, might have helped the challenge to resonate more with the groups who were less represented.

Any guide to experimentation will highlight the importance of specifying research questions and hypotheses in advance. But experiments beyond the lab can sometimes demand a more agile approach. Originally we planned for the experiments to provide an insight into the effect of gender-based teams on the accuracy of the crowd predictions. Our hypothesis, based on previous research, was that communication and information sharing between team members would differ according to the gender balance of teams. This was a high-risk design because it required consistent engagement and participation over longer periods, which ultimately proved too difficult, so we were left with insufficient data. Despite this, we were able to make several comparisons between the accuracy and forecasting behaviour of different demographic groups. Try to build redundancy into your research plan so that there is enough low-hanging fruit to make up for the potential data gaps or failures of more complex experimental designs. For example, we still managed to compare the information sharing patterns of different groups by analysing forecasters’ commenting behaviours. While this was different from the original hypothesis we planned to test, it did reveal some interesting trends that could form the basis of future experiments. For example, we found that older participants tended to post more comments alongside their forecasts and that these were more likely to receive upvotes from other participants.

Forecasting is steeped in its own language: new recruits have many terms to learn as they find their feet, from Brier scores to scope sensitivity. The phrasing of questions can also sometimes feel clunky and unintuitive. There is often good reason for this, as it helps to eliminate any residual ambiguity about what exactly is being asked. However, all of this combined can make the forecasting world inaccessible to the average participant. It’s important to be aware of these communication barriers. All wider communications, particularly the early messaging about the project that is crucial for driving initial recruitment, should focus on the relevance of the project to everyday lives and what the individuals taking part will gain from the experience.

In addition, forecasting requires a shift in the way that most of us approach thinking about the future. Crowd predictions encourage a more objective assessment of a situation through the lens of probability. This probabilistic approach, like much of the language around forecasting, is unintuitive and can cause disengagement even after successful recruitment. Regular communication to participants that explains terminology and links to available training and tips can help.
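For readers new to the jargon, a Brier score is simply the mean squared difference between a forecaster’s stated probabilities and what actually happened, so lower is better. A minimal sketch in Python, using hypothetical forecasts rather than challenge data:

```python
# A minimal sketch of how a Brier score is calculated. The forecasts and
# outcomes below are hypothetical, not taken from the challenge data.

def brier_score(forecasts, outcomes):
    """Mean squared difference between probability forecasts and binary outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# One participant's stated probabilities across four questions...
forecasts = [0.8, 0.3, 0.6, 0.9]
# ...and what actually happened (1 = event occurred, 0 = it did not).
outcomes = [1, 0, 0, 1]

print(brier_score(forecasts, outcomes))  # 0.125; always guessing 50% would score 0.25
```

Feeding this kind of score back to participants regularly is one concrete way of making the terminology less abstract.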

Teams from large consultancies and individual experts are often the first to be asked for their forecasts when decision-makers face uncertainty. AI-driven predictive models and betting markets both offer alternative sources of insight into the future. Throughout our experiments we struggled to find meaningful benchmarks for the crowd forecasts. Recruiting experts who would be willing to take on the crowd was especially difficult. We suspect this was primarily driven by the potential reputational damage of being outperformed by the crowd.

In contrast, the comparison with betting markets was often made difficult by the esoteric framing of our questions. In a few cases, we were able to compare our questions with the prediction market Smarkets to demonstrate that our forecasters were as good as those with “skin in the game”. Part of proving the complementary value of crowd predictions lies in comparing the method with other ways of anticipating the likelihood of future events. This could have helped us to identify the circumstances and question types where crowd forecasting can add value alongside other ways of thinking about the future. After all, no single method is enough in isolation.
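For anyone attempting a similar comparison, the main practical step is converting a market’s decimal odds into implied probabilities, stripping out the bookmaker’s margin, so that they can sit alongside the crowd’s probability for the same question. A minimal sketch with hypothetical numbers (it does not use Smarkets’ actual prices or data feeds):

```python
# A minimal sketch of converting a betting market's decimal odds into implied
# probabilities so they can be compared with a crowd forecast. All numbers are
# hypothetical and do not reflect any real market's prices or data feeds.

def implied_probabilities(decimal_odds):
    """Turn decimal odds into probabilities, normalising away the bookmaker's margin."""
    raw = [1.0 / o for o in decimal_odds]   # 1/odds = implied probability, margin included
    total = sum(raw)                        # sums to more than 1 because of the margin
    return [p / total for p in raw]

market_yes, market_no = implied_probabilities([1.5, 2.8])  # hypothetical "yes"/"no" odds
crowd_yes = 0.62                                           # hypothetical crowd forecast

print(f"market: {market_yes:.2f}, crowd: {crowd_yes:.2f}, gap: {abs(market_yes - crowd_yes):.2f}")
# market: 0.65, crowd: 0.62, gap: 0.03
```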

How to set questions for a crowd forecasting challenge

Selecting the topics and framing for forecasting questions is part art, part science. It requires a mix of creativity and precision to capture the imagination while still ensuring that the result can be verified by a trusted source within the chosen timeframe. We were lucky to receive a lot of support from Good Judgment Open throughout the challenge, and we still managed to stumble along the way! As more policymakers, government agencies and companies worldwide become interested in harnessing the power of the crowd to make forecasts about the future, here are our top tips for getting the questions right.

  • Have a clear means of verification
    Identify the sources that you will use for verification and make them clear from the outset. This is important for reducing ambiguity when the final outcome is resolved. Use recognised official data sources where possible and check how frequently they release their data so you can plan when to close your question accordingly. Crowd predictions are at their best when they can be verified quickly, so avoid anything that requires data releases more than one month after the question closes. Quick verification provides valuable real-time feedback to both the challenge holders and the individuals taking part, allowing them to adapt the challenge or method as necessary.
  • Set questions with a variety of deadlines and be topical
    Timing is everything, especially in the current information and media ecosystems, where the pace of change makes many issues feel outdated almost as soon as they appear. Try to introduce a variety of questions that play with these dynamics. It can be easier to sustain engagement with highly topical or shorter duration questions that follow the news cycle. Be mindful of scoping effects if your questions ask about the likelihood of a particular event occurring in a given time window: the likelihood naturally decreases as the window shrinks, so consider either setting a closing date well before the outcome becomes known or disregarding the final period of forecasting in your analysis (see the sketch after this list).
  • Phrase questions simply and limit outcome choices to boost participation
    Make sure your question is phrased as simply as possible and keep the number of possible outcomes that forecasters have to consider to a minimum. Questions that are too complex can be a barrier to participation. Look to other sources for inspiration, such as prediction markets or surveys. This will also help you to make meaningful comparisons and identify where crowd predictions converge with or diverge from other forecasting methods.

What’s next for Nesta

At Nesta, we are continuing to experiment with collective intelligence methods ourselves and to support others through our grants programme.

We would love to hear any reflections from those of you who took part or have been following the challenge from afar. Let us know by writing to [email protected], using the subject line Crowd Results. And for anyone who missed the chance, there are many ongoing forecasting challenges on platforms like Good Judgment Open and Metaculus where you can start honing your forecasting skills. Remember, 85% of our participants had no previous forecasting experience, and they managed to get it right 70% of the time!

[1] Overall, the proportion of female forecasters varied between 18% and 40%, depending on the question.