Forecast Aggregation: Sample Complexity and Peer-Assessment-Based Improvements

Many applications involve eliciting forecasts for a future event from a group of experts. The forecast aggregation problem asks how to combine these elicited forecasts into a single, accurate prediction for the event. Though it appears simple, forecast aggregation remains challenging. Interestingly, simple aggregators such as the mean and median often outperform more sophisticated data-driven aggregators in practice. In this talk, we’ll first approach the forecast aggregation problem from a theoretical perspective, asking how many samples a data-driven aggregator needs to produce an accurate forecast under a Bayesian model. We show that for general distributions, the sample complexity of forecast aggregation grows exponentially in the number of experts. But if experts’ signals are conditionally independent given the realization of the event, the sample complexity is significantly reduced and no longer depends on the number of experts.
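To make the contrast concrete for a binary event, the Python sketch below compares the mean and median with a textbook Bayesian aggregator. It is not the learning procedure analyzed in the talk. It assumes (our assumption, not the talk's model) that every expert reports a posterior computed from a shared, known prior `prior` and a private signal, with signals conditionally independent given the outcome. Under that assumption the aggregate log-odds equal the prior log-odds plus each expert's log-odds shift away from the prior: logit(P) = logit(p) + Σ_i (logit(f_i) − logit(p)). All function names are hypothetical.

```python
import math
from statistics import mean, median


def mean_aggregate(forecasts):
    """Simple average of the experts' probabilities."""
    return mean(forecasts)


def median_aggregate(forecasts):
    """Median of the experts' probabilities."""
    return median(forecasts)


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def conditional_independence_aggregate(forecasts, prior):
    """Bayesian posterior for a binary event, ASSUMING each forecast is an
    expert's posterior from a shared known prior and a private signal, and
    the signals are conditionally independent given the outcome.
    Forecasts and prior must lie strictly in (0, 1)."""
    prior_logit = logit(prior)
    # Each expert contributes the log-likelihood ratio of its signal,
    # recovered as its posterior log-odds minus the prior log-odds.
    total = prior_logit + sum(logit(f) - prior_logit for f in forecasts)
    return 1 / (1 + math.exp(-total))
```

For example, two experts who each report 0.7 under a prior of 0.5 yield a combined posterior of 49/58 ≈ 0.845, whereas the mean and median both stay at 0.7: independent evidence pointing the same way compounds, which averaging cannot capture.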

Our sample complexity results suggest that, even when historical data on expert performance are available, learning an optimal aggregator generally requires too many samples. Moreover, historical performance data are often unavailable in forecast aggregation settings. In the second part of the talk, we study how to aggregate forecasts without historical performance data. We propose using peer prediction methods, a family of mechanisms originally designed to truthfully elicit private information in the absence of ground-truth verification, to assess forecasters’ expertise, and then using this assessment to improve forecast aggregation. We evaluated our peer-prediction-aided aggregators on a diverse collection of 14 human forecast datasets. Compared with a variety of existing aggregators, they achieved significant and consistent improvements in aggregation accuracy.
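The abstract does not name a specific peer prediction mechanism, so the following Python sketch uses a deliberately simple stand-in to show the overall pipeline of "assess forecasters without ground truth, then aggregate." Each forecaster is scored by how closely their forecasts agree with the leave-one-out mean of the other forecasters across many events (a Brier-style proxy score), and only the top-k scorers are averaged per event. The scoring rule, the top-k selection, and the names `peer_scores` and `top_k_aggregate` are all hypothetical illustrations, not the aggregators evaluated on the 14 datasets.

```python
from statistics import mean


def peer_scores(forecasts):
    """forecasts[i][j] is forecaster i's probability for event j.
    HYPOTHETICAL proxy score: negative mean squared distance between a
    forecaster's reports and the leave-one-out mean of the other
    forecasters' reports. Uses no ground truth; needs >= 2 forecasters."""
    n = len(forecasts)
    m = len(forecasts[0])
    scores = []
    for i in range(n):
        per_event = []
        for j in range(m):
            # Peers' consensus on event j, excluding forecaster i.
            peer_mean = mean(forecasts[k][j] for k in range(n) if k != i)
            per_event.append(-(forecasts[i][j] - peer_mean) ** 2)
        scores.append(mean(per_event))
    return scores


def top_k_aggregate(forecasts, k):
    """Average, event by event, only the k forecasters with the highest
    peer scores (HYPOTHETICAL selection rule)."""
    scores = peer_scores(forecasts)
    ranked = sorted(range(len(forecasts)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:k]
    m = len(forecasts[0])
    return [mean(forecasts[i][j] for i in chosen) for j in range(m)]
```

With `forecasts = [[0.8, 0.2, 0.9], [0.7, 0.3, 0.8], [0.1, 0.9, 0.2]]`, the third forecaster consistently disagrees with the other two and receives the lowest score, so `top_k_aggregate(forecasts, 2)` averages only the first two, giving roughly `[0.75, 0.25, 0.85]`. A pure agreement score like this rewards conformity; the peer prediction mechanisms the abstract refers to are designed to elicit truthful information, a property this toy score does not have.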

Pizza will be served at 12:15 p.m.

Speaker Bio

Yiling Chen is a Gordon McKay Professor of Computer Science at Harvard John A. Paulson School of Engineering and Applied Sciences. She is a member of the EconCS and AI research groups, and a faculty affiliate of the Center for Research on Computation and Society (CRCS). Prior to Harvard, she spent about two years at Yahoo! Research in New York City. She obtained her Ph.D. from the College of Information Sciences and Technology at The Pennsylvania State University. She is a recipient of an NSF CAREER Award and the Penn State Alumni Association Early Career Award, and was recognized by IEEE Intelligent Systems as one of AI’s 10 to Watch in 2011. Her research, situated at the interface between computer science and economics, lies in the emerging area of social computing, where human creativity and resources are harnessed for computational tasks. She is interested in analyzing and designing social computing systems according to both computational and economic objectives. Her interests include information elicitation and aggregation, behavioral experiments, algorithmic game theory, machine learning, and multi-agent systems.