The Three-Choice Revolution: How MIT Researchers Are Fixing 100 Years of Preference Prediction
MIT researchers have discovered a fundamental flaw in how preference prediction has worked for nearly a century, and the fix is surprisingly simple: ask people to rank three options instead of two. The finding could reshape how AI systems learn human preferences, from Netflix recommendations to large language model training.
Why Have Two-Choice Comparisons Failed to Capture What People Really Want?
Since 1927, when American psychologist L.L. Thurstone published "A Law of Comparative Judgment," scientists have relied on random utility models (RUMs) to predict human preferences. These mathematical frameworks assume that when people choose between options, they pick the one with the highest value to them, even if they cannot assign a precise number to that preference.
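To make the idea concrete, here is a minimal sketch of a random utility model, not the researchers' code. It assumes each option has an average appeal plus independent Gaussian noise that varies from person to person; the function name `simulate_choices`, the item names, and every number are hypothetical and chosen only for illustration.

```python
import random


def simulate_choices(mean_utility, n_people=10_000, noise_sd=1.0, seed=0):
    """Simulate a simple random utility model (illustrative assumption only).

    Each simulated person draws a private utility for every item
    (mean appeal + Gaussian noise) and picks the item with the highest draw.
    Returns the share of people who chose each item.
    """
    rng = random.Random(seed)
    counts = {item: 0 for item in mean_utility}
    for _ in range(n_people):
        utilities = {
            item: mu + rng.gauss(0.0, noise_sd)
            for item, mu in mean_utility.items()
        }
        # The person never reports a number, only the winning option.
        winner = max(utilities, key=utilities.get)
        counts[winner] += 1
    return {item: c / n_people for item, c in counts.items()}


# Hypothetical catalog: "drama" is one unit more appealing on average.
shares = simulate_choices({"drama": 1.0, "comedy": 0.0})
print(shares)
```

Under these assumptions the item with the higher average appeal wins most of the time but not always, which is the behavior a RUM is meant to capture: the analyst observes only the choices and works backward to the hidden values.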
The problem is that RUMs have been trained almost exclusively on pairwise comparisons: Netflix asking "which movie do you prefer, A or B?" or Amazon asking "which product appeals to you more?" This approach persists because it is cognitively easier for people to compare two things than to assign a numerical rating like 4.37 to a single item. But this simplicity comes at a cost.
"With this way of assessing people's preferences, looking at just two things at a time, it is impossible to find correlations between the numerous choices," explained Constantinos Daskalakis, the Avanessians Professor of Computer Science at MIT.
The MIT team, which included Yeshwanth Cherapanamjeri, Gabriele Farina, Daskalakis, and Sobhan Mohammadpour, proved mathematically that pairwise comparisons alone cannot reveal correlations between preferences. If someone favors gun control, they may also support government-sponsored child care. A fan of independent films might also enjoy foreign cinema but dislike Hollywood action blockbusters. These hidden connections matter enormously for recommendation systems.
What Changes When You Ask for Three-Item Rankings Instead?
The breakthrough is elegant: when large numbers of people rank three alternatives in order of preference, correlations suddenly become visible. The same information can also be extracted from a combination of best-of-three and best-of-two choices. The researchers presented their findings in April 2026 at the International Conference on Learning Representations in Rio de Janeiro.
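A toy case with only three items shows why two-item data can hide structure that three-item data exposes. The sketch below is an illustration under stated assumptions, not the algorithm from the paper: the two populations (`uniform` and `polarized`) and all function names are hypothetical. In the polarized population, ranking A first always goes with ranking C last, a link that is absent from the uniform one.

```python
from fractions import Fraction
from itertools import permutations

ITEMS = ("A", "B", "C")
RANKINGS = list(permutations(ITEMS))


def pairwise_shares(ranking_probs):
    """Share of people who prefer x to y, for every ordered pair (x, y)."""
    return {
        (x, y): sum(p for r, p in ranking_probs.items() if r.index(x) < r.index(y))
        for x in ITEMS for y in ITEMS if x != y
    }


def best_of_three_shares(ranking_probs):
    """Share of people who pick x as their favorite of all three items."""
    return {x: sum(p for r, p in ranking_probs.items() if r[0] == x) for x in ITEMS}


def recover_rankings(top, pair):
    """Rebuild the full ranking distribution from best-of-three and pairwise shares.

    y beats z in rankings x>y>z, y>x>z and y>z>x; the last two are exactly
    the rankings where y is the favorite, so P(x>y>z) = P(y beats z) - P(y is top).
    """
    return {(x, y, z): pair[(y, z)] - top[y] for x, y, z in RANKINGS}


# Hypothetical population 1: all six rankings are equally common.
uniform = {r: Fraction(1, 6) for r in RANKINGS}

# Hypothetical population 2: half rank A > B > C, half rank C > B > A.
polarized = {r: Fraction(0) for r in RANKINGS}
polarized[("A", "B", "C")] = Fraction(1, 2)
polarized[("C", "B", "A")] = Fraction(1, 2)

# Every pairwise question comes out 50/50 in both populations.
same_pairwise = pairwise_shares(uniform) == pairwise_shares(polarized)

# Three-item data tells them apart, and combined with pairwise data
# it pins down the whole ranking distribution.
same_top = best_of_three_shares(uniform) == best_of_three_shares(polarized)
recovered = recover_rankings(best_of_three_shares(polarized), pairwise_shares(polarized))

print(same_pairwise, same_top, recovered == polarized)
```

In this toy case the two populations are indistinguishable to any A-or-B survey, yet their best-of-three answers differ, and best-of-three plus best-of-two shares are enough to reconstruct every ranking probability. The general result for large catalogs rests on the researchers' proof, which this sketch does not reproduce.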
Sobhan Mohammadpour, an MIT PhD student, explained the practical application: "You would get a bunch of people to rank three items. You could then utilize the method we developed for merging those individual results into one big model that can provide us with the big picture." The good news is that efficient algorithms exist to extract this preference information without exponential growth in the number of experiments needed.
Emma Frejinger, a computer scientist at the University of Montreal, called the work transformative: "It mathematically proves why traditional data collection fails and demonstrates that simply asking users for their best-of-three choices unlocks the ability to accurately train these powerful models. This finding provides a highly practical roadmap for collecting better data to drive more accurate optimizations."
How This Affects AI Systems You Use Every Day
- Recommendation Engines: Platforms like Netflix, Amazon, and Google News rely on RUMs to predict what content users will engage with. Better preference models mean fewer irrelevant recommendations and higher user retention.
- Large Language Model Training: During training, human raters rank candidate outputs from AI systems to teach them what tone, style, and content users prefer. Three-way rankings could give this alignment process the correlation information that pairwise ratings miss.
- Policy and Urban Planning: Governments use RUMs to predict how people will respond to hypothetical scenarios, such as choosing alternative routes if a major road closes or deciding how to spend municipal budgets. More accurate preference models lead to better policy decisions.
Daskalakis emphasized the stakes: "RUMs play a central role in the commercial viability and usefulness of large language models. Just as RUMs have been critical to the internet economy since the late 1990s, they are, and will remain to be, critical to the alignment of AI models going forward."
The research effort, led by Gabriele Farina at MIT's Laboratory for Information and Decision Systems, focused on the computational side of RUMs, devising algorithms that can extract preference information and determining how much data is needed. The requisite number of experiments does not grow exponentially with the number of items in a catalog or database under review, making the approach scalable.
Why This Matters Now
As digital platforms become increasingly central to commerce, entertainment, and public policy, the accuracy of preference prediction directly affects billions of people. A Netflix user who cancels their subscription because the service shows them movies they do not care about represents lost revenue. A city that misallocates resources because it misunderstood citizen preferences wastes public money. An AI system trained on flawed preference data may perpetuate biases or fail to serve diverse user needs.
The MIT team's work suggests that the solution has been within reach all along. By asking for three-way rankings instead of pairwise comparisons, researchers can finally see the hidden correlations that make human preferences so complex and interconnected. As we become increasingly reliant on AI systems to understand and predict human behavior, getting this foundational mathematics right may prove to be one of the most consequential improvements in machine learning research.