
Is it time to retire the 5-star rating system?

Thanks to the rating systems in place on such popular websites as Netflix (s NFLX), Amazon (s AMZN) and eBay (s EBAY), many people have become comfortable evaluating things in absolute terms: a two-star restaurant, a B movie and so on. But new research out of the Massachusetts Institute of Technology says that this approach to ranking things is fundamentally flawed.

Recommendation systems should instead ask users to compare products in pairs, not as stand-alone items, says Devavrat Shah, a professor at MIT’s Laboratory for Information and Decision Systems.

According to Shah, the kind of star rating systems that are the status quo on the web today are flawed because, well, humans are flawed. “If my mood is bad today, I might give four stars, but tomorrow I’d give five stars. But if you ask me to compare two movies, most likely I will remain true to that for a while,” Shah says in an article published this week on MIT’s news site. “Your three stars might be my five stars, or vice versa. For that reason, I strongly believe that comparison is the right way to capture this.”

In a series of recently published academic papers, Shah, along with students Ammar Ammar and Srikanth Jagabathula, as well as MIT Sloan School of Management professor Vivek Farias, demonstrated that stitching “pairwise rankings” together into a master list represents customer sentiment more accurately than asking customers to rate items individually on a typical five-star scale. The researchers say their algorithms predict shoppers’ preferences with 20 percent greater accuracy than the formulas most often in use today. They have built a website to show off their theories in practice.
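To see what “stitching pairwise rankings together” can look like in practice, here is a minimal sketch. This is not the MIT team’s algorithm (their models are more sophisticated); it fits the classic Bradley-Terry model with a simple iterative update, and the movie names are hypothetical:

```python
from collections import defaultdict

def bradley_terry(comparisons, iterations=100):
    """Fit Bradley-Terry scores from (winner, loser) pairs.

    A toy illustration of aggregating pairwise votes into one ranked
    list -- not the algorithm described in the MIT papers.
    """
    items = {item for pair in comparisons for item in pair}
    wins = defaultdict(int)      # number of wins per item
    matchups = defaultdict(int)  # comparison count per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        matchups[frozenset((winner, loser))] += 1

    scores = {item: 1.0 for item in items}
    for _ in range(iterations):
        new_scores = {}
        for i in items:
            denom = 0.0
            for pair, n in matchups.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (scores[i] + scores[j] + 1e-12)
            new_scores[i] = wins[i] / denom
        # Normalize so scores stay on a stable scale across iterations.
        total = sum(new_scores.values())
        scores = {i: s * len(items) / total for i, s in new_scores.items()}
    return scores

# Four pairwise votes over three hypothetical movies:
votes = [("Alien", "Avatar"), ("Alien", "Brazil"),
         ("Brazil", "Avatar"), ("Alien", "Avatar")]
scores = bradley_terry(votes)
master_list = sorted(scores, key=scores.get, reverse=True)
print(master_list)  # ['Alien', 'Brazil', 'Avatar']
```

Each voter only ever answers “which of these two?”, yet the fitted scores still yield a single master list; items that never win land at the bottom with score zero in this simplified version.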

Apple’s (s AAPL) iTunes software, and in particular its Genius song-selection feature, has already shown the value of a more sophisticated algorithmic approach to recommendations. An Apple engineer disclosed last year that Genius relies on much more than the star ratings system to power its personalized song recommendation engine. In fact, iTunes uses a complex combination of big data analytics and aggregated personal information to customize content for users.

Of course, finding programmers is tough as it is, and not all web companies can afford to hire Apple-caliber software engineers or MIT Ph.D.s to build their recommendation engines. Also, users on sites such as Yelp have become very confident in their roles as armchair critics, adding and subtracting stars from reviews for highly specific reasons. But then again, classic comparison websites, and even Facebook’s predecessor, have shown that one-to-one comparisons can be fun, too. According to the folks at MIT, if comparison engines can make the leap from fun pastime into the big leagues of e-commerce, recommendation systems could get even more spookily accurate.

Image from the cover art of The Complete Works of The Critic DVD set.

10 Responses to “Is it time to retire the 5-star rating system?”

  1. In order to do an honest comparison rating, the reviewer would have to purchase or try out a comparable product or service immediately – which isn’t viable for most consumers. I think contextual ratings are a better alternative, which lets a reviewer explain how/why/at what skill level they were expecting to use a product and why it didn’t meet their needs for that application. What I like is Amazon’s statistics on the percentage of viewers who bought a product versus similar products in that category.

  2. It seems to me that this paired ranking fails to address the rock/paper/scissors issue. In other words, if I believe that x > y and y > z, that does not necessarily mean I will believe x > z. If I cannot trust these implied relationships, then I am forced to compare every possible combination of items in my data set, and even when I do all that, all I can really say about the results is that I know the relative rank of any two items within that data set. While I agree with many of the stated problems with the 5-star system, paired ranking doesn’t seem like a viable alternative.

  3. Ashwin

    Rating in pairs may create unnecessary disputes between the two entities, or may give the public a wrong impression, because a company lagging in some areas may lead in others. So rating in pairs is not a bad concept, and with specifications it can work better.
    eg: instead of comparing two movies, compare two “action” movies…like that.

  4. I’d just like to add that companies don’t have to hire “Apple-caliber software engineers or MIT PhDs to formulate their recommendation engines” – we have built a personalization and recommendation engine framework based around some patent-pending technology and around 50 interchangeable algorithms, along with high-level attribute filtering. Any company can quite simply integrate our framework and begin personalizing and recommending the right content to the right visitor at the right time.
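The rock/paper/scissors objection raised in the second response above is easy to make concrete: if majority preferences form a cycle (x beats y, y beats z, z beats x), no master list can agree with every pairwise vote, so any aggregation must break at least one of them. A quick sketch in plain Python, with placeholder item names:

```python
from itertools import permutations

# Majority preferences forming a rock/paper/scissors cycle:
# voters prefer x to y, y to z, and z to x.
prefs = [("x", "y"), ("y", "z"), ("z", "x")]

def violations(order, prefs):
    """Count the pairwise preferences that the ranking `order` contradicts."""
    position = {item: rank for rank, item in enumerate(order)}
    return sum(1 for a, b in prefs if position[a] > position[b])

# Check every one of the six possible master lists over x, y, z.
best = min(permutations("xyz"), key=lambda order: violations(order, prefs))
print(violations(best, prefs))  # 1 -- no ranking satisfies all three votes
```

This is why practical pairwise-ranking systems look for the ordering that minimizes disagreements with the votes rather than one that satisfies them all, since with cyclic preferences the latter may not exist.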