How, then, does a consumer make sense of the spate of opinions, many of which contradict each other? New research suggests a scientific way to aggregate reviews in a way that makes ratings more meaningful to consumers—and fairer to the businesses they review.

Michael Luca, an assistant professor at Harvard Business School, believes a review framework developed with colleagues could provide better information on sites such as Yelp, eBay, and TripAdvisor. The framework relies on an algorithm designed to tackle the bias inherent in reviews by taking into account that reviewers vary in accuracy, stringency, and reputation.

Luca co-wrote the paper, “Optimal Aggregation of Consumer Ratings: An Application to Yelp.com,” with Weijia Dai of the University of Maryland; Jungmin Lee of Sogang University and the Institute for the Study of Labor; and Ginger Jin of the University of Maryland and the National Bureau of Economic Research. (The research is a follow-up to Luca’s 2011 paper “Reviews, Reputation and Revenue: The Case of Yelp.com,” which explored how Yelp reviews affect restaurants’ bottom lines.)

Reviewing the reviewers

After spending some time on Yelp, Luca questioned whether the site’s star-based rating system (five stars for the best) truly reflected the quality of the reviewed products and businesses. “I was skeptical at times,” he says. “I saw that some reviewers were better at reviewing than others. Do their reviews get more weight? Who decides whether one review is unbiased or more thorough than another review?”

Yelp, like many other review websites, aggregates information by displaying a simple arithmetic average of a business’s ratings. The problem, Luca says, is that star systems like these aren’t as accurate, or optimal, as they could be.

“Arithmetic averages are only optimal under very restrictive conditions, such as when reviews are unbiased, independent, and identically distributed signals of true quality,” he says. “Essentially, they treat a restaurant that gets three five-star reviews followed by three one-star reviews the same way that they treat a restaurant that gets three one-star reviews followed by three five-star reviews. [In Luca’s thinking, the newer reviews are more accurate about current conditions.] Moreover, they don’t account for observable patterns in the way individuals rate restaurants.”
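To see the order problem concretely, here is a minimal illustration (my own, not from the paper): two restaurants with identical sets of reviews but opposite trajectories receive exactly the same arithmetic average.

```python
# Toy example: the arithmetic mean is blind to the order of reviews.
declining = [5, 5, 5, 1, 1, 1]  # started strong, has fallen off
improving = [1, 1, 1, 5, 5, 5]  # rough start, now excellent

mean = lambda stars: sum(stars) / len(stars)
print(mean(declining), mean(improving))  # 3.0 3.0 -- indistinguishable
```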

Luca’s team decided to try to develop what they believed would be an optimal way to construct overall ratings on Yelp. They used the site’s full history of reviews to estimate characteristics of each reviewer: a reviewer’s accuracy, for example, can be gauged by studying how far that person’s opinions stray from the long-run averages of the restaurants they review.
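The paper itself doesn’t publish code, but the accuracy idea can be sketched roughly as follows; the function name and data shapes here are hypothetical.

```python
def reviewer_accuracy(reviews, long_run_avg):
    """Hypothetical sketch: score a reviewer's accuracy as the root-mean-square
    deviation of their ratings from each restaurant's long-run average.
    Lower scores mean the reviewer tracks consensus quality more closely.

    reviews      -- list of (restaurant_id, stars) pairs for one reviewer
    long_run_avg -- dict mapping restaurant_id to its long-run average stars
    """
    sq_devs = [(stars - long_run_avg[rid]) ** 2 for rid, stars in reviews]
    return (sum(sq_devs) / len(sq_devs)) ** 0.5
```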

The new ratings framework takes into account:

  • The way reviewers review. Some reviewers are consistently critical and leave worse reviews on average; others are reliably positive, fawning over every restaurant’s spaghetti or tuna sub. Still others are erratic, showing little predictability in how they review. Yelp’s so-called Elite reviewers have higher status on the site, awarded by Yelp on the basis of the compliments they send to other Yelpers, their votes marking reviews as Useful, Funny, or Cool, and whether they consistently post respectful, high-quality content. But under a simple averaging system there is no way to give their reviews more collective credence than others. “On Yelp, Elites are given the same weight but they give better reviews,” Luca says.
  • The influence of other reviewers. There’s peer pressure on Yelp, and elite Yelpers worry about their status. A string of glowing reviews by elite reviewers might make another elite reviewer more likely to post a positive review, too, or a negative one. Of course, some reviewers care more about their reputation on Yelp than others, while still others may prefer never to deviate from their own prior reviews.
  • How product quality changes. Say a restaurant fires its head chef or manager and its reviews drop from three stars to two: does the rating system pick up on the change quickly enough to be meaningful to diners? Should old reviews carry the same weight as new ones? How will a diner know that the food and menu have changed? “Our model will pick up on that,” Luca says. (A sketch combining these three factors follows this list.)
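Here is one way the three factors above could be folded into a single weighted rating. This is a simplified sketch under my own assumptions (exponential time decay, inverse-variance weights), not the authors’ estimator, and every name in it is hypothetical.

```python
def adjusted_rating(reviews, reviewer_bias, reviewer_noise, half_life=50):
    """Sketch: combine the three adjustments described above.
      1. De-bias each rating by the reviewer's average harshness or leniency.
      2. Trust accurate (low-noise) reviewers more, via inverse-variance weights.
      3. Down-weight older reviews so the rating tracks quality changes.

    reviews        -- list of (reviewer_id, stars), oldest first
    reviewer_bias  -- dict: reviewer_id -> mean offset from consensus ratings
    reviewer_noise -- dict: reviewer_id -> estimated rating standard deviation
    """
    num = den = 0.0
    for age, (rid, stars) in enumerate(reversed(reviews)):  # age 0 = newest
        precision = 1.0 / max(reviewer_noise[rid], 0.1) ** 2  # accuracy weight
        recency = 0.5 ** (age / half_life)                    # time-decay weight
        weight = precision * recency
        num += weight * (stars - reviewer_bias[rid])          # de-biased rating
        den += weight
    return num / den
```

A production system would estimate the bias and noise terms jointly from the full review history, as the paper does, rather than taking them as given.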

After building a sample of reviews from Yelp’s site, Luca’s team compared the ratings their algorithm produced with Yelp’s published ratings and found that the two differed significantly.

“A conservative finding is that roughly 25 to 27 percent of restaurants are more than .15 stars away from the optimal rating,” the researchers write, “and 8 to 10 percent are more than .25 stars” from optimal. Their finding suggests that large gains in accuracy could be made by implementing optimal ratings and getting rid of bias.

Luca thinks consumers will benefit from optimal ratings.

“What this does is reduce the noise,” he says. “We’re trying to extract more information from the reviews.”

Most of the average difference between the team’s system and Yelp’s comes from accounting for changes in restaurant quality over time. “This is because the simple average weights a restaurant’s first review the same as it weights the thousandth review,” according to the paper. “In contrast, our algorithm reduces the weight assigned to early reviews and hence more quickly adapts to changes in quality.”
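As a rough sketch of that adaptation effect (the exponential decay here is my own assumption; the paper estimates its weights from the data):

```python
def recency_weighted(stars, half_life=10):
    """Average in which a review's weight halves every `half_life` reviews.
    `stars` is ordered oldest first; the newest review gets weight 1."""
    n = len(stars)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * s for w, s in zip(weights, stars)) / sum(weights)

history = [4] * 30 + [2] * 10               # chef leaves; quality drops
print(sum(history) / len(history))          # simple mean: 3.5, slow to react
print(round(recency_weighted(history), 2))  # ~2.93, tracks the recent decline
```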

Other uses

Luca says the framework has other uses as well, such as analyzing trends in the restaurant industry. It can also be applied to any website that relies on consumer ratings to convey information about product or service quality.

But don’t expect Yelp to change its review system any time soon.

“I’ve talked to Yelp and other review companies,” Luca says. “Yelp isn’t necessarily interested in changing the way it aggregates, but [other review] companies can also use these insights when deciding the order in which they show restaurants.”

Luca says he uses Yelp to find restaurants when he travels, but with limits. He trusts recommendations when they are offered by a large number of reviewers, 800 instead of 15, for instance. With smaller numbers, whether it’s a restaurant or a moving company, he questions whether the reviewers are “just people connected to the business.”

“With fewer reviews I rely on the experts [like professional critics],” he says. “This is one of the reasons for having careful aggregation of information.”

About the author

Kim Girard is a writer in Brookline, Massachusetts.