All Reviews Are Not Created Equal
A few weeks ago, Shantanu wrote on recommendation engines and how user feedback and ratings can be a part of recommendations you provide to your customers. But if you have ever looked through user recommendations while shopping online for a product, stock, or movie, you know that they aren’t all helpful. Ideally, user ratings would accurately represent the population, but not all feedback is created equal, and there are some inherent challenges in these systems:
- Not everyone will rate. People may read the ratings when shopping for an item, but they won’t always come back to rate. Unless a site offers an incentive for rating a product, a customer’s only real incentive for doing so is to talk about how much they love or hate it; moderates may be under-represented.
- Ratings will be biased. People’s individual biases produce variation in ratings, even when strict guidelines are provided (think of employee performance reviews). In addition, new raters tend to rate high. Their average ratings decrease over time as they rate more items, presumably because they are exposed to more items and have a better sense of an item’s value relative to alternatives.
- Ratings are averaged, masking the underlying data. People often rate only the items they feel strongly about (love it or hate it), so an average of those extreme ratings may not represent the actual sentiment surrounding a product. For example, a review of ratings on Amazon.com revealed that “the reviews for the majority of the products have an asymmetric bimodal distribution. For these products, the mean of the online product reviews does not necessarily reveal the product’s true quality, resulting in misleading conclusions about the product’s future success.” In addition, established products are at a disadvantage against new ones. Consider a product that has received five ratings, four “5s” and one “4,” giving it an average rating of 4.8. If a new product enters the space and receives one “5” rating, its average of 5.0 ranks it above the established product, even though that average rests on a single rating.
- Ratings may be false. Take the case of the Whole Foods CEO who posted disparaging comments about a competitor on a message board while talking up his own company, later stating “Sometimes I simply played ‘devil’s advocate’ for the sheer fun of arguing.” Other visitors may post false comments to artificially affect the rating, and these are not always removed from the calculation.
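The averaging problem described above is easy to reproduce. The short Python sketch below uses the two products from the example, plus a made-up set of polarized ratings (the `polarized` list is illustrative data, not taken from any real product), to show how a plain mean can favor a newcomer and hide a love-it-or-hate-it split.

```python
from collections import Counter

# Ratings from the example above: four 5s and one 4, versus a single 5.
established = [5, 5, 5, 5, 4]
newcomer = [5]

established_avg = sum(established) / len(established)
newcomer_avg = sum(newcomer) / len(newcomer)
print(established_avg, newcomer_avg)  # the newcomer's plain mean is higher

# Hypothetical polarized product: only 1-star and 5-star ratings.
polarized = [1, 1, 1, 5, 5, 5, 5, 5, 5, 5]
polarized_avg = sum(polarized) / len(polarized)

# The histogram shows what the mean hides: nobody chose a middle rating.
histogram = Counter(polarized)
print(polarized_avg, dict(histogram))
```

The polarized product averages 3.8, a score that none of its raters actually gave.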
As more products are marketed and sold online, feedback-based ranking systems are increasingly common components. Amazon and eBay were among the first to use visitor ratings to rank products and sellers, and in February, Yahoo launched its Buzz service, which asks readers to click on their favorite stories, then uses those ratings to determine the most popular articles on the web.
Why is this important? Because accurate product ratings help predict that product’s success, and higher product ratings lead to more sales – or, as Allen & Appelcline state, “the value of individual items (most frequently goods) rise or fall based upon the largely subjective judgment of individual users.” So what can you do to ensure the rating system on your own website accurately reflects your customers’ views and the value of your product?
We’ve seen some thought around using analytical techniques and Bayesian mathematics to create better product rankings. Some of the solutions explored include:
- Adjusting ratings based on known biases. Since some people rate higher or lower than others, one approach is to assign users a “User Optimism” value based on their rating history, and adjust the product’s overall rating based on the raters’ optimism value. Another approach is to remove all ratings from people who have only rated 1 or 2 items, helping to eliminate new-rater optimism or “drive-by” fraud.
- Weighting a product’s ratings based on the number of ratings received. When a product has a small number of ratings, these ratings should count less than those for a product rated many times. To achieve this, you can add a “magic value” into the algorithm that calculates a product’s average rating. This “magic value” pulls products with few ratings toward the average rating of all products, and its effect shrinks as more ratings are received, allowing an established product’s score to float freely and more closely reflect the average of its own ratings.
- Adjusting the algorithm to account for bimodal distributions. To account for products that only have ratings at the extremes, one approach is to use a dual-point estimation model to more accurately reflect customers’ views of the product and predict the product’s success.
- Analyzing a customer’s rating history for fraud. Someone who rates several products on your site should have a predictable rating pattern. You can analyze individual customers’ accounts to identify and remove anomalies that might skew product ratings.
- Encouraging customers to leave text-based feedback. When a person takes the time to write out a product review, and knows that her name will be attached to it, she generally does a better job in her rating. These reviews tend to be closer to the average than those without.
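The “magic value” weighting described above can be sketched as a damped mean. This is one common formulation, assumed here for illustration; the function name `damped_mean`, its parameter names, the default magic value of 5, and the site-wide average of 3.5 are all hypothetical choices rather than anything the techniques above prescribe.

```python
def damped_mean(ratings, global_mean, magic_value=5.0):
    """Average a product's ratings together with `magic_value` phantom
    ratings, each equal to the average rating across all products.

    With few real ratings the phantom ratings dominate, so the score stays
    near `global_mean`; with many real ratings their influence fades.
    """
    total = magic_value * global_mean + sum(ratings)
    return total / (magic_value + len(ratings))


# Assumed site-wide average rating, for illustration only.
GLOBAL_MEAN = 3.5

established = [5, 5, 5, 5, 4]  # plain mean 4.8
newcomer = [5]                 # plain mean 5.0

established_score = damped_mean(established, GLOBAL_MEAN)
newcomer_score = damped_mean(newcomer, GLOBAL_MEAN)
print(established_score, newcomer_score)

# A product with no ratings yet simply gets the site-wide average.
unrated_score = damped_mean([], GLOBAL_MEAN)
```

With these assumed values, the newcomer with a single “5” scores 3.75 while the established product scores 4.15, so the established product ranks first. Choosing the magic value is a judgment call: a larger value demands more ratings before a product can move away from the site-wide average.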
Although your customers’ product ratings may be imperfect, they can still yield insights and value with the application of a good set of analytical techniques.


