What Are They Saying? Methodology for Extracting Information from Online Reviews

By Wael Jabr, Yichen Cheng, Kai Zhao, Sanjay Srivastava

Working Paper, 2018.

The growth of online shopping has made online reviews a critical source of information for consumers. These reviews, however, are abundant and arrive continuously over time, whether they rate the product highly or less favorably, making it difficult to find useful and relevant information in the post-purchase experiences of others. In this paper, we develop a methodology that leads to a simple representation of the information revealed in reviews. Specifically, for each product, we extract the relevant aspects of the product that are discussed in the reviews. We then develop a measure of each reviewer's satisfaction with each of these aspects. This leads to a simple representation of the information revealed in reviews: the discovery of salient aspects, followed by the extent of each reviewer's satisfaction with each of those aspects. We apply this methodology to a large review dataset from Amazon, which allows us to evaluate the temporal evolution of user satisfaction with these aspects at a granular level. We show that initial reviewers report a few salient aspects of the product and their experiences with those aspects, and that subsequent reviewers continue to report their experiences with the same aspects. We find that user satisfaction with these aspects differs markedly between favorable and less favorable reviews. Somewhat surprisingly, aspects that generate strong positive satisfaction in positive reviews receive only neutral or muted mention in negative reviews. Our results suggest simple strategies by which platforms hosting reviews can provide relevant and useful information to customers.
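To make the two-step representation concrete, the following is a minimal, hypothetical sketch of the pipeline the abstract describes: detect which product aspects a review sentence mentions, then score the reviewer's satisfaction with each aspect. The paper's actual method extracts aspects via topic modeling; the keyword lists (`ASPECTS`) and the toy sentiment lexicon (`LEXICON`) below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of aspect-level satisfaction scoring.
# The paper discovers aspects with topic modeling; here the aspect
# keywords and the sentiment lexicon are hand-picked stand-ins.

from collections import defaultdict

ASPECTS = {            # assumed aspect -> keyword mapping
    "battery": ["battery", "charge"],
    "screen": ["screen", "display"],
}
LEXICON = {"great": 1, "good": 1, "poor": -1, "terrible": -1}  # toy lexicon


def aspect_satisfaction(reviews):
    """Average sentiment of the sentences that mention each aspect."""
    scores = defaultdict(list)
    for review in reviews:
        for sentence in review.lower().split("."):
            words = sentence.split()
            sentiment = sum(LEXICON.get(w, 0) for w in words)
            for aspect, keywords in ASPECTS.items():
                if any(k in words for k in keywords):
                    scores[aspect].append(sentiment)
    # One number per aspect: mean satisfaction across mentioning sentences.
    return {aspect: sum(v) / len(v) for aspect, v in scores.items()}


reviews = [
    "Great battery. The screen is poor.",
    "Terrible battery life but good display.",
]
print(aspect_satisfaction(reviews))  # → {'battery': 0.5, 'screen': -0.5}
```

Running the scorer over reviews in arrival order would give the kind of granular, per-aspect temporal view of satisfaction that the abstract describes, e.g. comparing aspect scores between favorable and unfavorable reviews.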

Keywords: User-generated content; Topic modeling; Review dimensions; Review extremity