Amazon Product Review Sentiment Analysis


Business Problem

Background:

Amazon, the world's largest online retailer, prides itself on customer-centricity. Customer reviews form the backbone of Amazon's product ecosystem, guiding millions of purchasing decisions daily and playing a pivotal role in vendor and product rankings. As the platform continues to scale, the sheer volume of reviews makes manual monitoring an insurmountable task.

Problem Statement:

While Amazon's star ratings offer quantitative insight into product quality, they lack the granularity needed to understand the nuanced opinions of its vast user base. Moreover, with millions of reviews generated daily, manual analysis is infeasible.

The Limitations of Star Ratings:

  • Lack of Specificity: A customer might give a product 3 stars, but what does that mean? Are they somewhat satisfied, or did they encounter specific issues? Without the accompanying text, it's hard to say.
  • Varied Interpretations: One person's 5-star experience might be another person's 4-star experience. Relying solely on star ratings can be misleading.
  • No Actionable Insights: Star ratings don't tell you what the problems are. If a product receives a 1-star rating, what should the seller improve? The packaging? The product quality? The shipping time? Without a textual review, it's all guesswork.

Examples of Star Ratings:

  • Two opposing reviews of the same product, both rated 3 stars:

    [Image: 3-star review with negative text]

    [Image: 3-star review with positive text]

  • A 4-star review with negative sentiment:

    [Image: 4-star review with negative text]

  • Even a 5-star review can carry negative sentiment:

    [Image: 5-star review with negative text]

Objective:

Develop an automated solution using machine learning and natural language processing techniques to:

  1. Accurately identify negative reviews among the millions of reviews posted on Amazon daily.
  2. Categorize the severity of the negative feedback to prioritize actions.
  3. Extract actionable insights from negative reviews to provide to vendors, helping them improve product quality and customer experience.

Value Proposition:

By effectively identifying and addressing negative reviews, Amazon aims to:

  1. Enhance customer trust by showing responsiveness to feedback.
  2. Increase customer retention by resolving issues proactively.
  3. Boost overall platform sales by improving product and vendor quality through actionable feedback.
  4. Potentially save millions in revenue by preventing customer churn and fostering brand loyalty.

Measure of Success:

A successful solution will demonstrate a significant improvement in the recall metric, ensuring that the vast majority of negative reviews are captured. The financial impact of improved customer retention and vendor product enhancement will serve as a testament to the initiative's success.

Data

Column Name  Description
sentiment    Integer identifying one of two sentiments: 1 = Negative, 2 = Positive
title        Title of the review
text         Review text
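
As a minimal loading sketch with pandas (the file name `train.csv` and the headerless-CSV layout are assumptions based on the Kaggle distribution):

```python
import pandas as pd

# Assumed layout: headerless CSV with columns sentiment, title, text.
cols = ["sentiment", "title", "text"]
df = pd.read_csv("train.csv", names=cols, header=None)

# Map the 1/2 integer labels to readable names.
df["label"] = df["sentiment"].map({1: "negative", 2: "positive"})
print(df[["label", "title"]].head())
```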

Solution Strategy

The strategy employed was the CRISP-DM methodology, an iterative, cycle-based approach to data science projects:

[Image: CRISP-DM cycle diagram]

The project cycles were divided into the following phases:

  • Problem Understanding
  • Data Description
  • Data Understanding
  • Text Processing
  • Data Preparation
  • Model Training
  • Model Evaluation
  • Model Deployment
  • Business Performance

Data Understanding

In this section I analyzed how the review texts are composed: text length, most used words, punctuation, and so on.

For example, these were the most frequently used words overall:

[Image: bar chart of the most frequent words]

Notice that 'book' is the most used word, since books account for a large share of Amazon's sales. But some sentiment-bearing words also appear frequently, such as 'good', 'great', and 'love'.

Here is a WordCloud of the most used words in Positive Reviews:

[Image: word cloud of the most used words in positive reviews]

And for Negative Reviews:

[Image: word cloud of the most used words in negative reviews]
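
A small sketch of how such counts and word clouds can be produced (the `wordcloud` package and the `df` frame from the loading sketch above are assumptions; the notebook may do this differently):

```python
from collections import Counter

from wordcloud import WordCloud  # pip install wordcloud

# Count the most frequent tokens in a sample of reviews
# (assumes df["text"] holds non-null review strings).
tokens = " ".join(df["text"].head(10_000)).lower().split()
print(Counter(tokens).most_common(20))

# Word cloud for one sentiment class, e.g. positive (label 2).
positive_text = " ".join(df.loc[df["sentiment"] == 2, "text"].head(10_000))
WordCloud(width=800, height=400).generate(positive_text).to_file("positive_wc.png")
```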

Data Preparation and Model Training

To prepare the data for the model, I applied the following techniques (a small sketch follows below):

  • Tokenization
  • Word embedding
  • Lemmatization
  • Stopword removal
  • Punctuation and extra-space removal
  • Lowercasing

You can see a more detailed explanation of the process in the notebook here.
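
A minimal sketch of these steps using NLTK (the library choice is an assumption — the notebook may use different tooling — and the word-embedding step happens inside the model's Embedding layer rather than here):

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

STOP = set(stopwords.words("english"))
LEMMA = WordNetLemmatizer()

def preprocess(text: str) -> list[str]:
    text = text.lower()                       # lowercase words
    text = re.sub(r"[^a-z\s]", " ", text)     # remove punctuation
    text = re.sub(r"\s+", " ", text).strip()  # collapse extra spaces
    tokens = word_tokenize(text)              # tokenization
    return [LEMMA.lemmatize(t) for t in tokens if t not in STOP]

print(preprocess("This book was GREAT!! I loved reading it."))
# ['book', 'great', 'loved', 'reading']
```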

The model used to classify the texts is a Sequential deep learning model. The key component in this architecture is an LSTM (Long Short-Term Memory) layer, which can capture long-range patterns and dependencies in the text. Here is an example of how the model works:

[Image: diagram of the LSTM model]

You can see the model in more detail in the notebook here.
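
For illustration, a minimal Keras Sequential model of this shape; the layer sizes, vocabulary size, and sequence length are assumptions, not the notebook's exact architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20_000  # assumed vocabulary size

# Embedding turns token ids into dense vectors; the LSTM layer then
# captures long-range dependencies across the sequence.
model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),
    layers.LSTM(64),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # P(positive sentiment)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Recall()])
model.build((None, 200))  # assumed padded sequence length of 200
model.summary()
```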

Results

After training, here are some reviews the model classified:

Negative Sentiment Correctly Classified

Original Text: The Betrothed is an excellent book, but this is not the book, and it's not obvious from looking at this page. I have a wee baby so am sleep-deprived and perhaps that's why I didn't notice, but I definitely think it could be made more obvious. I bought this for a gift anf it was a bit embarrassing.

Original Sentiment: Negative
Predicted Sentiment: Negative

Positive Sentiment Correctly Classified

Original Text: This is by far the best Jeff healey album ever, and it's incredible that almost none of the songs are presented in his complitation "THE VERY BEST OF JEFF HEALEY". This fact alone killed the compilation.

Original Sentiment: Positive
Predicted Sentiment: Positive

Negative Sentiment Incorrectly Classified

Original Text: "Give war a chance!" Get it? The title is a rebuke to the John Lennon song "give peace a chance!" And that is as funny as this book gets. The author attempts to be funny in this way. He is white, well to do, and adored by many conservatives and libertarians for he mocks those who try to change the world for the better. In an earlier time, he would have made fun of Negroes and injuns, but he is not that crude. Here he merely states that war is ok, just don't let me fight in one! In all, a very funny and nasty book for the fat cat on your gift list.Let others fight wars, PJ is too busy making rationales for them!

Original Sentiment: Negative
Predicted Sentiment: Positive

Finding Optimal Threshold for Recall Metric

By analyzing the model's errors, I observed that the majority occur when the predicted probability of the positive class falls between 40% and 60%. The graphs below display the model's errors, as well as the count of errors against the predicted probability of positive sentiment:

[Image: model errors by predicted probability]

[Image: density of errors by predicted probability]

The density graph makes clear that most errors occur when the probability is between 40% and 60%.

To maximize recall, I've chosen to classify uncertain probabilities as Negative. Consequently, my final threshold is set at 0.6:

  • If the probability is above 60% -> Positive sentiment
  • If the probability is 60% or below -> Negative sentiment
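
As a sketch, this decision rule looks like the following (array contents are illustrative):

```python
import numpy as np

THRESHOLD = 0.6  # chosen to maximize recall on negative reviews

def classify(p_positive: np.ndarray) -> np.ndarray:
    """Map P(positive) to labels; uncertain cases default to negative."""
    return np.where(p_positive > THRESHOLD, "positive", "negative")

probs = np.array([0.95, 0.55, 0.41, 0.10])
print(classify(probs))  # ['positive' 'negative' 'negative' 'negative']
```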

Confusion Matrix

The data was divided as follows:

  • 63,000 examples for training the model
  • 27,000 examples for validating the model
  • 40,000 examples for testing the model

Here is the confusion matrix for the Validation Set:

[Image: confusion matrix for the validation set]

And for the Test Dataset:

[Image: confusion matrix for the test set]

Looking at the test confusion matrix, the model correctly classified 16,836 of the 20,000 positive reviews.

It also correctly classified 17,934 of the 20,000 negative reviews.

Classification Metrics

I used the Accuracy, Precision, Recall, and F1-Score metrics to evaluate the model. These are the final metrics on the test set:

Metric     Value
Accuracy   0.8404
Precision  0.8060
Recall     0.8967
F1 Score   0.8489
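
As a sketch, these metrics (and the confusion matrices above) can be reproduced with scikit-learn; `y_true` and `y_pred` are hypothetical binary label arrays:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

# Assumes 1 marks a negative review, the class whose recall matters most here.
def report(y_true, y_pred):
    print("Accuracy :", round(accuracy_score(y_true, y_pred), 4))
    print("Precision:", round(precision_score(y_true, y_pred), 4))
    print("Recall   :", round(recall_score(y_true, y_pred), 4))
    print("F1 Score :", round(f1_score(y_true, y_pred), 4))
    print(confusion_matrix(y_true, y_pred))
```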

Introducing the Neutral Sentiment Category for User Deployment

"I've developed an app using Streamlit where users can input reviews and instantly receive sentiment analysis results.

Upon analyzing the predictions of our model, I observed that for certain reviews, the model exhibited uncertainty or lacked strong confidence in its predictions. To accommodate such instances and offer a more nuanced classification, I've introduced a "Neutral" sentiment category.

The revised sentiment labeling based on the model's probability output is as follows:

  • Positive Sentiment: When there's over 60% probability for a positive outcome.
  • Negative Sentiment: When the positive probability is less than 40%.
  • Neutral Sentiment: For cases where the model's positive probability lies between 40% and 60%, indicating a level of uncertainty.

This approach ensures a more comprehensive and nuanced analysis of reviews, capturing sentiments that may not strictly fall into the traditional positive or negative categories.
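
A minimal sketch of this three-way rule, using the thresholds above:

```python
def sentiment_label(p_positive: float) -> str:
    """Three-way label used for deployment."""
    if p_positive > 0.60:
        return "Positive"
    if p_positive < 0.40:
        return "Negative"
    return "Neutral"  # 40%-60%: the model is uncertain

for p in (0.85, 0.50, 0.20):
    print(p, "->", sentiment_label(p))
```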

Deploy to Production

I deployed the model to Streamlit Sharing so you can test some reviews yourself! Here is an example:

[GIF: demo of the Streamlit app]

Try to trick it by writing some tricky reviews! Link: Streamlit.
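
For reference, a minimal sketch of what such a Streamlit app can look like; `predict_proba` is a hypothetical stand-in for the trained LSTM's inference call, not the project's actual code:

```python
import streamlit as st

# Hypothetical stand-in; swap in the real preprocessing + model here.
def predict_proba(text: str) -> float:
    return 0.5  # dummy value so the sketch runs on its own

def sentiment_label(p: float) -> str:
    if p > 0.60:
        return "Positive"
    if p < 0.40:
        return "Negative"
    return "Neutral"

st.title("Amazon Review Sentiment Analysis")
review = st.text_area("Paste a product review:")

if st.button("Analyze") and review:
    st.write(f"Sentiment: {sentiment_label(predict_proba(review))}")
```

Save this as `app.py` and launch it with `streamlit run app.py`.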

Business Performance

Assumptions

  • On average, Amazon garners 10,000 reviews daily.
  • Of these, 5% (or 500 reviews) are negative.
  • Among the negative feedback, 20% is disguised behind neutral or positive star ratings. This means Amazon could potentially overlook 100 subtly critical reviews each day.

Cost of Dissatisfaction

  • Each unidentified and unaddressed negative review risks losing a customer permanently.
  • The estimated Lifetime Value (LTV) of an Amazon customer stands at $500, representing the potential profit derived from a customer throughout their association with the company.
  • For the sake of this analysis, let's assume Amazon loses all customers associated with the 100 camouflaged negative reviews.

Recall: Before vs. After the Model Implementation

  • Without the model in place, Amazon faces a potential loss of 100 customers daily, translating to a revenue loss of 100 x $500 = $50,000 daily.
  • With the model deployed, the recall rate reaches 89%. This means Amazon now captures 89 of the 100 camouflaged negative reviews daily, missing only 11 such reviews.
  • By this metric, instead of losing $50,000 daily, the model helps Amazon salvage 89 x $500 = $44,500 each day. Annually, this amounts to a substantial recovery of $16,242,500!
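
The arithmetic behind these figures, as a quick sanity check:

```python
LTV = 500            # assumed customer lifetime value ($)
MISSED_DAILY = 100   # camouflaged negative reviews per day (assumption above)
RECALL = 0.89        # model recall on negative reviews

caught = round(MISSED_DAILY * RECALL)  # 89 reviews caught per day
saved_daily = caught * LTV             # 89 * $500 = $44,500
saved_yearly = saved_daily * 365       # $16,242,500 per year
print(caught, saved_daily, saved_yearly)
```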

References

Author: Edilson Santos, Data Scientist.

Author Linkedin: https://www.linkedin.com/in/edilsonsantosjr/

Database: https://www.kaggle.com/datasets/kritanjalijain/amazon-reviews

Portfolio: https://edjr94.github.io/portfolio_english/
