Tanzim-prog/sentiment_analysis_ml_stringdata

Sentiment Analysis on Text Data using Machine Learning

This project aims to measure customer satisfaction at a selection of residential hotels in Dhaka. It was developed in Python using VS Code. The dataset was built by scraping and parsing hotel review pages in R using RStudio. Almost 5,000 reviews were collected for 8 hotels in Dhaka City; this project uses almost 600 reviews from a single hotel.

The methods followed in this project are:

  1. Data Processing
  2. Vectorization
  3. Splitting the Dataset
  4. Model Evaluation
  5. Sentiment Score Generation
  6. Sentiment Analysis Visualization

Tools Used

  • Python programming language
  • Visual Studio Code IDE

Text Cleaning

Libraries Used

  • re
  • openpyxl
  • pandas
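
The README does not show the cleaning rules themselves, so the following is only a minimal sketch of what a `re`-based cleaning step might look like. The function name `clean_review` and the specific rules (lowercasing, dropping URLs, HTML tags, and non-letter characters) are assumptions for illustration, not the project's actual code.

```python
import re


def clean_review(text: str) -> str:
    """Lowercase a review and strip URLs, HTML tags, and non-letter characters.

    Hypothetical helper: the rules below are illustrative assumptions.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop URLs
    text = re.sub(r"<[^>]+>", " ", text)                # drop HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)               # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()            # collapse repeated whitespace


cleaned = clean_review("Great stay!!! <br> Visit https://example.com, 10/10 :)")
print(cleaned)
```

With pandas, a function like this would typically be applied to a review column via `Series.apply`; `openpyxl` is the engine pandas uses to read and write `.xlsx` files.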

Tokenization

Libraries Used

  • nltk
  • pandas
  • nltk.tokenize (word_tokenize)
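
As a rough illustration of this step, the sketch below splits a cleaned review into word tokens with `word_tokenize`. The wrapper name `tokenize` and the fallback to `wordpunct_tokenize` (used only when the Punkt tokenizer data has not been downloaded) are assumptions added here so that the example runs on a fresh NLTK install.

```python
from nltk.tokenize import word_tokenize, wordpunct_tokenize


def tokenize(text: str) -> list[str]:
    """Split text into word tokens (hypothetical wrapper for illustration)."""
    try:
        return word_tokenize(text)
    except LookupError:
        # Punkt data is missing; it can be fetched with nltk.download("punkt")
        # or nltk.download("punkt_tab"), depending on the NLTK version.
        # Fall back to a regex-based tokenizer that needs no extra data.
        return wordpunct_tokenize(text)


tokens = tokenize("the room was clean and the staff were friendly")
print(tokens)
```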

Stop Words Removal

Libraries Used

  • nltk
  • pandas
  • nltk.corpus (stopwords)
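
A minimal sketch of stop word filtering is shown below. The helper names `load_english_stop_words` and `remove_stop_words` are hypothetical, and the demonstration call uses a tiny hand-written stop word set so that it runs even without the NLTK corpus installed.

```python
import nltk
from nltk.corpus import stopwords


def load_english_stop_words() -> set[str]:
    """Return NLTK's English stop word list, downloading the corpus if needed."""
    try:
        return set(stopwords.words("english"))
    except LookupError:
        nltk.download("stopwords", quiet=True)
        return set(stopwords.words("english"))


def remove_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Keep only the tokens that are not in the stop word set."""
    return [t for t in tokens if t.lower() not in stop_words]


# Tiny illustrative stop word set; in practice, pass load_english_stop_words().
filtered = remove_stop_words(["the", "room", "was", "clean"], {"the", "was"})
print(filtered)
```

One design point worth checking for sentiment work: NLTK's English list includes negations such as "not" and "no", so removing them can flip the apparent meaning of a review.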

Stemming/Lemmatization

Libraries Used

  • nltk
  • pandas
  • nltk.stem (PorterStemmer, WordNetLemmatizer)
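
The two normalizers listed above can be sketched as follows. The helper names `stem_tokens` and `lemmatize_tokens` are assumptions for illustration; the README does not say which of the two the final pipeline relies on.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer


def stem_tokens(tokens: list[str]) -> list[str]:
    """Reduce each token to its Porter stem (rule-based, needs no extra data)."""
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens]


def lemmatize_tokens(tokens: list[str]) -> list[str]:
    """Map each token to its WordNet lemma, treating tokens as nouns by default.

    Requires the WordNet corpus: nltk.download("wordnet").
    """
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens]


stems = stem_tokens(["rooms", "hotels", "cleaned"])
print(stems)
```

Stemming is faster but can produce non-words, while lemmatization returns dictionary forms at the cost of needing the WordNet data.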

TF-IDF Vectorization

Libraries Used

  • json
  • pandas
  • sklearn.feature_extraction.text (TfidfVectorizer)

Model Training and Evaluation

Libraries Used

  • json
  • joblib
  • numpy
  • sklearn.model_selection (train_test_split)
  • sklearn.linear_model (LogisticRegression)
  • sklearn.metrics (accuracy_score, classification_report, confusion_matrix, precision_score, recall_score, f1_score)
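
The libraries listed above suggest a train/test split followed by logistic regression and standard classification metrics. The sketch below follows that outline under stated assumptions: the eight reviews and their labels are invented for illustration, the 75/25 split and `max_iter=1000` are arbitrary choices, and using the predicted probability of the positive class as a "sentiment score" is one plausible reading of step 5, not a confirmed detail of the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Invented toy data (1 = positive, 0 = negative); not from the real dataset.
reviews = [
    "clean room friendly staff", "great breakfast lovely view",
    "comfortable bed helpful staff", "excellent service great location",
    "dirty room rude staff", "slow service cold food",
    "noisy room broken shower", "bad smell terrible service",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Split the raw text first so the vectorizer only learns from training data.
train_text, test_text, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.25, stratify=labels, random_state=42
)

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_text)
X_test = vectorizer.transform(test_text)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("accuracy:", accuracy)
print(confusion_matrix(y_test, y_pred, labels=[0, 1]))
print(classification_report(y_test, y_pred, zero_division=0))

# Assumed definition of a sentiment score: probability of the positive class.
sentiment_scores = model.predict_proba(vectorizer.transform(reviews))[:, 1]
```

A fitted model and vectorizer can be persisted with `joblib.dump` and restored with `joblib.load`. With a dataset this small the metrics are not meaningful; the example only shows how the pieces fit together.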

For detailed documentation, visit the link below and download the files:

https://github.com/Tanzim-prog/sentiment_analysis_ml_stringdata/tree/master/Documentation