Skip to content

Latest commit

 

History

History
38 lines (22 loc) · 1.37 KB

File metadata and controls

38 lines (22 loc) · 1.37 KB

OBJECTIVE:

Predicting the sentiment of news headline and classifying it to proper news category (web scraped data from Inshorts website)

WEBSCRAPING:

Here I used BeautifulSoup library for Scraping I scraped data from the below html pages, 'https://inshorts.com/en/read/technology.html', 'https://inshorts.com/en/read/sports.html', 'https://inshorts.com/en/read/world.html'

PREPROCESSING:

Using NLTK library I did the following things, Removing Stopwords-Stopwords are the words in any language which does not add much meaning to a sentence. They can safely be ignored without sacrificing the meaning of the sentence eg-“the”, “a”, “an”, “in"

Stemming,Lemmatization-Reduce a word to its root or base unit eg:eating, eats, eaten root verb is eat.

Removing accented characters-eg-Café,Naïve

Expanding contractions-eg-Could've to could have

Removing unwanted characters using RE -eg removing emoticons,url,html tags etc..

UNSUPERVISED LEARNING:

Using affin,vader,textblob predicting the sentiment of the headline,whether they are of Postive Sentiment or Negative Sentiment

SUPERVISED LEARNING:

Using Bagofwords Countvectorizer method we put each word in a sentence to a bag and then feed it to classifcation algorithms(logistic,Naivebayes,Decision tree). The classification is done to classify the news to proper category.

EVALAUTION METRIC:

Confusion matrix,Accuracy score