In this Twitter Sentiment Analysis project, the goal is to identify which parts of a tweet carry its sentiment, be it positive, negative, or neutral. With a constant influx of tweets, knowing which words or phrases drive the sentiment is crucial for brands and individuals. The competition involves extracting support phrases from tweets that already have sentiment labels, using the Sentiment Analysis: Emotion in Text dataset from Figure Eight's Data for Everyone platform.
The challenge was to build a machine-learning model capable of identifying the words or phrases that best represent the sentiment in a given tweet. The dataset, used under a Creative Commons Attribution 4.0 International license, may contain profane or offensive content.
To evaluate the models, the project used the word-level Jaccard score, a metric that measures the similarity between two strings as the overlap between their sets of words. For each tweet in the test set, the model predicted the string that best supports the sentiment.
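As a rough illustration of the metric, the word-level Jaccard score can be sketched as the number of words shared by the two strings divided by the number of distinct words across both. The sketch below is not taken from the project itself: the function name `jaccard`, the lowercasing, the whitespace tokenization, and the handling of two empty strings are all assumptions made for this example.

```python
def jaccard(predicted: str, actual: str) -> float:
    """Word-level Jaccard similarity between two strings.

    Assumptions for this sketch: words are lowercased and split on
    whitespace, and two empty strings are treated as a perfect match.
    """
    a = set(predicted.lower().split())
    b = set(actual.lower().split())
    if not a and not b:
        # Assumed convention: nothing to compare counts as identical.
        return 1.0
    shared = a & b
    # |A ∩ B| / |A ∪ B|, with the union size computed from the set sizes.
    return len(shared) / (len(a) + len(b) - len(shared))


# Hypothetical example: the predicted phrase covers 2 of the 4 distinct words.
score = jaccard("love this", "I love this phone")
print(score)  # 0.5
```

A score of 1.0 means the predicted phrase contains exactly the same words as the reference phrase, and 0.0 means they share no words. Because the comparison is made on sets, word order and repeated words do not affect the result.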