The Python Reddit API Wrapper (PRAW) is used to extract information from Reddit, which is then analysed using various Python libraries.
Reddit is a widely used social media website, with an emphasis on social news aggregation, discussion and user content. From Wikipedia:
Reddit is an American social news aggregation, web content rating, and discussion website. Registered members submit content to the site such as links, text posts, images, and videos, which are then voted up or down by other members. Posts are organized by subject into user-created boards called "communities" or "subreddits", which cover a variety of topics such as news, politics, religion, science, movies, video games, music, books, sports, fitness, cooking, pets, and image-sharing. Submissions with more up-votes appear towards the top of their subreddit and, if they receive enough up-votes, ultimately on the site's front page.
Using PRAW, data is retrieved from the website. The analysis is split across two Jupyter notebooks: one covers a specific Reddit post, while the other covers two subreddits as a whole.
- praw
- pandas
- matplotlib
- seaborn
- spacy
- textblob
- numpy
We select the Daily Discussion post dated 31st May 2021 from the r/soccer subreddit as our specific post.
We carry out the following operations:
- Read all comments beneath the post.
- Find out the sentiment value for each comment, using TextBlob.
- Clean the resultant data, and store it in a pandas DataFrame.
- Find out the total number of positive and negative comments.
- Find out the top 10 proper nouns used. This gives us an idea of the most-discussed topics.
- Find out the top 10 positive words used.
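The steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the function names, the zero threshold used to separate positive from negative comments, and the idea of passing in an already-authenticated `praw.Reddit` instance and a loaded spaCy pipeline are all assumptions made for the example.

```python
from collections import Counter


def fetch_comment_bodies(reddit, submission_id):
    """Return the text of every comment under one submission.

    `reddit` is assumed to be an authenticated praw.Reddit instance.
    """
    submission = reddit.submission(id=submission_id)
    # Resolve all "load more comments" placeholders so the list is complete.
    submission.comments.replace_more(limit=None)
    return [comment.body for comment in submission.comments.list()]


def polarity(text):
    """Sentiment polarity in [-1.0, 1.0], computed with TextBlob."""
    from textblob import TextBlob  # imported lazily; requires textblob

    return TextBlob(text).sentiment.polarity


def label(score):
    """Map a polarity score to a coarse label (the zero threshold is an assumption)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def count_labels(scores):
    """Count how many polarity scores fall under each label."""
    return Counter(label(s) for s in scores)


def top_proper_nouns(texts, nlp, n=10):
    """Return the n most frequent proper nouns.

    `nlp` is assumed to be a loaded spaCy pipeline, e.g. spacy.load("en_core_web_sm").
    """
    counts = Counter(
        token.text
        for doc in nlp.pipe(texts)
        for token in doc
        if token.pos_ == "PROPN"
    )
    return counts.most_common(n)
```

With the comment bodies in hand, `count_labels(polarity(body) for body in bodies)` gives the positive and negative totals, and the per-comment values can be placed in a pandas DataFrame for cleaning and further analysis.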
For the subreddit-level analysis, we select two subreddits: r/india and r/politics.
We carry out the following operations:
- For each subreddit:
  a. Extract details of the top 100 posts from the last year, such as account name, upvote ratio, total score (upvotes - downvotes), number of awards, etc.
  b. Analyse the source URL of these posts.
  c. Plot graphs between attributes such as score, number of comments and number of awards to see whether any relationship exists between them.
- For r/politics, find the most discussed topics among the top posts (using the titles of the top 100 posts).
- Compare both subreddits using descriptive statistics such as mean score, mean upvote ratio and mean number of comments among their top 100 posts over the last year.
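As a rough sketch of the subreddit workflow above, the snippet below collects the top posts of the last year, counts the source domains, and computes the mean statistics used for the comparison. The function names and record keys are hypothetical, `reddit` is assumed to be an authenticated `praw.Reddit` instance, and the award count is read defensively because that attribute may not be present on every submission.

```python
from collections import Counter
from statistics import mean
from urllib.parse import urlparse


def top_post_records(reddit, subreddit_name, limit=100):
    """Collect basic attributes of a subreddit's top posts from the last year."""
    records = []
    top_posts = reddit.subreddit(subreddit_name).top(time_filter="year", limit=limit)
    for post in top_posts:
        records.append({
            # post.author is None for deleted accounts, so str() yields "None".
            "author": str(post.author),
            "score": post.score,
            "upvote_ratio": post.upvote_ratio,
            "num_comments": post.num_comments,
            # Assumption: fall back to 0 if the award count is unavailable.
            "awards": getattr(post, "total_awards_received", 0),
            "url": post.url,
            "title": post.title,
        })
    return records


def source_domains(records):
    """Count how often each domain appears among the posts' source URLs."""
    return Counter(urlparse(r["url"]).netloc for r in records)


def summarise(records):
    """Mean score, upvote ratio and comment count for one subreddit."""
    return {
        "mean_score": mean(r["score"] for r in records),
        "mean_upvote_ratio": mean(r["upvote_ratio"] for r in records),
        "mean_num_comments": mean(r["num_comments"] for r in records),
    }
```

Calling `summarise` once per subreddit gives two small dictionaries that can be compared side by side. The same list of records can also be passed to `pandas.DataFrame` for plotting with matplotlib or seaborn.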