This repo contains scripts that use NLP techniques to obtain insights from a set of comedian transcripts.
step 1 : Scrape the transcript data, apply text cleaning techniques, and create the document-term matrix.
step 2 : Perform exploratory data analysis on the dataset, such as building word clouds and measuring word frequency and profanity, to verify that the data makes sense.
step 3 : Perform sentiment analysis on the transcripts using TextBlob and track how each comedian's sentiment varies over the routine.
step 4 : Perform topic modelling using Latent Dirichlet Allocation and try to identify the topics each comedian uses most in their comedy.
step 5 : As a fun task, try text generation, using both a Markov chain technique and an RNN to generate similar transcripts.
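
The Markov chain idea from step 5 can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from this repo: the function names `build_chain` and `generate`, the `seed` parameter, and the sample sentence are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        # Repeated followers are kept, so frequent pairs are sampled more often.
        chain[current_word].append(next_word)
    return dict(chain)


def generate(chain, length=15, seed=None):
    """Walk the chain from a random start word for up to `length` words."""
    rng = random.Random(seed)
    word = rng.choice(sorted(chain))
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word never has a successor
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Toy stand-in for a cleaned transcript (hypothetical, not data from the repo).
sample = "i went to the store and i went to the bank and the bank was closed"
chain = build_chain(sample)
print(generate(chain, length=10, seed=0))
```

Each generated word depends only on the word before it, which is why the output tends to be locally plausible but loses coherence over longer stretches. In practice the chain would be built from a full cleaned transcript rather than a single sentence.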
d-vignesh/ComedianTranscriptAnalysis