GitHub - d-vignesh/ComedianTranscriptAnalysis

This repo contains scripts that use NLP techniques to obtain insights from a set of comedian transcripts.

Step 1: Scrape the transcript data, apply text cleaning techniques, and create the document-term matrix.
Step 2: Perform exploratory data analysis on the dataset, such as constructing word clouds and computing word frequency and profanity counts, to verify that the data makes sense.
Step 3: Perform sentiment analysis on the transcripts using TextBlob to see how each comedian's sentiment varies over the routine.
Step 4: Perform topic modelling using Latent Dirichlet Allocation and try to come up with the topics each comedian mostly uses in their comedy.
Step 5: As a fun task, try text generation, using both a Markov chain technique and an RNN to generate similar transcripts.
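
As a rough illustration of two of the steps above (the document-term matrix from step 1 and the Markov chain generator from step 5), here is a minimal standard-library sketch. It is not taken from the notebooks: the function names, the tokenizing regex, and the word-level, first-order chain are all assumptions made for this example, and the notebooks may well rely on library helpers instead.

```python
import random
import re
from collections import Counter, defaultdict


def clean_text(text):
    """Lowercase the text and return its word tokens (hypothetical cleaning rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def document_term_matrix(docs):
    """Build a document-term matrix from a {name: transcript} mapping.

    Returns (vocab, rows): vocab is the sorted list of terms, and rows maps
    each document name to its list of term counts, aligned with vocab.
    """
    counts = {name: Counter(clean_text(text)) for name, text in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    rows = {name: [c[term] for term in vocab] for name, c in counts.items()}
    return vocab, rows


def build_markov_chain(text):
    """Map each word to the list of words that directly follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return dict(chain)


def generate_text(chain, length=10, seed=None):
    """Walk the chain for `length` words, restarting at a random word on dead ends."""
    rng = random.Random(seed)
    word = rng.choice(list(chain))
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # A word that only ever appears last has no followers, so pick a new start.
        word = rng.choice(followers) if followers else rng.choice(list(chain))
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    # Toy transcripts, invented for this example only.
    docs = {"comedian_a": "The joke, the JOKE!", "comedian_b": "A joke"}
    vocab, rows = document_term_matrix(docs)
    print(vocab, rows)
    print(generate_text(build_markov_chain("a b a c"), length=5, seed=0))
```

Each repeated follower appears multiple times in its list, so `rng.choice` samples the next word in proportion to how often it followed the current word in the source text.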

Folders and files

DataProcessing.ipynb
ExploratoryDataAnalysis.ipynb
README.md
SentimentAnalysis.ipynb
TextGeneration.ipynb
TopicModeling.ipynb
