Twitter-LDA

Topic analysis of Twitter data related to the University of Chicago

The project has two parts: a PySpark portion that loads and parses JSON files, applies basic filtering, and saves the result, and a Python notebook for text analysis.
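As a rough illustration of the load, parse, filter, and save step, here is a minimal sketch. It is not the project's actual script: the keyword list, the `text` field name, the function names, and the line-delimited JSON layout are all assumptions made for the example.

```python
import json

# Hypothetical keywords; the README does not list the actual filter terms.
KEYWORDS = ("uchicago", "university of chicago")


def parse_tweet(line):
    """Parse one JSON line; return None for blank or malformed lines."""
    try:
        record = json.loads(line)
    except ValueError:
        return None
    return record if isinstance(record, dict) else None


def is_relevant(tweet):
    """Keep tweets whose text mentions any keyword (case-insensitive).

    Assumes the tweet body is stored under a "text" key.
    """
    text = (tweet.get("text") or "").lower()
    return any(keyword in text for keyword in KEYWORDS)


def filter_tweets(input_path, output_path):
    """Load line-delimited JSON, filter it, and save the result with Spark."""
    # Imported here so the helpers above work without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("twitter-lda-filter").getOrCreate()
    (
        spark.sparkContext.textFile(input_path)
        .map(parse_tweet)
        .filter(lambda tweet: tweet is not None and is_relevant(tweet))
        .map(json.dumps)
        .saveAsTextFile(output_path)
    )
    spark.stop()
```

Keeping the parsing and filtering logic in plain Python functions makes them easy to check on a handful of sample lines before running the full Spark job.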

The Python notebook demonstrates some simple techniques for transforming a tweet string into a cleaned-up list of words for use with common NLP techniques. Here I demonstrate LDA (Latent Dirichlet Allocation) from the Gensim package and visualize the results with the pyLDAvis package.
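To give a sense of the tweet-cleaning step, here is a minimal sketch using only the standard library. The specific rules (dropping URLs and @mentions, keeping alphabetic tokens longer than two characters) and the tiny stop-word list are assumptions for illustration and may differ from what the notebook does.

```python
import re

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "rt", "the", "to"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z]+")


def clean_tweet(text):
    """Turn a raw tweet string into a list of lowercase word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # drop links
    text = MENTION_RE.sub(" ", text)  # drop @mentions
    tokens = TOKEN_RE.findall(text)   # keep alphabetic runs only
    # Discard very short tokens and stop words.
    return [t for t in tokens if len(t) > 2 and t not in STOP_WORDS]


tweet = "RT @UChicago: New research on #economics at https://t.co/abc123 is out!"
print(clean_tweet(tweet))  # ['new', 'research', 'economics', 'out']
```

Token lists like this can then be passed to Gensim's `corpora.Dictionary` and its `doc2bow` method to build the bag-of-words corpus that `LdaModel` expects.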
