Topic Analysis for Twitter Data Related to the University of Chicago
The project has two parts: a PySpark portion that loads and parses JSON files, applies basic filtering, and saves the results; and a Python notebook for text analysis.
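As a rough illustration of the PySpark portion, here is a minimal sketch of loading, filtering, and saving tweets. The field names (`text`, `lang`, `id_str`, `created_at`), the keyword pattern, the Parquet output format, and the function names are all assumptions made for this example and may differ from what the project actually uses.

```python
import re

# Assumed keyword pattern; the project's real filter criteria may differ.
KEYWORD_PATTERN = r"uchicago|university of chicago"


def matches_keyword(text):
    """Check the pattern locally, without Spark (handy for quick sanity checks)."""
    return re.search(KEYWORD_PATTERN, text.lower()) is not None


def load_and_filter(spark, input_path, output_path):
    """Read line-delimited tweet JSON, keep matching English tweets, save as Parquet."""
    # Imported lazily so this module can be loaded on a machine without Spark.
    from pyspark.sql import functions as F

    tweets = spark.read.json(input_path)
    filtered = (
        tweets
        .filter(F.col("text").isNotNull())                      # drop records without text
        .filter(F.col("lang") == "en")                          # assumed: English only
        .filter(F.lower(F.col("text")).rlike(KEYWORD_PATTERN))  # keyword match
        .select("id_str", "created_at", "text")                 # keep a few columns
    )
    filtered.write.mode("overwrite").parquet(output_path)
    return filtered
```

With a session created via `SparkSession.builder.getOrCreate()`, this could be called as `load_and_filter(spark, "tweets/*.json", "filtered_tweets")`, where both paths are placeholders.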
The Python notebook demonstrates some simple techniques for transforming a tweet string into a cleaned-up list of words for use with common NLP techniques. Here I demonstrate LDA (Latent Dirichlet Allocation) from the Gensim package and visualize the results with the pyLDAvis package.
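The cleaning and modeling steps might look roughly like the sketch below. The stopword list, the regular expressions, the minimum token length, and the LDA parameters (`num_topics`, `passes`) are illustrative assumptions, not the notebook's actual settings.

```python
import re

# Small illustrative stopword list (assumption); a fuller list would be used in practice.
STOPWORDS = {"the", "and", "for", "are", "was", "with", "that", "this",
             "you", "from", "have", "not", "but", "out"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z]+")


def clean_tweet(text):
    """Turn a raw tweet string into a list of lowercase word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # strip links
    text = MENTION_RE.sub(" ", text)  # strip @mentions
    text = text.replace("#", " ")     # keep the hashtag word, drop the symbol
    tokens = TOKEN_RE.findall(text)
    # Drop very short tokens (such as "rt") and stopwords.
    return [t for t in tokens if len(t) > 2 and t not in STOPWORDS]


def fit_lda(token_lists, num_topics=5):
    """Fit a Gensim LDA model on cleaned tweets; returns (model, corpus, dictionary)."""
    # Imported lazily so clean_tweet can be used without Gensim installed.
    from gensim import corpora, models

    dictionary = corpora.Dictionary(token_lists)
    corpus = [dictionary.doc2bow(tokens) for tokens in token_lists]
    lda = models.LdaModel(corpus=corpus, id2word=dictionary,
                          num_topics=num_topics, passes=10, random_state=0)
    return lda, corpus, dictionary
```

In a notebook, the three values returned by `fit_lda` can be passed to `pyLDAvis.gensim_models.prepare(lda, corpus, dictionary)` and the result shown with `pyLDAvis.display(...)`. Older pyLDAvis releases name that module `pyLDAvis.gensim`.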