This folder contains documents and Python notebooks for handling text in Python.
The contents of this folder are:
- NER Tagger: shows NER (named entity recognition) tagging of a sequence using the Stanford and NLTK packages.
- POS Tagger: an example of POS (part-of-speech) tagging of a text sequence using the NLTK and Stanford packages.
- Word Embedding: gives an insight into different word embeddings, such as GloVe and fastText, that are commonly used to extract features from text. Sentence-level and document-level embeddings are also covered briefly.
- Commonly used Regular expressions: a file containing some commonly used regular expressions, each with a description.
- Graph Network Analysis: covers some of the important properties of a graph network.
- PDF to Doc: a Python notebook for reading PDF documents in Python, using the PDFMiner package.
- Topic Modelling: topic modelling using LDA from the sklearn and gensim packages.

Packages used: pdfminer, sklearn, gensim, nltk
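
As a quick illustration of the kind of pattern the regular expressions file describes, here is a minimal sketch using Python's built-in `re` module. The sample text and the three patterns are assumptions made up for this example and are not taken from the file itself. They are deliberately simplified, so real-world emails, phone numbers, and dates may need stricter patterns.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "Contact jane.doe@example.com or call 555-123-4567 before 2021-03-15."

# Simplified patterns (assumptions; real formats vary more than this).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")    # local part, "@", domain
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")       # NNN-NNN-NNNN
DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")  # YYYY-MM-DD, with groups

emails = EMAIL.findall(text)
phones = PHONE.findall(text)
# With capture groups, findall returns one tuple of groups per match.
dates = DATE.findall(text)

print(emails)
print(phones)
print(dates)
```

Compiling each pattern once with `re.compile` keeps the expression next to a descriptive name, which is useful when the same pattern is applied to many documents.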