Notebooks/Docs for handling text in Python.

This folder contains documents and Python notebooks for handling text in Python.

The contents of this folder are:

  1. NER Tagger - NER tagging of a sequence using the Stanford tagger with the NLTK package.

  2. POS Tagger - an example of POS tagging of a text sequence using the NLTK and Stanford packages.

  3. Word Embedding - an overview of word embeddings such as GloVe and fastText, which are commonly used to extract features from text. Sentence-level and document-level embeddings are also covered briefly.

  4. Commonly used Regular Expressions - a file containing some commonly used regular expressions, each with a description.

  5. Graph Network Analysis - some of the important properties of a graph network.

  6. PDF to Doc - a Python notebook for reading PDF documents in Python, using the PDFMiner package.

  7. Topic Modelling - topic modelling using LDA from the sklearn and gensim packages.
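
As a rough illustration of the kind of material in the regular expressions file (item 4), the sketch below applies two simple patterns with Python's built-in `re` module. The patterns, the `PATTERNS` dictionary and the `find_all` helper are hypothetical examples written for this README; they are not taken from the file itself and are deliberately simplified (the email pattern, for instance, does not cover every valid address).

```python
import re

# Illustrative patterns only; assumptions for this example,
# not copied from the regular-expressions file in this folder.
PATTERNS = {
    # Simplified email matcher: local part, "@", domain, dot, 2+ letter TLD.
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Dates written as YYYY-MM-DD (format check only, no calendar validation).
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def find_all(text):
    """Return every match of each named pattern found in text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = "Contact jane.doe@example.com before 2021-03-15 about the report."
matches = find_all(sample)
print(matches)
```

Compiling the patterns once and keeping them in a dictionary makes it easy to add further named expressions alongside their descriptions.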

Packages used

pdfminer, sklearn, gensim, nltk