This mini-course aims to show the practical side of natural language processing (NLP) rather than its detailed theoretical aspects. Although some fundamental theory is necessary, the notebooks contain many code fragments ready to be copied and pasted into your own project. The course will give you a solid foundation and prepare you to tackle more advanced topics.
- Notebook 1 - Data loading and Regular expressions
- Notebook 2 - Text preprocessing, POS tags, and simple word model
- Notebook 3 - Machine Learning Basics & Classifiers
- Notebook 4 - Naive Bayes and Logistic Regression in NLP
- Notebook 5 - Introduction to word embeddings
- Notebook 6 - Putting it all together!
The first five notebooks introduce key topics and their implementation in Python using well-known modules. Notebook 6 shows how to solve a complete NLP classification problem using different techniques and visualizations.
Each notebook comes with a resources.md file listing the sources of the graphics, external code, and datasets used. The theoretical material is based on the scikit-learn (sklearn) documentation and the excellent book Speech and Language Processing by Dan Jurafsky and James H. Martin, available here!
If you spot any mistakes, feel free to email me: [email protected]!
Developed as a result of the UCL Engineering Summer Studentship 2021 by Andrzej Szablewski, a first-year student, under the supervision of Lisa Andreevna Chalaguine, an academic researcher.