Collect any 10 English text documents from the web and create an inverted index over them in Python, applying the necessary preprocessing steps first.
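The core indexing step can be sketched with the standard library alone. This is a minimal sketch, not the notebook's code. It assumes the ten documents are already plain text keyed by an integer document ID, tokenizes with a simple letters-only regex, and drops a tiny hypothetical stopword list (`STOPWORDS`). Stemming is deliberately left out so the index structure stays visible.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Lowercase, then keep runs of letters only (a simplifying assumption).
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                postings[token].add(doc_id)
    # Sorted postings lists let AND/OR queries be answered by merging lists.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {
    1: "The quick brown fox.",
    2: "A quick search of the index.",
    3: "Brown bears and brown foxes.",
}
index = build_inverted_index(docs)
print(index["quick"])  # [1, 2]
print(index["brown"])  # [1, 3]
```

Because stemming is omitted here, "fox" and "foxes" become separate terms (postings [1] and [3]); stemming the tokens before indexing would merge them into one entry.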
We use Python's BeautifulSoup library to parse the HTML of the downloaded web pages and extract their text.
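A minimal sketch of the parsing step, assuming the page HTML is already available as a string. The helper name `html_to_text` is hypothetical, and discarding `<script>` and `<style>` contents is a choice made here, not something the original specifies.

```python
from bs4 import BeautifulSoup


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML page as one space-joined string."""
    soup = BeautifulSoup(html, "html.parser")
    # Script and style contents are not document text, so remove them first.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # strip=True trims each text fragment; separator=" " keeps words apart.
    return soup.get_text(separator=" ", strip=True)


page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Some <b>bold</b> text.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(page))  # Title Some bold text.
```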
We use Python's Requests library to download the web pages.
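A minimal sketch of the download step. The function name `fetch_html`, the 10-second timeout, and the custom `User-Agent` header are assumptions for illustration; the original does not say how requests are configured.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    # Assumed header: some sites reject the default python-requests User-Agent.
    headers = {"User-Agent": "Mozilla/5.0 (inverted-index exercise)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # turns 4xx/5xx responses into requests.HTTPError
    return response.text
```

To collect the ten documents, call `fetch_html` once per URL and store each result under its own document ID; setting a timeout keeps one unresponsive site from stalling the whole run.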
We use Python's built-in re module for case-insensitive pattern matching, so that words are matched irrespective of their case or capitalisation in the page content.
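One way to apply case-insensitive matching during tokenization is sketched below. The letters-only pattern and the `tokenize` helper are assumptions; the original does not give the exact regular expression used. `re.IGNORECASE` lets the single pattern `[a-z]+` match capitalised words too, and the matches are then lowercased so "Index" and "INDEX" map to the same term.

```python
import re

# Letters only; re.IGNORECASE makes [a-z] match upper-case letters as well.
WORD_RE = re.compile(r"[a-z]+", re.IGNORECASE)


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphabetic tokens (digits and punctuation dropped)."""
    return [match.lower() for match in WORD_RE.findall(text)]


print(tokenize("Inverted INDEX, built in 2024!"))
# ['inverted', 'index', 'built', 'in']
```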
We use Python's NLTK library to perform stemming and other NLP preprocessing before building the index.
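A minimal stemming sketch with NLTK. Which stemmer the original uses is not stated; the Porter stemmer here is an assumption, chosen because it needs no extra data download (unlike, for example, NLTK's stopword corpus, which requires `nltk.download("stopwords")`). The helper `stem_tokens` is hypothetical.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()


def stem_tokens(tokens: list[str]) -> list[str]:
    """Reduce each token to its Porter stem so inflected forms share one index term."""
    return [stemmer.stem(token) for token in tokens]


print(stem_tokens(["running", "connection", "cats"]))
# ['run', 'connect', 'cat']
```

Stemming the tokens before they are added to the index means a query for "connect" also finds documents that only contain "connection".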
The code for the program can be found in this Python notebook.