Inverted Indexing

Problem

Collect any 10 English text documents from the web and create an inverted index from them, performing the necessary preprocessing steps in Python.
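For reference, an inverted index maps each term to the list of documents (its postings list) that contain it. Below is a minimal sketch, assuming the documents have already been preprocessed into lists of tokens. The function names `build_inverted_index` and `search_all` and the document IDs are hypothetical and are not taken from the notebook.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each term to a sorted list of the IDs of documents containing it."""
    postings: dict[str, set[str]] = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            postings[token].add(doc_id)  # a set ignores repeated tokens in one doc
    # Sorted postings lists keep the index deterministic and easy to intersect.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search_all(index: dict[str, list[str]], terms: list[str]) -> list[str]:
    """Return the IDs of documents containing every query term (boolean AND)."""
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical, already-preprocessed documents.
    docs = {
        "doc1": ["invert", "index", "python"],
        "doc2": ["python", "web", "scrape"],
        "doc3": ["index", "web"],
    }
    index = build_inverted_index(docs)
    print(index["python"])                      # ['doc1', 'doc2']
    print(search_all(index, ["index", "web"]))  # ['doc3']
```

Storing postings as sorted lists rather than sets is a design choice: it makes the printed index stable across runs and allows merge-style intersection if the collection grows.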

Steps

We use Python's BeautifulSoup library to parse the HTML of the web pages.

We use Python's Requests library to fetch the web pages.
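The fetching and parsing steps above might look roughly like this. It is a sketch only, assuming the `requests` and `beautifulsoup4` packages are installed. The helper names `fetch_html` and `extract_text`, the 10-second timeout, and the choice to drop `<script>` and `<style>` blocks are illustrative assumptions, not details from the notebook.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with Requests and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of indexing an error page
    return response.text


def extract_text(html: str) -> str:
    """Parse HTML with BeautifulSoup and return its visible text."""
    soup = BeautifulSoup(html, "html.parser")
    # Script and style contents are code, not prose, so remove them first.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)


if __name__ == "__main__":
    # Offline demo; for a real page you would call extract_text(fetch_html(url)).
    print(extract_text("<p>Hello <b>world</b></p><script>var x = 1;</script>"))
```

To collect the 10 documents, you would run `extract_text(fetch_html(url))` over your own list of URLs. The README does not name the sources, so none are assumed here.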

We use Python's re module for pattern matching that ignores the case (capitalisation) of the page content.
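Here is one plausible way the re module could be used here. This sketch assumes the notebook tokenises text with a regular expression and matches terms case-insensitively with `re.IGNORECASE`. The pattern `WORD_RE` and the helpers `tokenize` and `count_mentions` are hypothetical.

```python
import re

# Runs of ASCII letters, optionally with one internal apostrophe (e.g. "don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?", re.IGNORECASE)


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens, skipping digits and punctuation."""
    return [match.lower() for match in WORD_RE.findall(text)]


def count_mentions(text: str, term: str) -> int:
    """Count whole-word occurrences of term, ignoring capitalisation."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return len(pattern.findall(text))


if __name__ == "__main__":
    sample = "Python's re module: PYTHON, python and Python 3."
    print(tokenize(sample))
    print(count_mentions(sample, "python"))
```

Lowercasing every token means the index stores one entry per word, regardless of how the word was capitalised on the page.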

We use Python's NLTK library to perform stemming and other NLP preprocessing tasks before building the index.
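Below is a sketch of this preprocessing step using NLTK's `PorterStemmer`, which requires no extra data download. The README does not say which other NLP tasks are applied. Stop-word removal is shown here as a common example, which is an assumption. The short `STOP_WORDS` set is a hypothetical stand-in for NLTK's English list, which is available via `nltk.corpus.stopwords` after `nltk.download("stopwords")`.

```python
from nltk.stem import PorterStemmer

# Hypothetical, deliberately short stop-word list so the sketch runs offline.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

stemmer = PorterStemmer()


def normalize(tokens: list[str]) -> list[str]:
    """Lowercase tokens, drop stop words, and reduce the rest to Porter stems."""
    return [stemmer.stem(t.lower()) for t in tokens if t.lower() not in STOP_WORDS]


if __name__ == "__main__":
    print(normalize(["The", "cats", "are", "running"]))  # ['cat', 'run']
```

Stemming maps variants such as "running" and "run" to a single index term. As a result, a query for one form also retrieves documents that contain the other.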

The code for the program can be found in this Python notebook.