Web Mining CSE3014

List of Programs

We use the Beautiful Soup library to develop crawlers with basic functions.
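As a rough illustration of the link-extraction step at the core of a basic crawler, here is a minimal sketch. It swaps in Python's standard-library `html.parser` for the parsing library so that it runs without extra installs; the names `LinkCollector` and `extract_links` are hypothetical and not taken from the repository.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return every link found in an HTML string, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A crawler would typically fetch a page, call something like `extract_links` on it, and push unseen URLs onto a queue.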

We perform TF-IDF indexing of documents crawled from the web.
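The weighting can be sketched in a few lines of plain Python. This is only one common variant, assuming tf is the term count divided by document length and idf is `log(N / df)`; the repository's program may use a different formula, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that appears in every document gets a weight of zero, since its idf is `log(1)`.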

We train a Naive Bayes classifier on a set of document frequencies.

We write a program to extract contact details from a website and save them to files.
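One possible shape for the extraction step is shown below, assuming that "contact details" means email addresses and phone numbers and that the page HTML has already been downloaded into a string. The regular expressions are deliberately loose illustrations, and `extract_contacts` and `save_contacts` are hypothetical names.

```python
import re

# Loose patterns for illustration only; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(html):
    """Return sorted, de-duplicated emails and phone-like strings from HTML."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(html))),
        "phones": sorted(set(PHONE_RE.findall(html))),
    }


def save_contacts(contacts, prefix):
    """Write each contact type to its own file, e.g. <prefix>_emails.txt."""
    for kind, values in contacts.items():
        with open(f"{prefix}_{kind}.txt", "w", encoding="utf-8") as handle:
            handle.write("\n".join(values))
```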

We construct an inverted index for a set of three documents and also perform index compression.
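A minimal sketch of both steps follows. It assumes the compression scheme is gap encoding of postings followed by variable-byte encoding, which is a common textbook choice but may not be the one used in the repository; all function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of 1-based document IDs containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs, start=1):
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return dict(index)


def to_gaps(postings):
    """Keep the first doc ID, then store each later ID as a gap from its predecessor."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def vbyte_encode(numbers):
    """Variable-byte encode: 7 payload bits per byte, high bit set on the last byte."""
    out = bytearray()
    for n in numbers:
        chunk = [n % 128]
        n //= 128
        while n > 0:
            chunk.insert(0, n % 128)
            n //= 128
        chunk[-1] += 128  # mark the final byte of this number
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Invert vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = n * 128 + byte
        else:
            numbers.append(n * 128 + (byte - 128))
            n = 0
    return numbers
```

Gaps between sorted document IDs tend to be small, which is why they compress better than the raw IDs under a variable-length code.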

We build a PageRank algorithm using the NetworkX library, and then validate the results by implementing the same ranking with the random walk method.
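The validation idea can be sketched without any graph library: compute PageRank by power iteration, then estimate it again by simulating a random surfer and counting visits. This is a standard-library illustration, not the repository's NetworkX code; the damping factor of 0.85, the handling of dangling pages, and the function names are assumptions.

```python
import random


def pagerank_power(graph, damping=0.85, iterations=100):
    """PageRank by power iteration. graph maps each node to its out-links;
    every link target must also be a key of graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph[node] or nodes  # dangling page: spread rank evenly
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def pagerank_random_walk(graph, damping=0.85, steps=200_000, seed=0):
    """Estimate PageRank as the fraction of time a random surfer spends on each page."""
    rng = random.Random(seed)
    nodes = list(graph)
    visits = {node: 0 for node in nodes}
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        if graph[current] and rng.random() < damping:
            current = rng.choice(graph[current])  # follow a random out-link
        else:
            current = rng.choice(nodes)  # teleport to a random page
    return {node: count / steps for node, count in visits.items()}
```

On a small graph the two estimates should agree to within sampling noise, which shrinks as `steps` grows.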

We implement the HITS algorithm for ranking and indexing pages across the web.
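For reference, the hub and authority updates can be sketched as below. The fixed iteration count, the Euclidean normalization, and the name `hits` are assumptions made for this illustration rather than details from the repository.

```python
import math


def hits(graph, iterations=50):
    """Return (hubs, authorities). graph maps each node to its out-links;
    every link target must also be a key of graph."""
    nodes = list(graph)
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auths = {n: 0.0 for n in nodes}
        for src in nodes:
            for dst in graph[src]:
                auths[dst] += hubs[src]
        # Hub score: sum of authority scores of the pages linked to.
        hubs = {n: sum(auths[dst] for dst in graph[n]) for n in nodes}
        # Normalize both vectors to unit length so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auths.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hubs.values())) or 1.0
        auths = {n: v / a_norm for n, v in auths.items()}
        hubs = {n: v / h_norm for n, v in hubs.items()}
    return hubs, auths
```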

We build ID3 and CART decision trees, after exploring the Cleveland Heart Disease Dataset from the UCI Repository.
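The two tree types differ mainly in their split criterion: ID3 is usually described in terms of information gain (entropy), and CART in terms of Gini impurity. The sketch below shows both measures on categorical data; the function names and the row format (a list of dicts) are assumptions for illustration and do not reflect the dataset's actual columns.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on a categorical feature.

    rows is a list of dicts mapping feature names to values.
    """
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A tree builder would evaluate the chosen criterion for every candidate feature at a node and split on the best one.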

We build a Multinomial Naive Bayes classifier for classifying tweets by sentiment, after exploring the US Airlines Twitter Sentiment Dataset from Kaggle.
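To make the model concrete, here is a small from-scratch sketch of multinomial Naive Bayes with Laplace smoothing. It assumes whitespace tokenization and a smoothing constant `alpha=1.0`; the function names and the toy sentences in the usage note are invented for illustration and are not drawn from the dataset.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(texts, labels, alpha=1.0):
    """Estimate log priors and smoothed per-class word log likelihoods."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    for label, n_docs in class_docs.items():
        model["log_prior"][label] = math.log(n_docs / len(labels))
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, text):
    """Return the class with the highest posterior log score; unseen words are ignored."""
    tokens = [w for w in text.lower().split() if w in model["vocab"]]
    scores = {
        label: prior + sum(model["log_likelihood"][label][w] for w in tokens)
        for label, prior in model["log_prior"].items()
    }
    return max(scores, key=scores.get)
```

For example, after training on a handful of labeled sentences, `predict(model, "great crew")` returns whichever label's training text used those words more often.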

We build a K-Means clustering model for clustering movie reviews based on their text content, after exploring the IMDB 50k Movie Review Dataset from Kaggle.

We build an Agglomerative Clustering model for clustering credit card transactions for fraud detection, after exploring the Credit Card Fraud Dataset from Kaggle.

We build an Apriori association rule mining model for finding the most frequently clicked sites based on clickstream data, after exploring the Hungarian Newssite Clickstream Dataset.
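The frequent-itemset stage of Apriori can be sketched as follows, treating each browsing session as a transaction of visited page categories. The function name `apriori`, the relative `min_support` threshold, and the session format are assumptions for this illustration; rule generation from the frequent itemsets is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

The most frequently clicked single page is then the 1-itemset with the highest support in the returned dictionary.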