1. Web Crawlers
Here, we use the Beautiful Soup library to develop web crawlers with basic functionality.
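A minimal sketch of a breadth-first crawler. To stay self-contained it swaps Beautiful Soup for the standard-library `html.parser`; the names `crawl`, `fetch_url`, `fetcher` and `max_pages` are illustrative, not taken from the original code. Links are resolved with `urljoin` and `#fragments` are dropped so the same page is not queued twice.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def fetch_url(url):
    """Download a page and decode it as UTF-8 (hypothetical helper)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seed, max_pages=10, fetcher=fetch_url):
    """Breadth-first crawl starting at seed; returns {url: html} for pages fetched."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetcher(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link, _ = urldefrag(urljoin(url, href))  # absolute URL, #fragment removed
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The `fetcher` parameter lets the crawl run against an in-memory site for testing. With Beautiful Soup, `BeautifulSoup(html, "html.parser").find_all("a", href=True)` would take the place of `LinkExtractor`.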
We perform TF-IDF indexing of documents crawled from the web.
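A small sketch of a TF-IDF index, assuming whitespace tokenization and the weighting $w_{t,d} = \mathrm{tf}_{t,d} \cdot \log(N / \mathrm{df}_t)$ (raw term count times log inverse document frequency); the original assignment may use a different tf or idf variant. `tfidf_index` and `search` are illustrative names.

```python
import math
from collections import Counter


def tfidf_index(docs):
    """Return {term: {doc_id: weight}} with weight = raw tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    index = {}
    for doc_id, tokens in enumerate(tokenized):
        for term, tf in Counter(tokens).items():
            index.setdefault(term, {})[doc_id] = tf * math.log(n_docs / df[term])
    return index


def search(index, query):
    """Rank documents by the summed TF-IDF weight of the query terms."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return scores.most_common()
```

A term that appears in every document (such as "the") gets weight 0, so it contributes nothing to the ranking.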
3. Naive Bayes
We build a Naive Bayes classifier over a set of document frequencies.
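One way to build a Naive Bayes classifier directly from document frequencies is the Bernoulli variant: each feature records whether a term appears in a document, and $P(t \mid c)$ is the Laplace-smoothed fraction of class-$c$ documents containing $t$. This is an assumed reading of the assignment, and the function names are hypothetical.

```python
import math


def train_bernoulli_nb(docs, labels):
    """docs: list of token sets. Returns (vocab, priors, P(term present | class))."""
    vocab = set().union(*docs)
    priors, cond = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        n_c = len(class_docs)
        priors[c] = n_c / len(docs)
        # document frequency of each term within class c, add-one smoothed
        cond[c] = {t: (sum(t in d for d in class_docs) + 1) / (n_c + 2) for t in vocab}
    return vocab, priors, cond


def predict_bernoulli_nb(model, doc):
    """Pick the class with the highest log posterior; absent terms also count."""
    vocab, priors, cond = model
    best, best_score = None, -math.inf
    for c in priors:
        score = math.log(priors[c])
        for t in vocab:
            p = cond[c][t]
            score += math.log(p if t in doc else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best
```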
We write a program that extracts contact details from a website and saves them to files.
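A sketch of the extraction step using regular expressions over a page's text. The patterns are heuristics we assume for illustration: the phone pattern targets 10-digit numbers with an optional `+` country code and will miss other formats. `extract_contacts` and `save_contacts` are hypothetical names.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Heuristic: optional +country code, then 3-3-4 digits separated by space, dot or dash
PHONE_RE = re.compile(r"(?:\+\d{1,3}[\s-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def extract_contacts(text):
    """Return de-duplicated, sorted lists of emails and phone numbers found in text."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    phones = sorted(set(m.strip() for m in PHONE_RE.findall(text)))
    return emails, phones


def save_contacts(emails, phones, path):
    """Write the extracted contacts to a plain-text file, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("EMAILS\n" + "\n".join(emails) + "\n")
        f.write("PHONES\n" + "\n".join(phones) + "\n")
```

In practice the text would come from a fetched page (for example the HTML returned by a crawler), ideally with tags stripped first.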
We construct an inverted index for a set of three documents and apply index compression to it.
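A sketch of an inverted index with one common compression scheme: postings lists are stored as gaps between consecutive document IDs, and each gap is written with variable-byte encoding (7 data bits per byte, high bit set on the last byte of each number). The choice of gap plus variable-byte coding is our assumption; the assignment may use a different scheme.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of 1-based document IDs containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs, start=1):
        for term in set(text.lower().split()):
            index[term].append(doc_id)  # IDs arrive in increasing order
    return dict(index)


def vb_encode_number(n):
    """Variable-byte code: 7 data bits per byte, high bit marks the final byte."""
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128
    return bytes(out)


def compress_postings(postings):
    """Gap-encode a sorted postings list, then variable-byte encode each gap."""
    if not postings:
        return b""
    gaps = [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]
    return b"".join(vb_encode_number(g) for g in gaps)


def decompress_postings(data):
    """Invert compress_postings: decode gaps and take running sums."""
    postings, n, last = [], 0, 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            n = 128 * n + (byte - 128)
            last += n
            postings.append(last)
            n = 0
    return postings
```

Small gaps fit in a single byte, so frequent terms with dense postings lists compress best.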
6. Page Ranking
We implement PageRank using the NetworkX library, then validate the results by computing the same ranking with the random walk method.
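A dependency-free sketch of the validation idea: PageRank by power iteration next to a Monte Carlo random surfer that follows an out-link with probability `damping` and otherwise teleports to a uniformly random page. Dangling pages teleport uniformly in both versions, so their results should agree up to sampling noise. With NetworkX the exact scores would come from `nx.pagerank(G, alpha=0.85)`; the function names here are illustrative.

```python
import random


def pagerank_power(graph, damping=0.85, iters=100):
    """graph: {node: [out-neighbours]}. Dangling nodes spread rank uniformly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph[v] or nodes  # dangling node: distribute over all nodes
            share = damping * rank[v] / len(targets)
            for u in targets:
                new[u] += share
        rank = new
    return rank


def pagerank_random_walk(graph, damping=0.85, steps=200_000, seed=0):
    """Estimate PageRank as the fraction of steps a random surfer spends on each node."""
    rng = random.Random(seed)
    nodes = list(graph)
    visits = dict.fromkeys(nodes, 0)
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        if graph[current] and rng.random() < damping:
            current = rng.choice(graph[current])  # follow a random out-link
        else:
            current = rng.choice(nodes)  # teleport (always, from a dangling node)
    return {v: count / steps for v, count in visits.items()}
```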
We implement the HITS algorithm for ranking and indexing pages across the web.
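A minimal sketch of HITS by iterated updates: a page's authority score is the sum of the hub scores of pages linking to it, its hub score is the sum of the authority scores of the pages it links to, and both vectors are L2-normalized after each round. The fixed iteration count is an assumption; a convergence tolerance would also work.

```python
import math


def hits(graph, iters=50):
    """graph: {node: [out-neighbours]}. Returns (hub, authority) score dicts."""
    nodes = set(graph) | {u for outs in graph.values() for u in outs}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iters):
        # authority update: sum of hub scores of in-linking pages
        auth = dict.fromkeys(nodes, 0.0)
        for v, outs in graph.items():
            for u in outs:
                auth[u] += hub[v]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        # hub update: sum of authority scores of linked-to pages
        hub = {v: sum(auth[u] for u in graph.get(v, [])) for v in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {v: h / norm for v, h in hub.items()}
    return hub, auth
```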
We build ID3 and CART decision trees after exploring the Cleveland Heart Disease Dataset from the UCI Repository.
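The two trees differ mainly in their split criterion, sketched below: ID3 picks the categorical feature with the highest information gain (entropy reduction from a multiway split), while CART picks a binary threshold that minimizes weighted Gini impurity. Only the criteria are shown, not full tree construction; the feature name `cp` and the numeric values in the usage are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """ID3 criterion: entropy reduction from splitting on every value of a feature."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_binary_split(values, labels):
    """CART criterion: threshold (x <= t) minimising weighted Gini impurity."""
    n = len(labels)
    best = (None, float("inf"))
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best
```

A full tree applies the chosen criterion recursively to each child until the labels are pure or no split improves the score.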
We build a Multinomial Naive Bayes classifier that classifies tweets by sentiment, after exploring the US Airlines Twitter Sentiment Dataset from Kaggle.
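A from-scratch sketch of multinomial Naive Bayes with add-one (Laplace) smoothing: $P(w \mid c) = (\mathrm{count}(w, c) + 1) / (\sum_{w'} \mathrm{count}(w', c) + |V|)$. Words unseen in training are ignored at prediction time. This class is illustrative and is not scikit-learn's `sklearn.naive_bayes.MultinomialNB`; the toy tweets in the usage are made up.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, y in zip(texts, labels):
            self.word_counts[y].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        def log_posterior(c):
            s = self.log_prior[c]
            for w in text.lower().split():
                if w in self.vocab:  # skip words never seen in training
                    s += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + len(self.vocab)))
            return s

        return max(self.classes, key=log_posterior)
```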
We build a K-Means clustering model that groups movie reviews by their text content, after exploring the IMDB 50k Movie Review Dataset from Kaggle.
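A sketch of K-Means as Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and stop when the centroids no longer change. We assume the reviews have already been turned into numeric vectors (for example TF-IDF); the 2-D points in the usage stand in for those vectors.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on equal-length tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # update step: each centroid moves to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```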
We build an Agglomerative Clustering model that clusters credit card transactions for fraud detection, after exploring the Credit Card Fraud Dataset from Kaggle.
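A naive sketch of bottom-up (agglomerative) clustering: every point starts as its own cluster, and the two closest clusters are merged until `k` remain. Single and complete linkage are shown; the all-pairs search is roughly cubic in the number of points, so it only suits small samples. We assume each transaction is a numeric feature vector.

```python
def agglomerative(points, k, linkage="single"):
    """Repeatedly merge the two closest clusters until k remain; returns index lists."""

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def cluster_dist(c1, c2):
        ds = [dist(points[i], points[j]) for i in c1 for j in c2]
        return min(ds) if linkage == "single" else max(ds)  # otherwise complete linkage

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters)))
        a, b = min(pairs, key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]]))
        clusters[a].extend(clusters.pop(b))  # b > a, so popping b leaves index a intact
    return clusters
```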
We build an Apriori association rule mining model that finds the most frequently clicked sites in clickstream data, after exploring the Hungarian Newssite Clickstream Dataset.
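A compact sketch of Apriori plus rule generation. We assume each transaction is the set of sites (or pages) clicked in one session. Candidate $k$-itemsets are built by joining frequent $(k-1)$-itemsets and pruned when any $(k-1)$-subset is infrequent (the Apriori property); a rule $A \Rightarrow B$ is kept when $\mathrm{supp}(A \cup B) / \mathrm{supp}(A)$ reaches the confidence threshold.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in items if support(s) >= min_support}
    frequent, size = {}, 1
    while current:
        frequent.update({s: support(s) for s in current})
        size += 1
        # join step: unions of frequent itemsets that have exactly `size` items
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # prune step: every (size-1)-subset must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules A -> B with enough confidence."""
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[antecedent]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    yield antecedent, itemset - antecedent, conf
```

The single-item supports alone answer "which site is clicked most often"; the rules add which sites tend to be visited together.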