Agglomerative_Lab_10
Apriori_Lab_11
Decision_Tree_Lab_8
HITS_Algorithm_Lab_7
Index_Compression_Lab_5
Inverted_Indexing_Lab_2
K_Means_Lab_10
Multinomial_NB_Lab_9
Naive_Bayes_Lab_3
Page_Ranking_Lab_6
Selenium_Extract_Contact_Lab_4
Web_Crawlers_Lab_1
.gitignore
README.md

README.md

Web Mining CSE3014

List of Programs

1. Web Crawlers

Here we use the Beautiful Soup library to develop crawlers with basic functions.
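
The core step of a basic crawler is pulling the outgoing links from a fetched page. The sketch below illustrates that step only. It is an assumption-laden stand-in: it swaps Beautiful Soup for the standard library's `html.parser` so it runs without extra installs, and `LinkCollector`, `extract_links` and `SAMPLE_HTML` are hypothetical names, not taken from the lab code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content; a real crawler would download this first.
SAMPLE_HTML = (
    '<html><body><a href="/about">About</a> '
    '<a href="https://example.org/x">X</a></body></html>'
)
links = extract_links(SAMPLE_HTML, "https://example.com/")
print(links)
```

A crawler would push the returned links onto a queue of pages still to visit, skipping any it has already seen.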

2. Inverted Indexing

We build a TF-IDF index of documents crawled from the web.
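
As a rough illustration, a TF-IDF inverted index can be sketched in a few lines. This assumes whitespace tokenisation, raw term counts for tf and `log(N / df)` for idf; the lab may use a different weighting. The documents and the name `build_tfidf_index` are made up for the example.

```python
import math
from collections import Counter, defaultdict


def build_tfidf_index(docs):
    """Return {term: {doc_id: tf-idf weight}} for a dict of doc_id -> text."""
    n_docs = len(docs)
    term_counts = {
        doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()
    }

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        for term, tf in counts.items():
            idf = math.log(n_docs / doc_freq[term])
            index[term][doc_id] = tf * idf
    return dict(index)


# Hypothetical stand-ins for crawled documents.
docs = {
    "d1": "web mining with python",
    "d2": "web crawlers crawl the web",
    "d3": "mining frequent patterns",
}
index = build_tfidf_index(docs)
print(sorted(index["web"].items()))
```

Terms that occur in every document get an idf of zero under this weighting, so they contribute nothing to ranking.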

3. Naive Bayes

We apply a Naive Bayes classifier to a set of document frequencies.
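
A minimal sketch of a Naive Bayes text classifier over term counts follows. It assumes the multinomial model with add-one (Laplace) smoothing, which may differ from the variant used in the lab. The training documents and the names `train_nb` and `predict_nb` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(labelled_docs):
    """labelled_docs: list of (tokens, label) pairs."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labelled_docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_docs.values())
    priors = {label: n / total for label, n in class_docs.items()}
    return priors, term_counts, vocab


def predict_nb(tokens, priors, term_counts, vocab):
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(prior)
        for term in tokens:
            if term not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            likelihood = (term_counts[label][term] + 1) / (total_terms + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set (hypothetical data).
training = [
    ("chinese beijing chinese".split(), "china"),
    ("chinese chinese shanghai".split(), "china"),
    ("chinese macao".split(), "china"),
    ("tokyo japan chinese".split(), "japan"),
]
priors, term_counts, vocab = train_nb(training)
prediction = predict_nb("chinese chinese chinese tokyo japan".split(),
                        priors, term_counts, vocab)
print(prediction)
```

Working in log space keeps the product of many small probabilities from underflowing.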

4. Selenium Introduction

We write a program to extract contact details from a website and save them to files.
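
Once Selenium has loaded a page, its HTML is available as text (for example through the driver's `page_source` attribute), and the extraction step reduces to pattern matching. The sketch below covers only that step and assumes contact details means e-mail addresses and phone numbers. The regular expressions are deliberately simple, and `extract_contacts`, `save_lines` and `SAMPLE_TEXT` are hypothetical names.

```python
import re

# Simple patterns; real-world addresses and numbers are more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def extract_contacts(page_text):
    """Return unique e-mails and phone numbers in order of first appearance."""
    emails = list(dict.fromkeys(EMAIL_RE.findall(page_text)))
    phones = list(dict.fromkeys(m.strip() for m in PHONE_RE.findall(page_text)))
    return {"emails": emails, "phones": phones}


def save_lines(path, lines):
    """Write one entry per line to a text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")


# Hypothetical page text standing in for what the browser returned.
SAMPLE_TEXT = (
    "Contact info@example.com or sales@example.com. "
    "Call +1 555-0100 today. Again: info@example.com"
)
contacts = extract_contacts(SAMPLE_TEXT)
print(contacts)
```

Calling `save_lines("emails.txt", contacts["emails"])` would then write the addresses to a file, one per line.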

5. Index Compression

We construct an inverted index for a set of 3 documents and also perform index compression.
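
One common way to compress a postings list is to store the gaps between document IDs and encode each gap with variable-byte (VB) codes. The sketch below assumes that scheme; the lab may use another, such as gamma codes. The function names and the example postings are illustrative.

```python
def to_gaps(postings):
    """Replace sorted doc IDs by the first ID followed by successive differences."""
    if not postings:
        return []
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def from_gaps(gaps):
    postings, running = [], 0
    for gap in gaps:
        running += gap
        postings.append(running)
    return postings


def vb_encode_number(n):
    """Encode n in 7-bit chunks; the last byte has its high bit set."""
    chunks = []
    while True:
        chunks.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    chunks[-1] += 128  # mark the terminating byte
    return bytes(chunks)


def vb_encode(numbers):
    return b"".join(vb_encode_number(n) for n in numbers)


def vb_decode(data):
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            numbers.append(128 * n + (byte - 128))
            n = 0
    return numbers


postings = [824, 829, 215406]           # hypothetical doc IDs for one term
encoded = vb_encode(to_gaps(postings))  # gaps are 824, 5, 214577
decoded = from_gaps(vb_decode(encoded))
print(len(encoded), decoded)
```

Small gaps fit in a single byte, which is why frequent terms with dense postings lists compress best.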

6. Page Ranking

We build a page ranking algorithm using the NetworkX library, and then validate the results by implementing the same ranking with the Random Walk method.
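
The idea of checking PageRank against a random walk can be sketched without any library. This is an illustration under stated assumptions, not the lab's code: it uses a plain power iteration in place of NetworkX, a damping factor of 0.85, and a small made-up graph in which every node has at least one outgoing link.

```python
import random
from collections import Counter


def pagerank(graph, damping=0.85, iterations=100):
    """Power iteration; assumes every node has at least one out-link."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in graph.items():
            share = damping * rank[v] / len(outs)
            for w in outs:
                new_rank[w] += share
        rank = new_rank
    return rank


def random_walk_rank(graph, damping=0.85, steps=200_000, seed=0):
    """Estimate ranks as the fraction of time a random surfer spends on each node."""
    rng = random.Random(seed)
    nodes = list(graph)
    visits = Counter()
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < damping:
            current = rng.choice(graph[current])  # follow a random out-link
        else:
            current = rng.choice(nodes)           # teleport to a random page
    return {v: visits[v] / steps for v in nodes}


# Hypothetical link graph: node -> list of pages it links to.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
power = pagerank(graph)
walk = random_walk_rank(graph)
for node in graph:
    print(node, round(power[node], 3), round(walk[node], 3))
```

The two columns should agree to within sampling noise, and the agreement tightens as `steps` grows.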

7. HITS Algorithm

We build a HITS algorithm for ranking and indexing pages across the web.
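
A minimal sketch of the HITS iteration: each page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to. The sketch assumes L2 normalisation after each update and a fixed number of iterations; the graph is made up.

```python
import math


def normalise(scores):
    norm = math.sqrt(sum(s * s for s in scores.values()))
    return {v: s / norm for v, s in scores.items()}


def hits(graph, iterations=50):
    """graph maps each node to the list of nodes it links to."""
    nodes = sorted(set(graph) | {w for outs in graph.values() for w in outs})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority update: sum hub scores of in-linking pages.
        auth = {v: 0.0 for v in nodes}
        for v, outs in graph.items():
            for w in outs:
                auth[w] += hub[v]
        auth = normalise(auth)
        # Hub update: sum authority scores of linked-to pages.
        hub = normalise({v: sum(auth[w] for w in graph.get(v, [])) for v in nodes})
    return hub, auth


# Hypothetical graph: A and B act as hubs, C and D as authorities.
graph = {"A": ["C", "D"], "B": ["C"], "C": [], "D": []}
hub, auth = hits(graph)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

Pages with no out-links end up with a hub score of zero, and pages with no in-links with an authority score of zero.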

8. Decision Tree

We build ID3 and CART decision trees, after exploring the Cleveland Heart Disease Dataset from the UCI Repository.
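
The two tree types differ mainly in how they score a candidate split: ID3 uses information gain (entropy reduction), while CART typically uses Gini impurity. The sketch below computes both for categorical features. It is a simplification: CART proper makes binary splits, the rows are invented rather than taken from the Cleveland data, and the feature names `cp` and `sex` are placeholders.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(rows, labels, feature, impurity):
    """Parent impurity minus the size-weighted impurity of the children."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    weighted = sum(len(g) / len(labels) * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


# Hypothetical toy rows; label 1 means disease present.
rows = [
    {"cp": "typical", "sex": "m"},
    {"cp": "typical", "sex": "f"},
    {"cp": "none", "sex": "m"},
    {"cp": "none", "sex": "f"},
]
labels = [1, 1, 0, 0]

for feature in ("cp", "sex"):
    print(feature,
          impurity_reduction(rows, labels, feature, entropy),
          impurity_reduction(rows, labels, feature, gini))
```

A tree builder would pick the feature with the largest reduction, split on it, and recurse on each child until the labels are pure or no features remain.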

9. Multinomial Naive Bayes Classifier

We build a Multinomial Naive Bayes Classifier for classifying tweets by sentiment, after exploring the US Airlines Twitter Sentiment Dataset from Kaggle.

10. K-Means Clustering

We build a K-Means Clustering Model for clustering movie reviews based on their text content, after exploring the IMDB 50k Movie Review Dataset from Kaggle.
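
K-Means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below shows that loop on 2-D points. It assumes the reviews have already been turned into numeric vectors (for instance TF-IDF), and for reproducibility it seeds the centroids with the first `k` points, which is a simplification of the usual random initialisation.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    # Simplified, deterministic initialisation: the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical 2-D vectors forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

With real text vectors the same loop applies unchanged; only the dimensionality of the points grows.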

11. Agglomerative Clustering

We build an Agglomerative Clustering Model for clustering credit card transactions for fraud detection, after exploring the Credit Card Fraud Dataset from Kaggle.
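
Agglomerative clustering starts with every point in its own cluster and repeatedly merges the two closest clusters. The sketch below assumes single linkage (distance between the closest pair of members) by default and stops at a requested number of clusters. The points are made up and far smaller than the fraud dataset.

```python
import math


def agglomerative(points, n_clusters, linkage=min):
    """Return clusters as lists of point indices.

    linkage=min gives single linkage; linkage=max gives complete linkage.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters


# Hypothetical 2-D feature vectors forming two groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = agglomerative(points, n_clusters=2)
print(clusters)
```

This brute-force version recomputes all pairwise cluster distances on every merge, so it is only practical for small samples.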

12. Apriori - Association Rule Mining

We build an Apriori Association Rule Mining Model for finding the most frequently clicked sites based on clickstream data, after exploring the Hungarian Newssite Clickstream Dataset.
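
The frequent-itemset stage of Apriori can be sketched as a level-wise search: count candidates of size k, keep those meeting the minimum support, and build size k+1 candidates only from the survivors. The sessions below are invented stand-ins for clickstream data, and the function name and threshold are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical sessions: the set of site sections visited in each one.
sessions = [
    {"home", "news", "sports"},
    {"home", "news"},
    {"home", "sports"},
    {"news", "weather"},
]
frequent = apriori(sessions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), s)
```

Association rules follow from these supports: the confidence of a rule X -> Y is the support of X and Y together divided by the support of X.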
