
Medical Dataset Analysis

This repository contains a comparative analysis of several machine learning algorithms for text classification: Support Vector Machines (SVM), Random Forest, and Naive Bayes. The analysis uses a dataset of research paper abstracts and full texts related to various types of cancer.

Dataset

The dataset consists of research paper abstracts and full texts covering different types of cancer, which gives a varied set of texts for classification.

Algorithms

The following machine learning algorithms were evaluated in this study:

Support Vector Machines (SVM)
Random Forest
Naive Bayes
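To illustrate how one of these classifiers works, here is a minimal from-scratch sketch of multinomial Naive Bayes with add-alpha (Laplace) smoothing over token counts. It is not the repository's implementation: the class name MultinomialNaiveBayes, the alpha parameter, and the toy documents and labels are hypothetical. SVM and Random Forest are left out because they are much longer to write by hand and are usually taken from a library such as scikit-learn.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Hypothetical sketch: multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; 1.0 gives Laplace smoothing

    def fit(self, docs, labels):
        # docs: list of token lists; labels: parallel list of class labels
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        class_counts = Counter(labels)
        # log P(class), estimated from class frequencies in the training set
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        # per-class token counts, used for log P(token | class)
        self.token_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.total_tokens = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, tok, c):
        # Smoothed estimate: (count + alpha) / (total + alpha * |V|)
        v = len(self.vocab)
        return math.log((self.token_counts[c][tok] + self.alpha)
                        / (self.total_tokens[c] + self.alpha * v))

    def predict_one(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += self._log_likelihood(tok, c)
            scores[c] = score
        # highest log posterior wins
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


# Toy, made-up training data (not from the repository's dataset)
train_docs = [["tumor", "lung", "smoking"], ["lung", "nodule", "ct"],
              ["breast", "mammogram", "tumor"], ["breast", "her2", "receptor"]]
train_labels = ["lung", "lung", "breast", "breast"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
preds = model.predict([["lung", "ct", "scan"], ["her2", "breast"]])
print(preds)
```

Working in log space avoids numerical underflow when many small per-token probabilities are multiplied together on long texts such as full papers.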

Text Preprocessing

Tokenization: Splitting the text into individual words or tokens.
Stemming: Reducing words to their root form.
TF-IDF Vectorization: Converting text data into numerical form using Term Frequency-Inverse Document Frequency (TF-IDF) vectorization.
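The three preprocessing steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's pipeline: the regular-expression tokenizer and the crude suffix-stripping stem function are hypothetical stand-ins (a real project would more likely use an established stemmer such as Porter's), and the TF-IDF weighting follows the defaults documented for scikit-learn's TfidfVectorizer: raw term counts, smoothed idf = ln((1 + N) / (1 + df)) + 1, and L2 normalisation of each document vector.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Tokenization: lowercase, then keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    # Hypothetical, deliberately crude suffix stripper standing in for a real stemmer.
    for suffix in ("ations", "ation", "ings", "ing", "ed", "s"):
        if suffix == "s" and token.endswith("ss"):
            continue  # keep words like "mass" intact
        # Only strip when at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tfidf(docs):
    """Return (sorted vocabulary, list of L2-normalised TF-IDF dicts) for raw texts."""
    token_lists = [[stem(t) for t in tokenize(d)] for d in docs]
    n = len(token_lists)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in token_lists for tok in set(toks))
    # Smoothed inverse document frequency.
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in df}
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)  # raw term frequency
        vec = {tok: c * idf[tok] for tok, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({tok: w / norm for tok, w in vec.items()} if norm else vec)
    return sorted(df), vectors


vocab, vectors = tfidf(["Lung tumors were screened.", "Breast tumor screening"])
print(vocab)
```

Because the stemmer maps "screened" and "screening" to the same token, both documents share the feature "screen"; without stemming they would be counted as unrelated words.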

Evaluation Metrics

Accuracy
Precision
Recall
F1 Score
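All four metrics can be computed from per-class true-positive, false-positive and false-negative counts. The sketch below assumes macro averaging (each class weighted equally), which the README does not specify; the function name classification_metrics and the example labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Undefined ratios (zero denominators) are reported as 0.0.
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


scores = classification_metrics(["lung", "lung", "breast", "breast"],
                                ["lung", "breast", "breast", "breast"])
print(scores)
```

Macro averaging keeps small classes from being drowned out by large ones; a support-weighted average is the usual alternative when class sizes differ.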

Results

The results compare the algorithms on each of these evaluation metrics. Detailed findings are in the project notebook, Medical dataset EDA+sentiment analysis.ipynb.

smritidoneria/Medical_dataset_analysis

Folders and files

Latest commit

History

Repository files navigation

Medical Dataset Analysis

Dataset

Algorithms

Text Preprocessing

Evaluation Metrics

Results

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages