smritidoneria/Medical_dataset_analysis

Medical Dataset Analysis

This repository compares three machine learning algorithms on a text classification task: Support Vector Machines (SVM), Random Forest, and Naive Bayes. The comparison uses a dataset of research paper abstracts and full texts covering several types of cancer.

Dataset

The dataset consists of research paper abstracts and full texts. The papers cover different types of cancer, which gives a varied set of texts for classification.

Algorithms

The following machine learning algorithms were evaluated in this study:

  1. Support Vector Machines (SVM)
  2. Random Forest
  3. Naive Bayes
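Of the three algorithms, Naive Bayes is compact enough to sketch without any dependencies. The block below is a minimal illustration of a multinomial Naive Bayes classifier with add-one smoothing, not the implementation used in this repository (the README does not say which library was used). The class name `TinyNaiveBayes` and the toy documents and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes over lists of tokens."""

    def fit(self, docs, labels):
        # Count documents per class and term occurrences per class.
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.term_counts[label]
            # Add-one (Laplace) smoothing: every vocabulary term gets +1.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / n_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # skip terms never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train_docs = [
    ["lung", "nodule", "smoking"],
    ["lung", "airway", "tumor"],
    ["thyroid", "hormone", "nodule"],
    ["thyroid", "gland", "tumor"],
]
train_labels = ["lung", "lung", "thyroid", "thyroid"]

model = TinyNaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["lung", "airway"])
```

In practice, a library implementation of all three classifiers would usually be preferred over hand-written code; the sketch only shows the idea behind one of them.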

Text Preprocessing

  1. Tokenization: Splitting the text into individual words or tokens.
  2. Stemming: Reducing words to their root form.
  3. TF-IDF Vectorization: Converting text into numerical feature vectors weighted by Term Frequency-Inverse Document Frequency (TF-IDF).
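The three preprocessing steps above can be sketched in plain Python as follows. This is an illustrative sketch under several assumptions: the `stem` function is a crude suffix stripper standing in for a real stemmer (such as a Porter stemmer), the IDF uses the unsmoothed form `log(N / df)` whereas library implementations often use smoothed variants, and the two sample sentences are invented. None of the function names come from the repository.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Crude suffix stripper; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [[stem(t) for t in tokenize(d)] for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append(
            {term: (c / total) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return vectors


# Invented sample texts, for illustration only.
docs = [
    "Tumors growing in lung tissue",
    "Screening detected thyroid tumors",
]
vectors = tfidf(docs)
```

A term that appears in every document (here the stem `tumor`) receives a weight of zero, which is how TF-IDF downweights words that do not help distinguish between classes.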

Evaluation Metrics

  1. Accuracy
  2. Precision
  3. Recall
  4. F1 Score
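As a reference for how these four metrics relate, the sketch below computes them for a single positive class from lists of true and predicted labels. The function name `classification_metrics` and the example labels are hypothetical, and the README does not specify how per-class scores were averaged in the actual analysis.

```python
def classification_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels, for illustration only.
y_true = ["lung", "lung", "thyroid", "thyroid", "lung"]
y_pred = ["lung", "thyroid", "thyroid", "lung", "lung"]
metrics = classification_metrics(y_true, y_pred, positive="lung")
```

Accuracy counts all correct predictions, while precision and recall focus on one class at a time; F1 is their harmonic mean, so it is high only when both are high.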

Results

The results report how each algorithm performs on the evaluation metrics listed above. Detailed findings are provided in the results section of the project.
