GregoryAIxNovaSBE

Classification of Relevant Multiple Sclerosis Articles

Project Description

Our goal is to enhance GregoryAI's existing machine learning solution so that multiple sclerosis patients, medical staff, and researchers can share knowledge more accurately. We have implemented a PubMed BERT model that achieves 96.5% recall.
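For readers unfamiliar with the metric, recall is the share of truly relevant articles that the classifier actually flags as relevant. The sketch below shows how it is computed on toy labels. The function name, the label encoding (1 = relevant, 0 = not relevant), and the data are assumptions for illustration only and are not taken from the project's notebooks.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of truly relevant articles that were predicted as relevant."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if true_pos + false_neg == 0:
        return 0.0  # no relevant articles in the sample
    return true_pos / (true_pos + false_neg)


# Toy labels, illustrative only (1 = relevant, 0 = not relevant)
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(recall(y_true, y_pred))  # 3 of the 4 relevant articles are found: 0.75
```

A high recall matters here because a missed relevant article is usually more costly than a false alarm that a reviewer can discard.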

Installation

Clone the repository

git clone https://github.com/franciscogomes1999/GregoryAIxNovaSBE.git

Install required packages

pip install -r requirements.txt

Usage

Train the Model

Open and run the training_model.ipynb notebook inside the folder notebooks_to_run.
Follow the instructions within the notebook to train the model.

Classify New Articles

Open and run the classification_of_articles.ipynb notebook inside the folder notebooks_to_run.
Follow the instructions within the notebook to classify new articles.

Pipelines

Train & Tune Pipeline

Data: Retrieve the articlesdataset.csv from the database.
Unprocessed Data: The raw dataset is saved as articlesdataset.csv.
Clean + Preprocess: Data cleaning and preprocessing steps are performed.
Processed Data: The cleaned data is saved as processed_data.csv.
Split Data: Split the data into training, validation, test, and unlabelled datasets.
Pseudo Labeling:

Generate pseudo labels for the unlabelled data.
Filter and select high-confidence pseudo labels.

Train:

Train the model using the labelled and pseudo-labelled data.
Store the trained model weights.

Evaluate: Evaluate the model's performance using validation and test datasets.
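The pseudo-labeling step above can be sketched as follows: keep only the unlabelled articles for which the model is confident, and assign them the predicted label. This is a minimal sketch under stated assumptions, not the project's implementation. The function name, the 0.9 threshold, and the probabilities are hypothetical; the notebooks may use a different selection rule.

```python
def select_pseudo_labels(probabilities, threshold=0.9):
    """Return (index, label) pairs for confident predictions only.

    probabilities: predicted P(relevant) for each unlabelled article.
    threshold: hypothetical confidence cut-off, not taken from the project.
    """
    selected = []
    for idx, p in enumerate(probabilities):
        if p >= threshold:
            selected.append((idx, 1))  # confidently relevant
        elif p <= 1 - threshold:
            selected.append((idx, 0))  # confidently not relevant
        # anything in between is left unlabelled
    return selected


# Illustrative model outputs for five unlabelled articles
probs = [0.98, 0.55, 0.03, 0.80, 0.95]
print(select_pseudo_labels(probs))  # [(0, 1), (2, 0), (4, 1)]
```

The selected pairs would then be merged with the labelled training set before the training step. A stricter threshold adds fewer but cleaner pseudo labels.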

Classify Pipeline

Download Data: Retrieve the articlesdataset.csv from the database.
Unprocessed Data: The raw dataset is saved as articlesdataset.csv.
Clean + Preprocess: Data cleaning and preprocessing steps are performed.
Filtered and Processed Data: The cleaned data is saved as new_unlabelled_articles.csv.
Classify:

Import the trained model weights.
Classify the articles to generate predictions.

Output: Save the updated CSV file with predicted labels for new articles.
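The classify and output steps above can be sketched as: read the article rows, attach a predicted label to each, and write the result back out as CSV. The column names (title, predicted_label) and the keyword-based predictor are assumptions for illustration. The predictor is only a stand-in for the trained PubMed BERT model, and in-memory buffers are used in place of new_unlabelled_articles.csv so the example runs on its own.

```python
import csv
import io


def classify_rows(rows, predict):
    """Return a copy of each article row with a predicted_label column added."""
    return [{**row, "predicted_label": predict(row["title"])} for row in rows]


def placeholder_predict(text):
    # Stand-in for the trained model; NOT the project's classifier.
    return 1 if "multiple sclerosis" in text.lower() else 0


# In-memory substitute for the unlabelled articles file (hypothetical columns)
raw = io.StringIO(
    "title\n"
    "Multiple sclerosis relapse therapy trial\n"
    "Soil chemistry survey\n"
)
rows = list(csv.DictReader(raw))
labelled = classify_rows(rows, placeholder_predict)

# Write the updated table, including the new prediction column
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["title", "predicted_label"])
writer.writeheader()
writer.writerows(labelled)
print(out.getvalue())
```

Passing the predictor as an argument keeps the CSV handling separate from the model, so the placeholder can be swapped for a function that wraps the loaded model.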

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more details.

Contact

If you have any questions or feedback, please contact one of the following people:

Julia Antonioli - [email protected]
Kuba Bialczyk - [email protected]
Nicolò Mazzoleni - [email protected]
Francisco Gomes - [email protected]
Martim Esteves - [email protected]

