Our goal is to enhance the existing machine learning solution of GregoryAI, with the overall aim of facilitating more accurate knowledge sharing among multiple sclerosis patients, medical staff, and researchers. We implemented a PubMed BERT model that achieves 96.5% recall.
- Clone the repository
git clone https://github.com/franciscogomes1999/GregoryAIxNovaSBE.git
- Install required packages
pip install -r requirements.txt
- Open and run the training_model.ipynb notebook inside the folder notebooks_to_run.
- Follow the instructions within the notebook to train the model.
- Open and run the classification_of_articles.ipynb notebook inside the folder notebooks_to_run.
- Follow the instructions within the notebook to classify new articles.
- Data: Retrieve the articlesdataset.csv from the database.
- Unprocessed Data: The raw dataset is saved as articlesdataset.csv.
- Clean + Preprocess: Data cleaning and preprocessing steps are performed.
- Processed Data: The cleaned data is saved as processed_data.csv.
- Split Data: Split the data into training, validation, test, and unlabelled datasets.
- Pseudo Labeling:
  - Generate pseudo labels for the unlabelled data.
  - Filter and select high-confidence pseudo labels.
- Train:
  - Train the model using the labelled and pseudo-labelled data.
  - Store the trained model weights.
- Evaluate: Evaluate the model's performance using validation and test datasets.
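
The split and pseudo-labeling steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the notebooks: the function names, the `relevant` label column, the split fractions, and the 0.9 confidence threshold are all assumptions chosen for the example.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_rows(
    rows: List[dict],
    label_key: str = "relevant",  # hypothetical label column name
    val_fraction: float = 0.1,    # assumed split sizes, not the project's
    test_fraction: float = 0.1,
    seed: int = 42,
) -> Dict[str, List[dict]]:
    """Split rows into train, validation, test, and unlabelled subsets."""
    # Rows without a label form the unlabelled pool used for pseudo labeling.
    labelled = [r for r in rows if r.get(label_key) not in (None, "")]
    unlabelled = [r for r in rows if r.get(label_key) in (None, "")]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(labelled)

    n_val = int(len(labelled) * val_fraction)
    n_test = int(len(labelled) * test_fraction)
    return {
        "validation": labelled[:n_val],
        "test": labelled[n_val:n_val + n_test],
        "train": labelled[n_val + n_test:],
        "unlabelled": unlabelled,
    }


def select_pseudo_labels(
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed confidence cut-off
) -> List[Tuple[int, int]]:
    """Return (row_index, predicted_class) for confident predictions only."""
    selected = []
    for index, row in enumerate(probabilities):
        confidence = max(row)
        if confidence >= threshold:
            # The position of the highest probability is the pseudo label.
            selected.append((index, list(row).index(confidence)))
    return selected
```

Here `probabilities` stands for the per-class probabilities a model produces for each unlabelled article. Only rows whose top probability reaches the threshold would be added to the training data with their pseudo label; the rest stay unlabelled.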
- Download Data: Retrieve the articlesdataset.csv from the database.
- Unprocessed Data: The raw dataset is saved as articlesdataset.csv.
- Clean + Preprocess: Data cleaning and preprocessing steps are performed.
- Filtered and Processed Data: The cleaned data is saved as new_unlabelled_articles.csv.
- Classify:
  - Import the model weights.
  - Classify the articles to generate predictions.
- Output: Save the updated CSV file with predicted labels for new articles.
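
The classification flow above (read the preprocessed CSV, predict a label per article, save the updated CSV) can be sketched as follows. This is only an illustration under stated assumptions: `classify_csv`, the `title` text column, and the `predicted_label` output column are hypothetical names, and `predict` stands in for whatever function wraps the trained model in the notebook.

```python
import csv
from typing import Callable


def classify_csv(
    input_path: str,
    output_path: str,
    predict: Callable[[str], int],
    text_column: str = "title",             # hypothetical input column
    label_column: str = "predicted_label",  # hypothetical output column
) -> int:
    """Add a predicted label to every row and write the result to a new CSV.

    Returns the number of classified rows.
    """
    # Read all articles and remember the original column order.
    with open(input_path, newline="", encoding="utf-8") as source:
        reader = csv.DictReader(source)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    if label_column not in fieldnames:
        fieldnames.append(label_column)

    # Apply the model wrapper to the text of each article.
    for row in rows:
        row[label_column] = predict(row[text_column])

    # Save the updated table, keeping the original columns.
    with open(output_path, "w", newline="", encoding="utf-8") as target:
        writer = csv.DictWriter(target, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as `classify_csv("new_unlabelled_articles.csv", "classified_articles.csv", predict=my_model_predict)` would then produce the output file; `classified_articles.csv` and `my_model_predict` are placeholder names.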
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more details.
If you have any questions or feedback, please contact one of the following people:
- Julia Antonioli - [email protected]
- Kuba Bialczyk - [email protected]
- Nicolò Mazzoleni - [email protected]
- Francisco Gomes - [email protected]
- Martim Esteves - [email protected]