From bulk to single-cell and spatial data: An AI framework to characterise breast cancer metabolic disruptions across modalities

This repository contains the code and data to reproduce the results presented in the paper “Uncovering breast cancer metabolic landscape through multi-modal machine learning and metabolic modelling”.

The framework integrates machine learning with patient-specific metabolic modelling to predict risk for breast cancer patients. The repository contains three main folders, listed first below, followed by a tutorial notebook and the data sources:

  • Data preprocessing: the preprocessing code, including feature selection techniques for transcriptomic and fluxomic data;
  • Metabolic modelling: the MATLAB code for the genome-scale metabolic model (GSMM) used to generate patient-specific flux rates (fluxomic data);
  • ML models: the Jupyter notebook with the code to run the machine learning (ML) models. The code can be rerun with different numbers of selected omic features; we provide the ML results for the optimal selection of omic features for each data modality and their combinations.
  • An end-to-end tutorial in a Google Colab notebook that lets users analyse clinical and transcriptomic data and investigate significant alterations at both the single-cell and spatial levels.
  • The data used in this study can be downloaded from the TCGA website: https://portal.gdc.cancer.gov/. We also provide all data used in this study, including raw and preprocessed clinical data, raw transcriptomic data, and fluxomic data generated by the metabolic model (https://figshare.com/articles/dataset/Data/22337722).
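The README does not say which feature selection technique the notebooks use. As a rough illustration of what "different numbers of selected omic features per modality" can look like, the sketch below ranks features by variance across patients and keeps the top k for each modality. The variance criterion, the function names, and the toy values are all assumptions made for illustration, not the repository's method or data.

```python
from statistics import pvariance


def top_k_by_variance(rows, feature_names, k):
    """Return the names of the k features with the largest variance.

    rows: list of per-patient value lists (one value per feature).
    Variance ranking is a stand-in criterion, not the paper's method.
    """
    columns = list(zip(*rows))  # transpose: one tuple of values per feature
    ranked = sorted(
        zip(feature_names, columns),
        key=lambda item: pvariance(item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def select_per_modality(modalities, k):
    """Apply the same top-k selection to every modality.

    modalities: dict mapping a modality name to (rows, feature_names).
    """
    return {
        name: top_k_by_variance(rows, feature_names, k)
        for name, (rows, feature_names) in modalities.items()
    }


if __name__ == "__main__":
    # Hypothetical toy values: 3 patients, 3 genes and 2 reactions.
    toy = {
        "transcriptomic": (
            [[1.0, 5.0, 2.0], [1.0, 9.0, 2.5], [1.0, 1.0, 3.0]],
            ["gene_a", "gene_b", "gene_c"],
        ),
        "fluxomic": ([[0.0, 1.0], [0.0, 3.0]], ["rxn_1", "rxn_2"]),
    }
    print(select_per_modality(toy, k=1))
```

Changing `k` and rerunning the downstream model is the kind of sweep the ML models notebook is described as supporting.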

    How to run

    The following steps are required to run the code:

  • Python 3.9.x and R 4.2.x are required. Check the package versions specified in requirements.txt before running the code.
  • A Jupyter notebook server is required.
  • Ensure all pip dependencies listed in requirements.txt are installed.
  • Run the notebooks in folder order: data preprocessing first, then metabolic modelling, then ML models.
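The Python side of the checks above can be sketched as a small script. It is a minimal sketch: it assumes requirements.txt sits in the current directory and contains simple `name==version` style lines, the helper names are hypothetical, and it does not check the R version.

```python
import sys
from importlib import metadata
from pathlib import Path


def python_version_ok(version_info=sys.version_info, required=(3, 9)):
    """Return True when the interpreter's major.minor equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


def parse_requirement_names(text):
    """Extract bare package names from simple requirements-style text.

    Only plain lines such as `numpy==1.23.0` are handled; option lines
    (starting with "-") are skipped, and URL requirements are not supported.
    """
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "[", ";", " "):
            line = line.split(sep, 1)[0]
        names.append(line.strip())
    return names


def missing_packages(names):
    """Return the subset of `names` with no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python 3.9.x:", "ok" if python_version_ok() else "mismatch")
    req = Path("requirements.txt")  # assumed location
    if req.exists():
        names = parse_requirement_names(req.read_text())
        print("Missing packages:", missing_packages(names) or "none")
    else:
        print("requirements.txt not found in the current directory")
```

Any packages reported as missing can then be installed with `pip install -r requirements.txt`.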
    License

    This code is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

    Le Minh Thao Doan - Nov 2024