Reproducing Paper Results
📋 This file follows the template for releasing ML research code from papers with code
This repository is the official implementation of placeholder. See a graphical overview of our framework below: We propose Yet Another ICU Benchmark (YAIB). It was designed to address reproducibility issues and to provide a unified interface for developing clinical prediction models for the ICU. An experiment in YAIB consists of four steps:
- Defining clinical concepts from the raw data.
- Extracting the patient cohort and specifying the prediction task.
- Preprocessing and feature generation.
- Training and evaluation of the ML model.
YAIB can be installed using conda or pip. Below you will find the three CLI commands to install YAIB using conda.
The first command installs an environment based on Python 3.10 (currently). This should work on x86 hardware.
conda env update -f environment.yml
We then activate the environment and install a package called icu-benchmarks, after which YAIB should be operational.
conda activate yaib
pip install -e .
To get the datasets for this paper, please see the YAIB-cohorts repository and the page on the YAIB wiki. You will need to get access to the ICU datasets that you want to run by following a credentialing procedure.
The easiest method to train the models in the paper is to run these commands from the directory root:
wandb sweep --verbose experiments/benchmark_classification.yml
wandb sweep --verbose experiments/benchmark_regression.yml
This creates two WandB hyperparameter sweeps, one for the classification tasks and one for the regression tasks. Together, these configurations train all the models in the paper. You can then run the following command to train the models:
wandb agent <sweep_id>
Tip: You can run each configuration as a separate job on a SLURM cluster by limiting every agent to a single run:
wandb agent --count 1 <sweep_id>
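As a rough sketch of the tip above, the snippet below renders a SLURM array-job script in which every array task runs one sweep configuration via `wandb agent --count 1`. The helper name `render_sbatch_script`, the job name, the time limit, and the environment activation line are illustrative assumptions and not part of YAIB; adapt the resource requests to your own cluster.

```python
def render_sbatch_script(sweep_id: str, n_runs: int, time_limit: str = "04:00:00") -> str:
    """Return the text of a SLURM array job in which each task runs one sweep configuration.

    Hypothetical helper for illustration only; it is not part of YAIB.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=yaib-sweep",    # assumed job name
        f"#SBATCH --array=0-{n_runs - 1}",  # one array task per configuration
        f"#SBATCH --time={time_limit}",     # assumed wall-clock limit
        "",
        # Assumption: conda is already initialised in the job shell.
        "conda activate yaib",
        # Each array task asks the sweep for exactly one configuration.
        f"wandb agent --count 1 {sweep_id}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script so it can be redirected to a file and submitted with sbatch.
    print(render_sbatch_script("<sweep_id>", n_runs=10))
```

Writing the output to a file and submitting it with `sbatch` would start ten independent jobs, each training a single configuration from the sweep.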
The authors of MIMIC-III and eICU have made small demo datasets available to demonstrate their use. They can be found on Physionet: MIMIC-III Clinical Database Demo and eICU Collaborative Research Database Demo. These datasets are published under the Open Data Commons Open Database License v1.0 and can be used without a credentialing procedure. For each of our currently supported task endpoints, we have created demo cohorts that are processed solely from these datasets. This is, to the best of our knowledge, in compliance with the license and the respective dataset authors' instructions. We strongly recommend completing human subjects research training to ensure you properly handle human subject research data.
You can run the following command to train models for the included demo (MIMIC-III and eICU) task cohorts:
wandb sweep --verbose experiments/demo_benchmark_classification.yml
wandb sweep --verbose experiments/demo_benchmark_regression.yml
These commands create the sweeps. Run each sweep with wandb agent <sweep_id>, as described above.
Evaluation happens automatically after running this command. Additionally, YAIB generates extensive log files and model files. The logging location is specified within the .yml files. We recommend using the wandb web interface to inspect the results (see your personal WandB project).
You can download pretrained models here: YAIB-models GitHub repository. YAIB has built-in functionality to evaluate these models. See the command below for an example:
icu-benchmarks evaluate \
-d demo_data/mortality24/eicu_demo \
-n eicu_demo \
-t BinaryClassification \
-tn Mortality24 \
-m LGBMClassifier \
--generate_cache \
--load_cache \
-s 2222 \
-l ../yaib_logs \
-sn mimic \
--source-dir ../yaib_logs/mimic_demo/Mortality24/LGBMClassifier/2022-12-12T15-24-46/fold_0
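To evaluate several pretrained models in a loop, the same call can be assembled from Python. The sketch below only uses the flags that appear in the command above and makes no claim about their exact semantics. The helper name `build_evaluate_command` and its parameter names are hypothetical and not part of YAIB.

```python
import shlex


def build_evaluate_command(data_dir, name, task, task_name, model,
                           source_name, source_dir,
                           seed=2222, log_dir="../yaib_logs"):
    """Assemble the argument list for `icu-benchmarks evaluate`.

    Hypothetical helper: it mirrors the flags of the README command one to one.
    """
    return [
        "icu-benchmarks", "evaluate",
        "-d", data_dir,
        "-n", name,
        "-t", task,
        "-tn", task_name,
        "-m", model,
        "--generate_cache",
        "--load_cache",
        "-s", str(seed),
        "-l", log_dir,
        "-sn", source_name,
        "--source-dir", source_dir,
    ]


if __name__ == "__main__":
    cmd = build_evaluate_command(
        data_dir="demo_data/mortality24/eicu_demo",
        name="eicu_demo",
        task="BinaryClassification",
        task_name="Mortality24",
        model="LGBMClassifier",
        source_name="mimic",
        source_dir="../yaib_logs/mimic_demo/Mortality24/LGBMClassifier/2022-12-12T15-24-46/fold_0",
    )
    # Print the shell-quoted command; passing `cmd` to subprocess.run(cmd, check=True)
    # would execute it, provided icu-benchmarks is installed in the active environment.
    print(shlex.join(cmd))
```

Building the command as a list avoids shell-quoting problems when paths contain spaces, and makes it easy to vary the model or source directory per iteration.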
The latest results are shown below. Note that major changes have occurred between the classification and regression task experiments. However, results should be comparable overall. Updated results will be posted in the near future.
This source code is released under the MIT license, included here. We do not own any of the datasets used or included in this repository. The demo datasets have been released under an Open Data Commons Open Database License (ODbL).