Generating Cohorts
Please clone the YAIB-cohorts repository.
The following instructions assume that the YAIB-cohorts/R is the current working directory in your R session.
All data extractions were run using R 4.2.2 on an Apple M1 Max with macOS Ventura 13.2.1. An renv lock file was created to install all necessary package dependencies. To recreate the environment, open a terminal and run:
Rscript setup_env.R
To access the full datasets through R, you need to download them and make them available to ricu. Please follow the instructions given by the ricu package: ?ricu::import_src.
For quick experimentation, ricu comes with two demo datasets: mimic.demo and eicu.demo. These are small, openly available subsets of mimic and eicu that allow for easy prototyping. They should have been installed by renv. If they were not, please see the respective GitHub pages here and here.
Once you have imported the datasets, make sure to set the correct data path in the .Rprofile file in this directory:
Sys.setenv(RICU_DATA_PATH = "/path/to/your/ricu/data/folder")
This repository currently allows for the extraction of five prediction tasks:
Classification:
- ICU mortality after 24 hours: mortality.R
- Acute Kidney Injury within the next 6 hours: aki.R
- Sepsis within the next 6 hours: sepsis.R
Regression:
- Kidney function on the second day of ICU admission: kidney_function.R
- Remaining length of stay: los.R
All five tasks rely on a shared data-cleaning step provided in base_cohort.R, which defines and stores the subset of patients in each dataset with sufficient data quality. base_cohort.R therefore needs to be run before any of the task-specific cohorts can be generated:
Rscript "base_cohort.R" --src mimic_demo
Once base_cohort.R has been run, data for a single task such as mortality can be extracted from a single dataset via:
Rscript "mortality.R" --src mimic_demo
where mortality.R should be replaced with the task file of interest and mimic_demo with the database of interest (one of mimic_demo, eicu_demo, aumc, hirid, eicu, mimic, miiv). Data can be extracted from all datasets simultaneously via bash gen_cohort.sh mortality.
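The ordering constraint described above (base cohort first, then the task) can be sketched as a small Python helper that builds the two Rscript commands for one dataset. The names cohort_commands, run_cohort, TASKS and SOURCES are illustrative and not part of the repository; only the script names and the --src flag are taken from the instructions above.

```python
import subprocess

# Task scripts and source names as listed in this README.
TASKS = ("mortality", "aki", "sepsis", "kidney_function", "los")
SOURCES = ("mimic_demo", "eicu_demo", "aumc", "hirid", "eicu", "mimic", "miiv")


def cohort_commands(task, src):
    """Return the Rscript commands for one task and dataset, in run order."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if src not in SOURCES:
        raise ValueError(f"unknown source: {src!r}")
    return [
        ["Rscript", "base_cohort.R", "--src", src],  # shared cleaning first
        ["Rscript", f"{task}.R", "--src", src],      # then the task cohort
    ]


def run_cohort(task, src):
    """Run both steps (hypothetical helper); stop if the first one fails."""
    for cmd in cohort_commands(task, src):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the task script never runs after a failed base cohort.
        subprocess.run(cmd, check=True)
```

Calling run_cohort("mortality", "mimic_demo") from the YAIB-cohorts/R directory would issue the two commands shown above in sequence. Rebuilding the base cohort on every call is redundant if it has already been generated for that dataset, so the first command can be dropped in that case.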
The output directory for the extracted data can be set in ../config.yaml.
The following instructions assume that the YAIB-cohorts/Python folder is the current working directory in your Python session.
An renv lock file was created to install all necessary package dependencies. To recreate the environment, open a terminal and run:
python setup_env.py
To access the full datasets through R, you need to download them and make them available to ricu. Please follow the instructions given by the ricu package: ?ricu::import_src. Python code to perform this step is currently in development and will be uploaded soon.
For quick experimentation, ricu comes with two demo datasets: mimic.demo and eicu.demo. These are small, openly available subsets of mimic and eicu that allow for easy prototyping. They should have been installed by renv. If they were not, please see the respective GitHub pages here and here.
Once you have imported the datasets, make sure to set the correct data path in the .Rprofile file in this directory:
Sys.setenv(RICU_DATA_PATH = "/path/to/your/ricu/data/folder")
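A value set in .Rprofile is visible to R sessions only, not to a Python process. The following is a minimal sketch of a pre-flight check, assuming RICU_DATA_PATH is additionally exported in the shell from which Python is started. The helper name ricu_data_dir is hypothetical and not part of the repository.

```python
import os
from pathlib import Path


def ricu_data_dir(env=None):
    """Return RICU_DATA_PATH as a Path, or raise if it is unset or missing."""
    # Accept any mapping so the check can be exercised without touching
    # the real process environment.
    env = os.environ if env is None else env
    value = env.get("RICU_DATA_PATH")
    if not value:
        raise RuntimeError("RICU_DATA_PATH is not set")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"RICU_DATA_PATH is not a directory: {path}")
    return path
```

Calling ricu_data_dir() before starting an extraction fails early with a readable message instead of partway through a run.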
This repository currently allows for the extraction of five prediction tasks:
Classification:
- ICU mortality after 24 hours: mortality.py
- Acute Kidney Injury within the next 6 hours: aki.py
- Sepsis within the next 6 hours: sepsis.py
Regression:
- Kidney function on the second day of ICU admission: kidney_function.py
- Remaining length of stay: los.py
Data for a single task such as mortality can be extracted from a single dataset via:
python mortality.py --src mimic_demo
where mortality.py should be replaced with the task file of interest and mimic_demo with the database of interest (one of mimic_demo, eicu_demo, aumc, hirid, eicu, mimic, miiv).
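No Python counterpart to gen_cohort.sh is described here. As a rough sketch, one task could be extracted from every dataset by looping over the source names, assuming each task script accepts --src as shown above. The function names all_source_commands and extract_all are illustrative and not part of the repository.

```python
import subprocess
import sys

# Source names as listed in this README.
SOURCES = ("mimic_demo", "eicu_demo", "aumc", "hirid", "eicu", "mimic", "miiv")


def all_source_commands(task, sources=SOURCES):
    """Build one `<python> <task>.py --src <source>` command per dataset."""
    # sys.executable reuses the interpreter of the current environment.
    return [[sys.executable, f"{task}.py", "--src", src] for src in sources]


def extract_all(task, sources=SOURCES):
    """Run the task script for each dataset in turn; return failed sources."""
    failed = []
    for src, cmd in zip(sources, all_source_commands(task, sources)):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(src)  # keep going so one dataset does not block the rest
    return failed
```

Run from the YAIB-cohorts/Python directory, extract_all("mortality") would process the datasets one after another and return the names of those whose extraction exited with an error, for example because a full dataset has not been imported.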
The output directory for the extracted data can be set in ../config.yaml.