Bdanek/halo main 2 #207

Draft: wants to merge 63 commits into base: develop
63 commits
451a6fc
modify eICU parsing for SyntheticFairness task
Apr 22, 2023
673eda7
gender & ethnicity nans to str
BPDanek Apr 23, 2023
c56ce24
upgrade from python 3.8 to 3.11
ycq091044 May 10, 2023
a2a01b1
upgrade from python 3.8 to 3.11
ycq091044 May 10, 2023
92118ca
restrict the urllib3 version
ycq091044 May 10, 2023
ba7e54d
add docs for molerec and mimicextract dataset
ycq091044 May 10, 2023
63096a9
add rst files to outline
ycq091044 May 10, 2023
2761110
fix illformed dependency
ycq091044 May 10, 2023
5d416a1
minor style update
ycq091044 May 10, 2023
266d589
checkin prior to dev
BPDanek May 16, 2023
cf2de5c
aggregator & getbatch function implemented first pass
BPDanek May 31, 2023
4e3353b
check in before converting from samples to batch data
BPDanek Jun 1, 2023
9d1fb56
clean up artifacts
BPDanek Jun 1, 2023
5123432
checking to merge with bdanek/ai4health
BPDanek Jun 2, 2023
a29a60d
fix omop doc typo (#166)
BPDanek Jun 2, 2023
1fddc75
Merge pull request #168 from sunlabuiuc/bdanek/ai4health
BPDanek Jun 6, 2023
cf15c06
complete trainer + adjust eICU
BPDanek Jun 7, 2023
e1a5757
add inter-visit gaps + generate synthetic samples complete
BPDanek Jun 18, 2023
235a40e
most of eval done
BPDanek Jun 18, 2023
e9174c9
compelte evaluator & plotter
BPDanek Jun 19, 2023
121d8a2
evaluation complete
BPDanek Jun 21, 2023
05d7780
checking training results
BPDanek Jun 23, 2023
fc8e332
remove images
BPDanek Jun 24, 2023
ee40cf0
update model config + trainer paths
BPDanek Jun 24, 2023
49b4f1e
update paths
BPDanek Jun 24, 2023
2131fa8
bug
BPDanek Jun 24, 2023
8832230
pself.processor
BPDanek Jun 24, 2023
60e36c4
batch_size from generator in plotter
BPDanek Jun 25, 2023
feb9f02
add event handlers
BPDanek Jul 15, 2023
7fe7ba8
pandrallel ram limit
BPDanek Jul 22, 2023
06fcf25
refactor
BPDanek Jul 22, 2023
4defa35
parse_to_tables with inter-visit-time-gap
Jul 23, 2023
8b5beb5
fix bug in halo <> pyhealth dataset
Jul 23, 2023
0d7b9a9
fix bug
BPDanek Jul 23, 2023
e406e08
checking for training
BPDanek Jul 24, 2023
c00e23a
add docs
BPDanek Jul 27, 2023
cf73dc3
debug with Theodorou
BPDanek Aug 3, 2023
ae9cce2
fix bugs + paths
BPDanek Aug 4, 2023
b24a119
compute inter-visit gap bug
BPDanek Aug 4, 2023
39311f4
Merge pull request #165 from sunlabuiuc/bdanek/halo_main_2
BPDanek Aug 4, 2023
20388b5
update conversion to pyhealth format
BPDanek Aug 5, 2023
0509652
add patient id to tables
BPDanek Aug 5, 2023
cf47108
Minor bug fixes
Aug 10, 2023
6675ed4
More minor fixes
Aug 10, 2023
37f3bed
Merge branch 'bdanek/halo_main_2' into synthetic_data
BPDanek Aug 13, 2023
d4fb3ef
Merge pull request #202 from sunlabuiuc/synthetic_data
BPDanek Aug 13, 2023
d695cd0
Add use histograms to handle continuous values
BPDanek Aug 23, 2023
4094475
automatically compute lab bins
BPDanek Sep 8, 2023
c160212
Lots of changes
Sep 13, 2023
c8632ea
Patient demographic changes
Sep 18, 2023
583cabf
Minor changes
Sep 19, 2023
f5ee1e4
Minor changes
Sep 19, 2023
50803c3
Added handling for new labs we don't support
Sep 19, 2023
0c2e4f6
Everything through dataset conversion
Sep 20, 2023
5cd8c44
Everything through dataset conversion
Sep 20, 2023
3ab78d6
Finished initial HALO pipeline
Sep 20, 2023
27282c5
Added the ability to skip straight to training, added label function …
Oct 2, 2023
8cbe084
Added support for k-fold validation setup
Oct 2, 2023
5377633
Added machinery for fairness evaluations with synthetic data
Oct 2, 2023
050db50
Updated groups, added fairplay baselines
Nov 16, 2023
452740c
Fixed full label function
Nov 16, 2023
d2b7bbf
Fix elderly misspelled as eldery
Nov 16, 2023
b23a2ce
Added fairness metrics
Dec 18, 2023
1 change: 1 addition & 0 deletions docs/api/datasets.rst
@@ -10,6 +10,7 @@ Datasets
datasets/pyhealth.datasets.SampleEHRDataset
datasets/pyhealth.datasets.SampleSignalDataset
datasets/pyhealth.datasets.MIMIC3Dataset
datasets/pyhealth.datasets.MIMICExtractDataset
datasets/pyhealth.datasets.MIMIC4Dataset
datasets/pyhealth.datasets.eICUDataset
datasets/pyhealth.datasets.OMOPDataset
15 changes: 15 additions & 0 deletions docs/api/datasets/pyhealth.datasets.MIMICExtractDataset.rst
@@ -0,0 +1,15 @@
pyhealth.datasets.MIMICExtractDataset
=====================================

The open Medical Information Mart for Intensive Care (MIMIC-III) database; refer to the `doc <https://mimic.mit.edu/>`_ for more information. We process this database into a well-structured dataset object that gives users the **best flexibility and convenience** for modeling and analysis.

.. autoclass:: pyhealth.datasets.MIMICExtractDataset
    :members:
    :undoc-members:
    :show-inheritance:






2 changes: 1 addition & 1 deletion docs/api/datasets/pyhealth.datasets.OMOPDataset.rst
@@ -1,7 +1,7 @@
pyhealth.datasets.OMOPDataset
===================================

We can process any OMOP-CDM formatted database, refer to `doc <https://www.ohdsi.org/data-standardization/the-common-data-model/>`_ for more information. We it into well-structured dataset object and give user the **best flexibility and convenience** for supporting modeling and analysis.
We can process any OMOP-CDM formatted database; refer to the `doc <https://www.ohdsi.org/data-standardization/the-common-data-model/>`_ for more information. The raw data is processed into a well-structured dataset object that gives users the **best flexibility and convenience** for modeling and analysis.

.. autoclass:: pyhealth.datasets.OMOPDataset
    :members:
1 change: 1 addition & 0 deletions docs/api/models.rst
@@ -15,6 +15,7 @@ We implement the following models for supporting multiple healthcare predictive
models/pyhealth.models.GAMENet
models/pyhealth.models.MICRON
models/pyhealth.models.SafeDrug
models/pyhealth.models.MoleRec
models/pyhealth.models.Deepr
models/pyhealth.models.ContraWR
models/pyhealth.models.SparcNet
14 changes: 14 additions & 0 deletions docs/api/models/pyhealth.models.MoleRec.rst
@@ -0,0 +1,14 @@
pyhealth.models.MoleRec
===================================

The separate callable MoleRecLayer and the complete MoleRec model.

.. autoclass:: pyhealth.models.MoleRecLayer
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: pyhealth.models.MoleRec
    :members:
    :undoc-members:
    :show-inheritance:
64 changes: 32 additions & 32 deletions docs/log.rst
@@ -4,121 +4,121 @@ We track the new development here:

**May 9, 2023**

.. code-block:: bash
.. code-block:: rst

1. add MIMIC-Extract dataset `#136 <https://github.com/sunlabuiuc/PyHealth/pull/136>`_
1. add MIMIC-Extract dataset `#136`
2. add new maintainer members for pyhealth: Junyi Gao and Benjamin Danek

**May 6, 2023**

.. code-block:: bash
.. code-block:: rst

1. add new parser functions (admissionDx, diagnosisStrings) and prediction tasks for eICU dataset `#148 <https://github.com/sunlabuiuc/PyHealth/pull/148>`_
1. add new parser functions (admissionDx, diagnosisStrings) and prediction tasks for eICU dataset `#148`

**Apr 27, 2023**

.. code-block:: bash
.. code-block:: rst

1. add MoleRec model (WWW'23) for drug recommendation `#122 <https://github.com/sunlabuiuc/PyHealth/pull/122>`_
1. add MoleRec model (WWW'23) for drug recommendation `#122`

**Apr 26, 2023**

.. code-block:: bash
.. code-block:: rst

1. fix bugs in GRASP model `#141 <https://github.com/sunlabuiuc/PyHealth/pull/141>`_
2. add pandas install <2 constraints `#135 <https://github.com/sunlabuiuc/PyHealth/pull/135>`_
3. add hcpcsevents table process in MIMIC4 dataset `#134 <https://github.com/sunlabuiuc/PyHealth/pull/134>`_
1. fix bugs in GRASP model `#141`
2. add pandas install <2 constraints `#135`
3. add hcpcsevents table process in MIMIC4 dataset `#134`

**Apr 10, 2023**

.. code-block:: bash
.. code-block:: rst

1. fix Ambiguous datetime usage in eICU (https://github.com/sunlabuiuc/PyHealth/pull/132)

**Mar 26, 2023**

.. code-block:: bash
.. code-block:: rst

1. add the entire uncertainty quantification module (https://github.com/sunlabuiuc/PyHealth/pull/111)

**Feb 26, 2023**

.. code-block:: bash
.. code-block:: rst

1. add 6 EHR prediction models: Adacare, Concare, Stagenet, TCN, Grasp, Agent

**Feb 24, 2023**

.. code-block:: bash
.. code-block:: rst

1. add unittest for omop dataset
2. add github action triggered manually, check #104
2. add github action triggered manually, check `#104`

**Feb 19, 2023**

.. code-block:: bash
.. code-block:: rst

1. add unittest for eicu dataset
2. add ISRUC dataset (and task function) for signal learning

**Feb 12, 2023**

.. code-block:: bash
.. code-block:: rst

1. add unittest for mimiciii, mimiciv
2. add SHHS datasets for sleep staging task
3. add SparcNet model for signal classification task

**Feb 08, 2023**

.. code-block:: bash
.. code-block:: rst

1. complete the biosignal data support, add ContraWR [1] model for general purpose biosignal classification task ([1] Yang, Chaoqi, Danica Xiao, M. Brandon Westover, and Jimeng Sun.
"Self-supervised eeg representation learning for automatic sleep staging."
arXiv preprint arXiv:2110.15278 (2021).)

**Feb 07, 2023**

.. code-block:: bash
.. code-block:: rst

1. Support signal dataset processing and split: add SampleSignalDataset, BaseSignalDataset. Use SleepEDFcassette dataset as the first signal dataset. Use example/sleep_staging_sleepEDF_contrawr.py
2. rename the dataset/ parts: previous BaseDataset becomes BaseEHRDataset and SampleDataset becomes SampleEHRDataset. Right now, BaseDataset will be inherited by BaseEHRDataset and BaseSignalDataset. SampleBaseDataset will be inherited by SampleEHRDataset and SampleSignalDataset.

**Feb 06, 2023**

.. code-block:: bash
.. code-block:: rst

1. improve readme style
2. add the pyhealth live 06 and 07 link to pyhealth live

**Feb 01, 2023**

.. code-block:: bash
.. code-block:: rst

1. add unittest of PyHealth MedCode and Tokenizer

**Jan 26, 2023**

.. code-block:: bash
.. code-block:: rst

1. accelerate MIMIC-IV, eICU and OMOP data loading by using multiprocessing (pandarallel)

**Jan 25, 2023**

.. code-block:: bash
.. code-block:: rst

1. accelerate the MIMIC-III data loading process by using multiprocessing (pandarallel)

**Jan 24, 2023**

.. code-block:: bash
.. code-block:: rst

1. Fix the code typo in pyhealth/tasks/drug_recommendation.py for issue #71.
1. Fix the code typo in pyhealth/tasks/drug_recommendation.py for issue `#71`.
2. update the pyhealth live schedule

**Jan 22, 2023**

.. code-block:: bash
.. code-block:: rst

1. Fix the list of list of vector problem in RNN, Transformer, RETAIN, and CNN
2. Add initialization examples for RNN, Transformer, RETAIN, CNN, and Deepr
@@ -128,42 +128,42 @@ We track the new development here:

**Jan 21, 2023**

.. code-block:: bash
.. code-block:: rst

1. Added a new model, Deepr (models.Deepr)

**Jan 20, 2023**

.. code-block:: bash
.. code-block:: rst

1. add the pyhealth live 05
2. add slack channel invitation in pyhealth live page

**Jan 13, 2023**

.. code-block:: bash
.. code-block:: rst

1. add the pyhealth live 03 and 04 video link to the navigation
2. add future pyhealth live schedule

**Jan 8, 2023**

.. code-block:: bash
.. code-block:: rst

1. Changed BaseModel.add_feature_transform_layer in models/base_model.py so that it accepts special_tokens if necessary
2. fix an int/float bug in dataset checking (transform int to float and then process them uniformly)

**Dec 26, 2022**

.. code-block:: bash
.. code-block:: rst

1. add examples to pyhealth.data, pyhealth.datasets
2. improve jupyter notebook tutorials 0, 1, 2


**Dec 21, 2022**

.. code-block:: bash
.. code-block:: rst

1. add the development logs to the navigation
2. add the pyhealth live schedule to the navigation
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -1,5 +1,5 @@
Sphinx==5.2.3
sphinx-automodapi>
sphinx-automodapi
sphinx-autodoc-annotation
sphinx_last_updated_by_git
sphinxcontrib-spelling
21 changes: 20 additions & 1 deletion pyhealth/datasets/base_ehr_dataset.py
@@ -135,6 +135,25 @@ def __init__(
logger.debug(f"Saved {self.dataset_name} base dataset to {self.filepath}")
save_pickle(self.patients, self.filepath)

        self.patient_ids = list(self.patients.keys())  # precompute for use in __getitem__

    def __len__(self):
        return len(self.patients)

    def __getitem__(self, index):
        """Access the dataset like dataset[10] or dataset[2:5], or convert it via list(dataset).

        `index` may be an int (returns one Patient) or a slice (returns a list of Patients).
        """
        if isinstance(index, slice):
            return [self.patients[p] for p in self.patient_ids[index]]
        return self.patients[self.patient_ids[index]]

def _load_code_mapping_tools(self) -> Dict[str, CrossMap]:
"""Helper function which loads code mapping tools CrossMap for code mapping.

@@ -172,7 +191,7 @@ def parse_tables(self) -> Dict[str, Patient]:
Returns:
A dict mapping patient_id to `Patient` object.
"""
pandarallel.initialize(progress_bar=False)
pandarallel.initialize(progress_bar=False)#, shm_size_mb=10000)

# patients is a dict of Patient objects indexed by patient_id
patients: Dict[str, Patient] = dict()
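
A minimal sketch of how the `__len__` / `__getitem__` additions to `pyhealth/datasets/base_ehr_dataset.py` are expected to behave. It assumes the same logic as the diff: a dict keyed by patient id plus a precomputed id list. `TinyDataset` and its string values are hypothetical stand-ins for illustration only; they are not part of PyHealth and do not use real `Patient` objects.

```python
from typing import Dict, List, Union


class TinyDataset:
    """Hypothetical stand-in for the dataset class; values are plain strings."""

    def __init__(self, patients: Dict[str, str]):
        self.patients = patients
        # Precompute the key order once so index lookups do not rebuild the list.
        self.patient_ids: List[str] = list(patients.keys())

    def __len__(self) -> int:
        return len(self.patients)

    def __getitem__(self, index: Union[int, slice]):
        if isinstance(index, slice):
            # A slice yields a list of values, in insertion order.
            return [self.patients[p] for p in self.patient_ids[index]]
        # An int yields a single value; out-of-range raises IndexError.
        return self.patients[self.patient_ids[index]]


ds = TinyDataset({"p1": "alice", "p2": "bob", "p3": "carol"})
first = ds[0]          # single item
middle = ds[1:3]       # list of items
last = ds[-1]          # negative indices work because patient_ids is a list
everything = list(ds)  # iteration falls back to __getitem__ with 0, 1, 2, ...
```

Because out-of-range integer access raises `IndexError` from the underlying list, `list(dataset)` terminates cleanly through Python's sequence iteration fallback even though no `__iter__` is defined.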