Various attempts at automated medical record annotation. Since the corpus of Golden Retriever Lifetime Study data does not include significant label data for named entity recognition, which would otherwise be a natural approach to annotation, we have to improvise and work with what we have. Here is the rough process:
- Reduce the textual dataset (EMR) to just proper nouns. This removes clutter/noise (see the extraction sketch after this list).
- Assemble study data from dictionaries into a categorical dataset. This pairs human input with low-effort labeling.
- Train a BERT model for multilabel text classification on the categorical dataset (see the training sketch after this list).
- Perform inference on the extracted proper noun list.
The output gives a semi-reliable prediction of the categories present in the textual dataset.
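As a rough illustration of the proper-noun reduction step, here is a minimal sketch assuming spaCy with its small English model; the function name and model choice are illustrative, and `scripts/pos_extraction.py` holds the actual implementation:

```python
# Minimal sketch: reduce EMR text to proper nouns with spaCy.
# Assumes the model is installed: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_proper_nouns(text: str) -> list[str]:
    """Return the proper nouns in `text`, in document order."""
    doc = nlp(text)
    return [token.text for token in doc if token.pos_ == "PROPN"]

# Example: a clinical narrative reduced to its proper nouns.
print(extract_proper_nouns("Bella was referred to Colorado State University."))
```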
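Similarly, here is a condensed sketch of the multilabel fine-tuning and inference steps, assuming Hugging Face `transformers`; the category names, example texts, threshold, and hyperparameters are placeholders, and the real logic lives in `scripts/train_categorizor.py` and `scripts/inference_categorizor.py`:

```python
# Sketch: fine-tune BERT for multilabel classification, then score new text.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CATEGORIES = ["cardiology", "dermatology", "oncology"]  # placeholder labels

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(CATEGORIES),
    problem_type="multi_label_classification",  # uses BCE-with-logits loss
)

# Toy categorical dataset: multi-hot float labels, one column per category.
texts = ["heart murmur grade III", "mast cell tumor on left flank"]
labels = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few toy epochs
    optimizer.zero_grad()
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()

# Inference: sigmoid gives independent per-category probabilities; keep any
# category above a threshold (0.5 here, purely illustrative).
model.eval()
with torch.no_grad():
    logits = model(**tokenizer(["Cushings"], return_tensors="pt")).logits
probs = torch.sigmoid(logits)[0]
print([c for c, p in zip(CATEGORIES, probs) if p > 0.5])
```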
At the moment, the code in this repo is meant for local execution only. The Morris Animal Foundation Data Science Team works on Apple M3 or later machines. The easiest way we found to access the GPUs on these machines is to use venv and MPS (PyTorch's Metal backend). Unfortunately, this prevents containerization for local training. Our setup steps are as follows:
- Initialize the venv: `python3 -m venv .venv`
- Change `.venv/bin` permissions as needed.
- Run `source .venv/bin/activate`
- Run `.venv/bin/pip install -r requirements.txt`
- Execute scripts with `.venv/bin/python scripts/<script>.py`
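Once the environment is in place, a quick sanity check that PyTorch can actually see the Apple GPU (assuming `torch` is among the pinned requirements):

```python
# Prefer Apple's Metal (MPS) backend when available, else fall back to CPU.
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")
```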
Roughly, the scripts are intended to be run in the following order:
1. `train_categorizor.py` to train the model.
2. `pos_extraction.py` on EMR textual resources.
3. `inference_categorizor.py` to provide categorization of the data extracted in #2.
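A full run, using the invocation pattern from the setup steps above, therefore looks like:

```sh
.venv/bin/python scripts/train_categorizor.py
.venv/bin/python scripts/pos_extraction.py
.venv/bin/python scripts/inference_categorizor.py
```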