An adaptable, user-dependent, and precise tool for free-text normalization and processing.

sebastianduesing/adp

This repository contains code to normalize free-text data using a three-stage method that
applies standardization at the level of characters, then words, then phrases. Character
standardization is the most generic stage; phrase standardization is the most
dataset-specific. The normalization rules used by ADP are designed to be easily adapted to
the needs of the dataset being normalized. This repository also includes two datasets
containing age and data-location free-text data from the Immune Epitope Database (IEDB).
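
The three-stage idea described above can be illustrated with a minimal sketch. This is not ADP's actual code: the function names (`normalize_chars`, `normalize_words`, `normalize_phrases`, `normalize`) and the rule tables are hypothetical, chosen only to show how generic character rules, word-level substitutions, and dataset-specific phrase rules might be chained.

```python
import re
import unicodedata

# Hypothetical rule tables for illustration only; ADP's real rules are
# defined in its own scripts and data files and may look quite different.
WORD_RULES = {"yrs": "years", "yr": "year", "mos": "months", "wks": "weeks"}
PHRASE_RULES = [
    # Rewrite "<n> to <m> <unit>" as "<n>-<m> <unit>" (an assumed age-data rule).
    (re.compile(r"^(\d+) to (\d+) (\w+)$"), r"\1-\2 \3"),
]


def normalize_chars(text: str) -> str:
    """Character stage: generic cleanup that applies to almost any dataset."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[\u2012-\u2015]", "-", text)  # unify dash variants
    return re.sub(r"\s+", " ", text).strip()      # collapse whitespace


def normalize_words(text: str) -> str:
    """Word stage: replace individual tokens using a lookup table."""
    return " ".join(WORD_RULES.get(word, word) for word in text.split(" "))


def normalize_phrases(text: str) -> str:
    """Phrase stage: apply dataset-specific patterns to the whole string."""
    for pattern, replacement in PHRASE_RULES:
        text = pattern.sub(replacement, text)
    return text


def normalize(text: str) -> str:
    """Run the three stages in order: characters, then words, then phrases."""
    return normalize_phrases(normalize_words(normalize_chars(text)))


if __name__ == "__main__":
    print(normalize("  6 to 8   Yrs "))
```

In this sketch, adapting the pipeline to a new dataset would mostly mean editing `WORD_RULES` and `PHRASE_RULES`, while the character stage stays largely unchanged.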

The scripts char_normalizer.py, word_normalizer.py, and phrase_normalizer.py perform the
core normalization functions in ADP. The scripts/ directory also contains several other
scripts that provide accessory functions and perform data collection and analysis on the
outputs of the normalization process.

More coming soon.
