An adaptable, user-dependent, and precise tool for free-text normalization and processing.
sebastianduesing/adp
This repository contains code for normalizing free-text data with a three-stage method that applies standardization at the level of characters, words, and phrases. Character standardization is the most generic stage; phrase standardization is the most dataset-specific. The normalization rules used by ADP are designed to be easy to adapt to the needs of the dataset being normalized.

Included in this repository are two datasets containing age and data-location free-text data from the Immune Epitope Database (IEDB).

The scripts char_normalizer.py, word_normalizer.py, and phrase_normalizer.py perform the core normalization functions in ADP. The scripts/ directory also contains several other scripts that provide accessory functions and perform data collection and analysis on the outputs of the normalization process.

More coming soon.
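As a rough illustration of the three-stage idea described above, the following is a minimal sketch, not the code in char_normalizer.py, word_normalizer.py, or phrase_normalizer.py. The function names, the rule tables, and the order in which the stages are chained (characters, then words, then phrases) are all assumptions made for this example; the actual ADP rules and interfaces are defined in the repository.

```python
import re
import unicodedata

# Hypothetical rule tables for age-like free text. These are NOT the ADP rules;
# they only show how dataset-specific rules could be swapped in.
WORD_RULES = {"yrs": "years", "yr": "year", "mos": "months", "mo": "month"}
PHRASE_RULES = {"years old": "years", "year old": "year"}


def normalize_chars(text):
    """Character stage (most generic): Unicode folding, case, whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width digits -> ASCII
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def normalize_words(text, rules=WORD_RULES):
    """Word stage: replace whole tokens using a lookup table."""
    return " ".join(rules.get(token, token) for token in text.split(" "))


def normalize_phrases(text, rules=PHRASE_RULES):
    """Phrase stage (most dataset-specific): replace multi-word expressions."""
    for old, new in rules.items():
        text = re.sub(r"\b" + re.escape(old) + r"\b", new, text)
    return text


def normalize(text):
    """Chain the three stages; this ordering is an assumption of the sketch."""
    return normalize_phrases(normalize_words(normalize_chars(text)))


if __name__ == "__main__":
    for raw in ["  8  Yrs  old ", "1 yr old", "6 mos"]:
        print(repr(raw), "->", repr(normalize(raw)))
```

Keeping each stage as a separate function with its own rule table mirrors the description above: the character stage can be reused across datasets largely unchanged, while the word and phrase tables are the parts a user would adapt.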