Assign quality metrics to each genotype as they are loaded #5

carolyncaron · 2017-03-09T22:32:03Z

Since the loader already saves meta-data for each genotype call in the VCF format, it would be extremely useful to let the user set thresholds for what constitutes a good or bad quality call, and to store that information directly in the database, also as meta-data. For our purposes, it is more efficient to calculate and store quality metrics once at load time than to recompute them on the front-end every time a user requests genotype calls through a web interface.

For example,

The user specifies that they want a quality metric named "NDQR" to be saved as meta-data.
The user also provides thresholds, let's say a lower and an upper threshold, to categorize each call as "bad", "acceptable" or "excellent" quality. These thresholds are based on meta-data that is already present in the genotype file. In this case, the user may specify a lower threshold of a read depth (DP) of 5 and an allele depth (AD) of 4, whereas the upper threshold requires a read depth of 50 and an allele depth of 45.
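
To make the example concrete, here is a rough sketch of how the categorization could work. This is not loader code: the names `Threshold` and `classify_call` are hypothetical, the thresholds are assumed to be inclusive, both DP and AD are assumed to have to pass, and AD is treated as a single integer for the called allele (in a VCF file, AD is normally a per-allele list).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Minimum read depth (DP) and allele depth (AD) for a quality tier."""
    min_dp: int
    min_ad: int


# Values taken from the example in this issue.
LOWER = Threshold(min_dp=5, min_ad=4)
UPPER = Threshold(min_dp=50, min_ad=45)


def meets(dp: int, ad: int, threshold: Threshold) -> bool:
    # Assumption: a call passes only if BOTH values reach the threshold.
    return dp >= threshold.min_dp and ad >= threshold.min_ad


def classify_call(dp: int, ad: int,
                  lower: Threshold = LOWER,
                  upper: Threshold = UPPER) -> str:
    """Return "bad", "acceptable" or "excellent" for one genotype call."""
    if meets(dp, ad, upper):
        return "excellent"
    if meets(dp, ad, lower):
        return "acceptable"
    return "bad"
```

The returned label is what would be stored as the "NDQR" meta-data value for the call, so the front-end only has to read it back.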

We would also like to extend this to all file formats and not just VCF, by allowing an optional column in the legacy format and by adding parsing of key-value pairs within the genotype call column of the matrix format. This gives the user the most flexibility to specify quality in whatever way suits their data (for example, if they happen to know read depth or have a percentage-based quality score), regardless of file format.
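
As a sketch of what the key-value parsing for the matrix format might look like: the syntax has not been decided, so the cell layout below (`AT;DP=12;AD=10`, with `;` between fields and `=` between key and value) and the function name `parse_matrix_cell` are purely illustrative assumptions.

```python
def parse_matrix_cell(cell: str, sep: str = ";", kv: str = "="):
    """Split one matrix cell into (genotype_call, meta_data_dict).

    Assumed, hypothetical layout: the call comes first, followed by
    optional key=value pairs, e.g. "AT;DP=12;AD=10".
    """
    parts = [p.strip() for p in cell.split(sep) if p.strip()]
    if not parts:
        return None, {}  # empty cell: no call, no meta-data

    call, meta = parts[0], {}
    for part in parts[1:]:
        key, found, value = part.partition(kv)
        if not found:
            raise ValueError(f"Expected key{kv}value, got {part!r}")
        meta[key.strip()] = value.strip()  # values are kept as strings
    return call, meta
```

A plain cell such as `AT` would still parse, returning the call with an empty meta-data dictionary, so existing matrix files would not need to change.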

For reference, this is our (myself and @laceysanderson) thought process about this on the whiteboard:

carolyncaron added the enhancement label Mar 9, 2017

carolyncaron self-assigned this Mar 9, 2017

carolyncaron mentioned this issue Apr 28, 2017

Improve flexibility of Genotype Matrix and Flat-file formats #10

Open

cornellyujy mentioned this issue Feb 21, 2022

PDOException: SQLSTATE[23502]: Not null violation: 7 ERROR: null value in column "type_id" violates not-null constraint #55

Open
