Disambiguate between "prototypes" and "instances" #1

Open
mbaudis opened this issue Jun 2, 2020 · 0 comments
mbaudis commented Jun 2, 2020

One of the most confusing points in the definition of schemas for (genomic) variants is the conflation of variant descriptions for "prototypes" (e.g. recurring genomic changes representing "equivalent" alterations - think "BRAF V600E" or "CNV region ...") with descriptions of single instances / observations. An example of the missing separation between "observation/variant calling" and interpretation is the VCF format, which includes multi-allelic "variants" and allele frequency values but is nevertheless used for the annotation of individual variant calls.

In discussing formats for annotating CNVs (and possibly additional variant types) for data storage, representation and knowledge resources, it will be helpful to have a clear scoping of the different logical types:

  • a variant "observation", "call" or "instance", which just represents the outcome of a technical analysis pipeline without any inference from outside data (apart from low-level calibration etc.)
    • population frequencies etc. play no role at this stage
    • "fuzzy" positions for start, end refer to technical uncertainties
  • "prototypes" of variants, e.g. of exact or "equivalent" observations (from n=1 => ++)
    • "fuzzy" positions for start, end here refer to e.g. variations in precise mapping of variants seen as "equivalent"
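The difference in what "fuzzy" positions mean for the two types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Observation` and `Prototype`, the `summarise` helper, the min/max rule for combining intervals and the coordinates are all hypothetical, and the decision about which observations count as "equivalent" is assumed to have been made elsewhere.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # (lower, upper) bounds of a fuzzy position


@dataclass(frozen=True)
class Observation:
    """A single call from an analysis pipeline (hypothetical structure)."""
    chrom: str
    start: Interval  # technical uncertainty of the called start
    end: Interval    # technical uncertainty of the called end
    variant_type: str


@dataclass(frozen=True)
class Prototype:
    """A recurring variant summarising equivalent observations (hypothetical)."""
    chrom: str
    start: Interval  # spread of start positions across the observations
    end: Interval    # spread of end positions across the observations
    variant_type: str
    n_observations: int


def summarise(observations: List[Observation]) -> Prototype:
    """Collapse observations already judged equivalent into one prototype.

    Assumption: the prototype interval is simply the envelope (min of lower
    bounds, max of upper bounds) of the observed intervals.
    """
    if not observations:
        raise ValueError("need at least one observation (n >= 1)")
    first = observations[0]
    for obs in observations:
        if (obs.chrom, obs.variant_type) != (first.chrom, first.variant_type):
            raise ValueError("observations differ in chromosome or variant type")
    return Prototype(
        chrom=first.chrom,
        start=(min(o.start[0] for o in observations),
               max(o.start[1] for o in observations)),
        end=(min(o.end[0] for o in observations),
             max(o.end[1] for o in observations)),
        variant_type=first.variant_type,
        n_observations=len(observations),
    )


# Made-up coordinates, for illustration only.
calls = [
    Observation("9", (21_000_000, 21_000_500), (22_000_000, 22_000_500), "DEL"),
    Observation("9", (21_050_000, 21_050_200), (21_990_000, 21_990_300), "DEL"),
]
proto = summarise(calls)
```

In each `Observation` the interval width reflects how precisely the pipeline could place a breakpoint; in the `Prototype` it reflects how far the "equivalent" calls disagree with each other.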

The proposal - apart from discussions about other parameters relevant for one type or the other - is to use a separate parameter to label the scope here, e.g.:

"representation": "observation"

in contrast to

"representation": "evidence"

Such a model would a) help with some of the design discussions, and b) work nicely in the integration with different types of resources and APIs such as Beacon.
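As a rough sketch of how a consumer could use such a parameter, the Python snippet below splits a mixed list of JSON records by their "representation" value. Only the key "representation" and the values "observation" and "evidence" come from the proposal above; the other fields (`id`, `variant_type`), the record contents and the helper `by_representation` are assumptions made for illustration and not part of any existing schema.

```python
import json

# The two values named in the proposal; other values may be needed later.
ALLOWED = {"observation", "evidence"}

# Hypothetical records; only the "representation" key follows the proposal.
records_json = """[
  {"id": "call-1", "variant_type": "DEL", "representation": "observation"},
  {"id": "proto-1", "variant_type": "DEL", "representation": "evidence"},
  {"id": "call-2", "variant_type": "DUP", "representation": "observation"}
]"""


def by_representation(records):
    """Group variant records by their declared scope, rejecting unknown values."""
    groups = {value: [] for value in sorted(ALLOWED)}
    for record in records:
        scope = record.get("representation")
        if scope not in ALLOWED:
            raise ValueError(f"unknown representation: {scope!r}")
        groups[scope].append(record)
    return groups


groups = by_representation(json.loads(records_json))
observation_ids = [r["id"] for r in groups["observation"]]
evidence_ids = [r["id"] for r in groups["evidence"]]
```

With an explicit label, a resource can route individual calls and aggregated entries to different handling without guessing the scope from which other fields happen to be present.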
