Understanding orthology scores
Parent page - Interpreting results
For each pair of a reference annotation transcript and an intersecting genome alignment chain, TOGA computes an "orthology score". This is a numeric feature that takes one of three forms: (1) a number between 0.0 and 1.0, (2) the value -1.0, or (3) the value -2.0.
The assigned scores can be found in the ${toga_output_dir}/temp/orthology_scores.tsv file.
The scores are used later to decide what exactly to annotate.
An overview of different values is provided below (in progress).
Scores between 0.0 and 1.0. Assigned using an XGBoost model (details to be filled in). This is the normal range:
0.0 - minimal score, highly unlikely to be an ortholog
1.0 - maximal score, most likely an ortholog
By default, 0.5 is used as a threshold to differentiate orthologs from paralogs.
Issue related to negative scores
Spanning chains. -1 in the orthology scores file means that the chain is spanning: it covers the gene locus but has no alignment to the coding part of the gene. For such cases, the full set of features cannot be computed properly, so the XGBoost model is inapplicable. In 99% of cases, it means that the gene in the respective locus is either missing or deleted. However, TOGA still tries to annotate such cases.
Processed pseudogenes. -2 in the orthology scores file marks chains that represent processed pseudogenes. They have a very specific set of features, so machine learning is not used to identify them.
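The three kinds of values described above can be told apart with a few comparisons. The snippet below is a minimal sketch, not TOGA's own code. It assumes that -2.0 marks processed pseudogenes, that a score equal to the threshold counts as an ortholog, and that the scores file is tab-separated with a header row. The column names (`gene`, `chain`, `pred`), the category labels, and the sample rows are hypothetical and used only for illustration; check the header of your own orthology_scores.tsv before relying on them.

```python
import csv
import io
from collections import Counter

DEFAULT_THRESHOLD = 0.5  # default ortholog/paralog threshold mentioned above


def classify_score(score, threshold=DEFAULT_THRESHOLD):
    """Map an orthology score to a human-readable category.

    Category labels are illustrative, not TOGA's own terminology.
    """
    if score == -2.0:
        # Assumption: -2.0 marks processed pseudogenes.
        return "processed_pseudogene"
    if score == -1.0:
        # Spanning chain: no alignment to the coding part of the gene.
        return "spanning_chain"
    if 0.0 <= score <= 1.0:
        # Assumption: a score exactly at the threshold counts as an ortholog.
        return "ortholog" if score >= threshold else "paralog"
    raise ValueError(f"unexpected orthology score: {score}")


def count_categories(tsv_handle, score_column="pred"):
    """Count categories in a tab-separated scores table with a header row.

    score_column is a hypothetical column name; adjust it to your file.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return Counter(classify_score(float(row[score_column])) for row in reader)


# Made-up sample rows standing in for the real scores file.
SAMPLE = (
    "gene\tchain\tpred\n"
    "T1\t10\t0.98\n"
    "T1\t25\t0.12\n"
    "T2\t31\t-1.0\n"
    "T3\t47\t-2.0\n"
)

counts = count_categories(io.StringIO(SAMPLE))
print(dict(counts))
```

To run this on real output, open the scores file and pass the handle to `count_categories` in place of the in-memory sample.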