adding extinct taxa to the synthetic tree

Mark T. Holder edited this page Feb 13, 2019 · 10 revisions

Versions of the synthetic tree up to and including 10.4 have omitted extinct taxa. This is basically a hack to deal with the fact that many of these taxa are only crudely placed in the taxonomic hierarchy and are not included in any of our curated phylogenetic studies. Including them causes problems such as the Fungi node in the tree becoming unbrowsable because it has too many children (ideally, we'd just have the browser view filter these taxa out, but we have not implemented that yet).

MTH has (in early Feb. 2019) implemented a workaround (not yet deployed or thoroughly tested) that entails:

  1. processing OTT to add the incertae_sedis flag to every extinct taxon (as well as adding the extinct flag to non-tips whose descendants are all flagged as extinct or extinct_inherited).

  2. pointing the propinquity configuration to the modified OTT

  3. removing extinct and extinct_inherited from the propinquity config's cleaning_flags property. Those flags should remain in additional_regrafting_flags so that extinct taxa that are not in any input phylogeny are not added to the supertree; this avoids the huge number of Fungi children.

  4. rebuilding the tree
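
Step 3 amounts to a small edit of two comma-separated flag lists. The following is a minimal sketch in Python, not the actual propinquity code. It assumes the flags live in a `[taxonomy]` section of an INI-style config; that section name, the helper name, and the example flag values other than extinct and extinct_inherited are assumptions made for illustration.

```python
import configparser

EXTINCT_FLAGS = {"extinct", "extinct_inherited"}

# Hypothetical config text; the real propinquity config may be laid out differently.
EXAMPLE_CONFIG = """
[taxonomy]
cleaning_flags = barren,extinct,hidden,extinct_inherited
additional_regrafting_flags = extinct_inherited,extinct
"""


def drop_extinct_from_cleaning_flags(config_text):
    """Return a parser in which cleaning_flags no longer lists the extinct flags."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["taxonomy"]
    flags = [f.strip() for f in section["cleaning_flags"].split(",") if f.strip()]
    # Only cleaning_flags is edited; additional_regrafting_flags is left untouched.
    section["cleaning_flags"] = ",".join(f for f in flags if f not in EXTINCT_FLAGS)
    return parser


updated = drop_extinct_from_cleaning_flags(EXAMPLE_CONFIG)
print(updated["taxonomy"]["cleaning_flags"])
print(updated["taxonomy"]["additional_regrafting_flags"])
```

Leaving additional_regrafting_flags alone is the point of the sketch: extinct taxa stop being pruned up front, but those absent from every input phylogeny are still not grafted back in.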

This seems to add only 300 leaves (see below) because we don't have many trees with fossil taxa.

We could change the logic in propinquity/otcetera tools to treat extinct taxa as incertae sedis, but that feels like a bit of a hack because the concept of "extinct" is certainly distinct from "incertae sedis". Thus, I'd prefer for the treatment of fossil taxa as incertae sedis to be "shallow" in the software architecture. Hopefully, we could make the taxonomy richer such that those fossil taxa which are incertae sedis are labelled as such. To that end, I've tried to attack the problem by pre-processing the taxonomy.
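
The flag logic of that pre-processing pass can be illustrated on a toy taxonomy. This is only a sketch in Python of the rule described in step 1 above, not the otcetera implementation; the `Taxon` class, the `mark_extinct` function, and the taxon names are hypothetical.

```python
from dataclasses import dataclass, field

EXTINCT_FLAGS = {"extinct", "extinct_inherited"}


@dataclass
class Taxon:
    name: str
    flags: set = field(default_factory=set)
    children: list = field(default_factory=list)


def mark_extinct(taxon):
    """Post-order pass over the taxonomy.

    Returns True if the taxon and all of its descendants are flagged extinct.
    """
    # Build a list (not a generator) so that every child is visited.
    child_results = [mark_extinct(child) for child in taxon.children]
    all_descendants_extinct = all(child_results)  # True for tips
    # A non-tip whose descendants are all extinct becomes extinct itself.
    if taxon.children and all_descendants_extinct:
        taxon.flags.add("extinct")
    is_extinct = bool(taxon.flags & EXTINCT_FLAGS)
    # Every extinct taxon is additionally treated as incertae sedis.
    if is_extinct:
        taxon.flags.add("incertae_sedis")
    return is_extinct and all_descendants_extinct


living = Taxon("living sp.")
fossil_a = Taxon("fossil sp. A", {"extinct"})
fossil_b = Taxon("fossil sp. B", {"extinct_inherited"})
fossil_genus = Taxon("fossil genus", children=[fossil_a, fossil_b])
family = Taxon("family", children=[fossil_genus, living])

mark_extinct(family)
for taxon in (family, fossil_genus, fossil_a, fossil_b, living):
    print(taxon.name, sorted(taxon.flags))
```

Because the flags are written into the taxonomy itself, the downstream tools only ever see ordinary incertae_sedis taxa, which keeps the special treatment of fossils "shallow".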

Specifics:

  1. propinquity fossil_taxa branch on mtholder fork: https://github.com/mtholder/propinquity/tree/fossil_taxa
  2. otcetera extinct-to-incert-sed branch on mtholder fork: https://github.com/mtholder/otcetera/tree/extinct-to-incert-sed
  3. peyotl python3 branch on mtholder fork: https://github.com/mtholder/peyotl/tree/python3 (hopefully all of the code involved would work on Python 2.7 too, but I have not tested that)
  4. build otcetera
  5. create a virtualenv using Python 3.6 (later versions probably work too)
  6. run:

       otc-taxonomy-parser ott3.0draft6 -E --write-taxonomy=ott3.0.6-extinct-mod 2>err-extinction-mod-3.0.6.txt

     in the parent of the OTT dir (here the parent of ott3.0draft6) to create the OTT taxonomy with modified flags

  7. point propinquity at ott3.0.6-extinct-mod by setting ott = %(home)s/ott/ott3.0.6-extinct-mod in ~/.opentree

  8. run propinquity, and don't forget to cross your fingers!
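
The `%(home)s` in the ott setting is the interpolation syntax understood by Python's standard `configparser`. The sketch below shows how such a line resolves to a full path. The `[opentree]` section name and the home path are assumptions for illustration, and whether propinquity reads ~/.opentree with exactly this parser is also an assumption; only the ott line itself comes from the steps above.

```python
import configparser

# Hypothetical contents of ~/.opentree. The section name and the home path
# are assumptions; only the ott line is taken from the steps above.
EXAMPLE_DOT_OPENTREE = """
[opentree]
home = /home/me/opentree
ott = %(home)s/ott/ott3.0.6-extinct-mod
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_DOT_OPENTREE)

# configparser's default interpolation substitutes %(home)s from the same section.
ott_dir = parser.get("opentree", "ott")
print(ott_dir)
```

A quick check like this can confirm that the setting points at the modified taxonomy directory rather than the original ott3.0draft6 before starting a long build.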

issues

Spot checks indicate that we've added the 300 tips shown below. Unfortunately, we end up with a weird result for Homo sapiens as a species because (for reasons that are not clear to me) H. sapiens sapiens is hidden, so the tree just ends up containing Denisovans and Neanderthals as subspecies of Homo. https://tree.opentreeoflife.org/taxonomy/browse?id=770315 I suspect that the subspecies was hidden because, when extinct taxa were pruned, humans would just be a monotypic taxon, and in that context we probably should not show the subspecies name.

This issue can probably be dealt with simply by removing the hidden flag from H. sapiens sapiens whenever we build with extinct taxa included (or by having a general mechanism for hiding the sole subspecies of a monotypic species).
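
The "general mechanism" option could look roughly like the following Python sketch: decide visibility from how many subspecies survive in a given build, rather than from a fixed hidden flag. The function name, the input shape, and the subspecies labels are hypothetical; nothing like this exists in the current tools as far as this page states.

```python
def visible_subspecies(subspecies, include_extinct):
    """Return the subspecies names that should appear in the tree.

    subspecies: list of (name, is_extinct) pairs for a single species.
    """
    retained = [
        name
        for name, is_extinct in subspecies
        if include_extinct or not is_extinct
    ]
    # A lone subspecies adds nothing beyond the species itself, so hide it.
    return retained if len(retained) > 1 else []


# Illustrative labels only; not copied from OTT.
homo_sapiens = [
    ("H. sapiens sapiens", False),
    ("Neanderthals", True),
    ("Denisovans", True),
]

print(visible_subspecies(homo_sapiens, include_extinct=False))
print(visible_subspecies(homo_sapiens, include_extinct=True))
```

With extinct taxa pruned, the species is monotypic and no subspecies is shown; with extinct taxa included, all three appear, so H. sapiens sapiens is no longer missing.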

The new tips:

6523, 20881, 45812, 45818, 81069, 84218, 102587, 106258, 124432, 196162, 200067,
208456, 211375, 220186, 222067, 271376, 303038, 306515, 365642, 370488, 370492,
370493, 372585, 374222, 437198, 447620, 447653, 459222, 465032, 466809, 469451,
534480, 558503, 564710, 567111, 576651, 576657, 587772, 588438, 607972, 621571,
623176, 625192, 645879, 653155, 707061, 727203, 754373, 816657, 816660, 816665,
816669, 840265, 869089, 933436, 937214, 964061, 964908, 964911, 982349, 1001940,
1009608, 1021848, 1036062, 1066976, 1083365, 3600100, 3600110, 3600120, 3600124,
3600125, 3600127, 3600128, 3600129, 3600131, 3600825, 3607245, 3607484, 3607521,
3607522, 3607676, 3607796, 3610308, 3610315, 3612189, 3612191, 3612195, 3612196,
3612203, 3612205, 3612207, 3612210, 3612259, 3612262, 3612266, 3612406, 3612408,
3612420, 3612428, 3612433, 3612436, 3612500, 3612501, 3612502, 3612503, 3612507,
3612509, 3612510, 3612516, 3612519, 3612521, 3612524, 3612525, 3612529, 3612533,
3612535, 3612536, 3612538, 3612539, 3612541, 3612543, 3612544, 3612547, 3612550,
3612554, 3612558, 3612559, 3612561, 3612562, 3612564, 3612567, 3612569, 3612571,
3612574, 3612579, 3612580, 3612584, 3612586, 3612587, 3612588, 3612589, 3612591,
3612592, 3612594, 3612595, 3612596, 3612597, 3612599, 3612600, 3612601, 3612603,
3612605, 3612606, 3612608, 3612609, 3612610, 3612611, 3612612, 3612613, 3612614,
3612615, 3612616, 3612617, 3612618, 3612619, 3612620, 3612621, 3612624, 3612625,
3612626, 3612628, 3612629, 3612631, 3612632, 3612633, 3612634, 3612635, 3614203,
3614207, 3615450, 3615459, 3615461, 3616017, 3616019, 3616020, 3617145, 3636488,
3636492, 3636495, 3676862, 3676865, 3677021, 4117716, 4117718, 4117748, 4117981,
4117983, 4117984, 4117986, 4117987, 4117988, 4117990, 4117994, 4117996, 4118000,
4118004, 4118005, 4118007, 4118010, 4118012, 4118013, 4118749, 4118794, 4119380,
4119411, 4119429, 4119560, 4119733, 4124528, 4124686, 4125739, 4125746, 4125764,
4125794, 4125820, 4125835, 4126044, 4126058, 4126060, 4126066, 4126085, 4130813,
4130815, 4130817, 4130828, 4130831, 4130832, 4130833, 4130835, 4130836, 4130848,
4941006, 4941266, 4941433, 4941594, 4941696, 4941850, 4941925, 4941926, 4941927,
4941929, 4941930, 4942030, 4942032, 4942359, 4942380, 4942409, 4942412, 4942414,
4942417, 4942432, 4942441, 4942444, 4942547, 4942565, 4942579, 4942613, 4943497,
4944931, 4945957, 4946043, 4949707, 5093185, 5668910, 5773930, 5782046, 5800006,
5833975, 5839494, 5839497, 5925662, 5936119, 5936581, 6140364, 6142887, 6145836,
6145840, 6145853, 6145860, 6145868, 6145876, 6145894, 6145898, 6145900, 6145904,
6146017, 6146167, 6151092, 6157964, 6157997
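
A list like the one above can be produced by comparing the leaf IDs of two builds. The sketch below assumes each build's leaves are already available as a collection of integer OTT IDs; the function name and the toy ID sets are hypothetical, and extracting the IDs from a tree file is left out.

```python
def new_tips(old_leaf_ids, new_leaf_ids):
    """Return the sorted IDs that are leaves in the new build but not the old one."""
    return sorted(set(new_leaf_ids) - set(old_leaf_ids))


# Toy data standing in for the leaf sets of two synthetic-tree builds.
build_without_extinct = [101, 102, 103]
build_with_extinct = [101, 102, 103, 6523, 20881]

added = new_tips(build_without_extinct, build_with_extinct)
print(len(added), added)
```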