
Getting nexsons and newicks for sets of trees in a clade of interest

Stephen Smith edited this page Oct 11, 2015 · 7 revisions

This assumes that you have the Python library peyotl installed. The other tutorials cover how to install and work with this library.

Building on the last tutorial (here), we may want to get the nexson files for the studies that intersect with a particular clade of interest.

##Getting the nexson file##

You can get the nexson file for each of the studies like so:

from peyotl.api import APIWrapper
from peyotl.nexson_syntax import create_content_spec, get_ot_study_info_from_nexml, PhyloSchema

a = APIWrapper()
oti = a.oti
ps = a.phylesystem_api
b = oti.find_trees(ottTaxonName='Lonicera')
tree_names = []
for i in b:
    for j in i["matched_trees"]:
        tr = j["nexson_id"]
        st = "_".join(j["oti_tree_id"].split("_")[0:2])
        nexson = ps.get(st)

Let's go through each line to see what is happening here.

The first couple of lines import what we need from peyotl:

from peyotl.api import APIWrapper
from peyotl.nexson_syntax import create_content_spec, get_ot_study_info_from_nexml, PhyloSchema

The next lines give us access to two services: oti, which lets us search for studies and trees, and phylesystem, which serves the nexson files.

a = APIWrapper()
oti = a.oti
ps = a.phylesystem_api

The line before the for loops repeats the search from the last tutorial, which finds the studies that overlap with the taxon name "Lonicera".

b = oti.find_trees(ottTaxonName='Lonicera')

Now for the for loops:

for i in b:
    for j in i["matched_trees"]:
        tr = j["nexson_id"]
        st = "_".join(j["oti_tree_id"].split("_")[0:2])
        nexson = ps.get(st)

The first loop iterates through b, the list of studies, with i being the dictionary for one study. The second loop iterates through the list of matched trees in that dictionary, with j being the dictionary for one tree. In the first iteration of the loop, tr will look like this:

tree532

We won't need it right now, but it gives you the tree id, which we will use in a bit. To get the nexson, we use the st variable instead. st looks like this:

pg_424
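To see how tr and st are derived without calling the service, here is a minimal sketch that runs the same two loops over a stand-in result. The mock data and the helper name study_and_tree_ids are assumptions for illustration; in particular, the oti_tree_id value "pg_424_tree532" is only a guess that is consistent with the two example values shown above.

```python
# Stand-in for the list returned by the search. The field values are
# assumptions chosen to match the example values of tr and st above.
mock_results = [
    {
        "matched_trees": [
            {"nexson_id": "tree532", "oti_tree_id": "pg_424_tree532"},
        ]
    }
]


def study_and_tree_ids(results):
    """Yield (study_id, tree_id) pairs from a find_trees-style result list."""
    for study in results:
        for tree in study["matched_trees"]:
            tree_id = tree["nexson_id"]
            # Keep only the first two underscore-separated fields,
            # e.g. "pg" and "424", and join them back together.
            study_id = "_".join(tree["oti_tree_id"].split("_")[0:2])
            yield study_id, tree_id


pairs = list(study_and_tree_ids(mock_results))
print(pairs)  # [('pg_424', 'tree532')]
```

Splitting on "_" and keeping two fields only works if study ids always have exactly two underscore-separated parts, which is what the tutorial code assumes.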

With this study id, we can now get the nexson from the phylesystem api. This is simply done with

nexson = ps.get(st)

You can write nexson to a file if you like, for example:

import json

nexson = ps.get(st)
# nexson is a dict-like object rather than a string, so serialize it as JSON
with open(st, "w") as fl:
    json.dump(nexson, fl)

##Getting a newick from the nexson##

The nexson itself may not be very useful, or may need to be converted for most purposes. peyotl comes with some nice scripts for this in the scripts/nexsons/ directory. We can also extract the necessary bits from those scripts and put them into our own scripts if we want some custom utility or modification.

So we will extract the bits we need and make a function in our file that writes the tree we want to a file named after its tree id, for example tree532.tre.

import codecs

def convert_nexson_newick(inp, tid):
    """Write the tree with id tid from the nexson inp to <tid>.tre as newick."""
    outfn = tid + ".tre"
    src_schema = None
    otu_label = 'ottid'  # one of: originallabel, ottid, otttaxonname
    blob = inp
    schema = create_content_spec(content='tree', content_id=tid, format='newick', otu_label=otu_label)
    # The with block closes (and flushes) the output file even if conversion fails
    with codecs.open(outfn, mode='w', encoding='utf-8') as out:
        try:
            schema.convert(src=blob, serialize=True, output_dest=out, src_schema=src_schema)
        except KeyError:
            # If blob is not itself a nexson, look for one under the 'data' key and retry
            if 'nexml' not in blob and 'nex:nexml' not in blob:
                blob = blob['data']
                schema.convert(src=blob, serialize=True, output_dest=out, src_schema=src_schema)
            else:
                raise

A few things here deserve a closer look. The tip names can be the ottids (ottid), the original labels that came in with the tree (originallabel), or the OTT taxon names (otttaxonname). You choose between these with otu_label, which is set to ottid here. To write nexus instead of newick, change format='newick' to format='nexus' on the schema = line.

With this function added, we can change the main part of our file to:

if __name__ == "__main__":
    a = APIWrapper()
    oti = a.oti
    ps = a.phylesystem_api
    b = oti.find_trees(ottTaxonName='Lonicera')
    for i in b:
        for j in i["matched_trees"]:
            tr = j["nexson_id"]
            st = "_".join(j["oti_tree_id"].split("_")[0:2])
            nexson = ps.get(st)
            convert_nexson_newick(nexson, tr)

Now you should have two files called tree531.tre and tree532.tre in your directory, which should contain the newicks for the trees in these studies.
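If you want a quick sanity check on the files that were written, something like the following sketch may help. It is not part of peyotl; the function names are made up for illustration, and the check is deliberately rough: it only confirms that each file ends with a semicolon and has balanced parentheses, and it ignores quoted labels, which could contain parentheses of their own.

```python
import codecs
import glob


def looks_like_newick(text):
    """Rough check: ends with ';' and the parentheses are balanced."""
    text = text.strip()
    if not text.endswith(";"):
        return False
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A closing parenthesis appeared before its opening one
                return False
    return depth == 0


def check_tree_files(pattern="*.tre"):
    """Return {filename: bool} for every file matching the glob pattern."""
    report = {}
    for fn in sorted(glob.glob(pattern)):
        with codecs.open(fn, mode="r", encoding="utf-8") as fh:
            report[fn] = looks_like_newick(fh.read())
    return report


if __name__ == "__main__":
    for fn, ok in check_tree_files().items():
        print(fn, "ok" if ok else "does not look like newick")
```

Run from the directory that holds the .tre files, this prints one line per file. A file that fails the check is worth opening by hand before passing it to other tools.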

back to the Tutorials
