Getting nexsons and newicks for sets of trees in a clade of interest
This assumes that you have the Python library peyotl installed. You can check out the other tutorials if you want to see how to install and work with this library.
If we work off the last tutorial (here), we may want to get the nexson files associated with the studies that intersect with a particular clade of interest.
##Getting the nexson file##
You can get the nexson file for each of the studies like so:

    from peyotl.api import APIWrapper
    from peyotl.nexson_syntax import create_content_spec, get_ot_study_info_from_nexml, PhyloSchema
    a = APIWrapper()
    oti = a.oti
    ps = a.phylesystem_api
    b = oti.find_trees(ottTaxonName='Lonicera')
    tree_names = []
    for i in b:
        for j in i["matched_trees"]:
            tr = j["nexson_id"]
            st = "_".join(j["oti_tree_id"].split("_")[0:2])
            nexson = ps.get(st)

Let's go through each line to see what is happening here.
The first couple of lines just bring in what we need from peyotl:

    from peyotl.api import APIWrapper
    from peyotl.nexson_syntax import create_content_spec, get_ot_study_info_from_nexml, PhyloSchema

The next lines give us access to two services: oti, which lets us search for studies and trees, and phylesystem, which serves the nexson files.

    a = APIWrapper()
    oti = a.oti
    ps = a.phylesystem_api

Then the bits before the for loops just do the search we did in the last tutorial where we get the studies that overlap with the taxon name "Lonicera".

    b = oti.find_trees(ottTaxonName='Lonicera')

Now for the for loops:

    for i in b:
        for j in i["matched_trees"]:
            tr = j["nexson_id"]
            st = "_".join(j["oti_tree_id"].split("_")[0:2])
            nexson = ps.get(st)

The first loop iterates through b, the list of matching studies, where each i is a dictionary. The second loop iterates through the list of matched trees in that dictionary. In the first iteration of the loop, tr looks like this:

    tree532

We won't need tr right now, but it holds the tree id, which we will use in a bit. To get the nexson, we use the st variable instead. st looks like this:

    pg_424

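The slicing step can be tried on its own without contacting any service. The sketch below assumes that an `oti_tree_id` is the study id followed by the tree id, joined with underscores (for example `pg_424_tree532`, which is inferred from the `tr` and `st` values shown above rather than taken from the oti documentation). `split_oti_tree_id` is a hypothetical helper name, not part of peyotl.

```python
def split_oti_tree_id(oti_tree_id):
    """Split an id such as 'pg_424_tree532' into (study_id, tree_id).

    Assumes the study id is made of exactly two underscore-separated
    parts (e.g. 'pg_424'), as the tutorial's slicing expression does.
    """
    parts = oti_tree_id.split("_")
    study_id = "_".join(parts[0:2])  # same expression as in the loop
    tree_id = "_".join(parts[2:])    # whatever follows the study id
    return study_id, tree_id


if __name__ == "__main__":
    print(split_oti_tree_id("pg_424_tree532"))
```

If a study id ever contained more or fewer than two parts, the `[0:2]` slice would give the wrong result, so it is worth checking a few ids from your own search before relying on it.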
With this study id, we can now get the nexson from the phylesystem api. This is simply done with:

    nexson = ps.get(st)

The variable nexson is a Python dictionary, so it needs to be serialized before it can be written to a file, for example as JSON:

    import json

    nexson = ps.get(st)
    with open(st, "w") as fl:
        json.dump(nexson, fl)

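The loop above calls `ps.get(st)` once for every matched tree, so a study with several matching trees is fetched several times. One possible way around this is to group the tree ids by study first and fetch each study once. The sketch below uses a small hand-made list in place of the real search result; the ids in `sample_results` are illustrative only (`tree999` is made up), and `group_trees_by_study` is a hypothetical helper, not a peyotl function.

```python
from collections import OrderedDict


def group_trees_by_study(search_results):
    """Map each study id to the list of matched tree ids found in it."""
    grouped = OrderedDict()
    for study in search_results:
        for tree in study["matched_trees"]:
            # Same study-id expression as in the tutorial loop.
            study_id = "_".join(tree["oti_tree_id"].split("_")[0:2])
            grouped.setdefault(study_id, []).append(tree["nexson_id"])
    return grouped


# Hypothetical stand-in for the list returned by oti.find_trees(...);
# only the two keys used in this tutorial are included.
sample_results = [
    {"matched_trees": [
        {"nexson_id": "tree532", "oti_tree_id": "pg_424_tree532"},
        {"nexson_id": "tree999", "oti_tree_id": "pg_424_tree999"},
    ]},
]

for study_id, tree_ids in group_trees_by_study(sample_results).items():
    # In the real script, nexson = ps.get(study_id) would go here,
    # once per study rather than once per tree.
    print(study_id, tree_ids)
```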
##Getting a newick from the nexson##
The nexson itself may not be directly useful, since most tools need it converted to another format first. peyotl comes with some nice scripts to do this in the `scripts/nexsons/` directory. We can also extract the necessary bits from those scripts and put them into our own if we want some custom utility or modification.
So we will extract the bits we need and make a function in our file that writes each tree we want to a file called `tree_id.tre`.

    import codecs

    def convert_nexson_newick(inp, tid):
        """Write tree `tid` from the nexson `inp` to <tid>.tre as newick."""
        outfn = tid + ".tre"
        src_schema = None
        otu_label = 'ottid'  # one of: originallabel, ottid, otttaxonname
        blob = inp
        schema = create_content_spec(content='tree', content_id=tid, format='newick', otu_label=otu_label)
        # the with block closes the output file even if the conversion fails
        with codecs.open(outfn, mode='w', encoding='utf-8') as out:
            try:
                schema.convert(src=blob, serialize=True, output_dest=out, src_schema=src_schema)
            except KeyError:
                # the nexson may be wrapped in a dictionary under the 'data' key
                if 'nexml' not in blob and 'nex:nexml' not in blob:
                    blob = blob['data']
                    schema.convert(src=blob, serialize=True, output_dest=out, src_schema=src_schema)
                else:
                    raise

A few things here are worth a closer look. The tip names can be the ott ids (`ottid`), the original labels that came in with the tree (`originallabel`), or the ott taxon names (`otttaxonname`). You choose between these with `otu_label`, which is set to `ottid` here. You can also write nexus instead of newick by changing the format on the `schema =` line from `newick` to `nexus`.
With this function added, the main part of our file becomes:

    if __name__ == "__main__":
        a = APIWrapper()
        oti = a.oti
        ps = a.phylesystem_api
        b = oti.find_trees(ottTaxonName='Lonicera')
        for i in b:
            for j in i["matched_trees"]:
                tr = j["nexson_id"]
                st = "_".join(j["oti_tree_id"].split("_")[0:2])
                nexson = ps.get(st)
                convert_nexson_newick(nexson, tr)

Now you should have two files called `tree531.tre` and `tree532.tre` in your directory, which should be the newicks for the trees in these studies.
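A quick way to check the output is to confirm that each `.tre` file at least has the shape of a newick string. The sketch below is a rough structural check only (balanced parentheses and a closing semicolon), not a parser; `looks_like_newick` is a hypothetical helper, and it assumes each file holds a single tree and that no tip label contains a parenthesis.

```python
import glob


def looks_like_newick(text):
    """Cheap structural check: balanced parentheses and a final semicolon.

    Quoted labels are not parsed, so a label containing a parenthesis
    would be miscounted.
    """
    text = text.strip()
    if not text.endswith(";"):
        return False
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0


if __name__ == "__main__":
    # Report on every .tre file in the current directory.
    for path in sorted(glob.glob("*.tre")):
        with open(path, encoding="utf-8") as handle:
            status = "ok" if looks_like_newick(handle.read()) else "suspect"
        print(path, status)
```

A file flagged as suspect is not necessarily wrong, but it is a good candidate for opening in a tree viewer before using it further.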