Deploying a new taxonomy version

Try out a release candidate
Do a release

Try out a release candidate

Update version information

In the Makefile, update WHICH to name the new draft: either the next draft of the current version, or draft 1 of a new version. If moving to a brand new version, also update PREV_WHICH to name the previous released version.

For a new version, remove tax/prev_ott/ to force retrieval of the correct previous version.

rm -rf tax/prev_ott

Update sources if desired

(Optional) In your reference-taxonomy clone:

make refresh-ncbi

(Other updates - GBIF, SILVA, etc. - are manual)

Update additions

Get all additions

make refresh-amendments

Make a draft

make ott

Check the transcript

A lot of information is written to standard output, and also captured in tax/ott/transcript.out. A few standard prefixes indicate the severity of a message:

| Informational. Could be of interest for troubleshooting purposes.
# Debugging.
* A worrying situation, but probably not so serious that it demands attention.
** An error of some kind that should probably be remedied before release.

For example, consider:

** No such taxon: Myrmecia in Microthamniales (gbif)

This comes from an alignment directive (.same) in adjustments.py (as is easily found using grep). Presumably this GBIF taxon existed in a past OTT build but doesn't now due to an update to GBIF. This particular error can be ignored, but ideally it would either be researched to make sure we aren't missing anything (maybe the GBIF taxon still exists and was renamed), or the directive should be commented out.
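To get a quick overview of a long transcript, the messages can be tallied by severity prefix. The following is a minimal sketch, not part of the reference-taxonomy tooling. It assumes each prefix appears at the very start of a line and is followed by a space; the function names and the labels (error, warning, info, debug) are made up for illustration.

```python
from collections import Counter

# Longest prefix first, so that "** " is not mistaken for "* ".
# The labels on the right are illustrative names, not smasher terminology.
PREFIXES = [
    ("** ", "error"),
    ("* ", "warning"),
    ("| ", "info"),
    ("# ", "debug"),
]


def classify(line):
    """Return the severity label for one transcript line, or None if unprefixed."""
    for prefix, label in PREFIXES:
        if line.startswith(prefix):
            return label
    return None


def tally(lines):
    """Count transcript lines per severity label, ignoring unprefixed lines."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


# Possible usage against the real transcript:
#   with open("tax/ott/transcript.out", encoding="utf-8", errors="replace") as f:
#       print(tally(f))
```

The lines classified as errors are the ones worth reading individually before a release; the other counts are mostly useful for comparing one draft against the previous one.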

Check the draft

Open tax/ott/deprecated.tsv and check for a few things:

If it's very large - more than one or two thousand rows - that could signal a problem with the taxonomy inputs or method.
Look at the lines that say id-retired. These represent OTUs in phylesystem that were formerly mapped, but aren't mapped now. There are 22 of these in OTT 2.10 and each one has a story; if all goes well there really is no appropriate node in the new taxonomy to map these ids to, but as often as not these reflect some kind of error, such as a missing synonym. If the number is way more than about 20, that could signal a problem with inputs or method (e.g. Mammals wrongly placed under Magnoliophyta). It is possible to debug these, but it requires knowledge of the smasher code.
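Both checks above amount to counting rows, which can be sketched as follows. This is an illustration only: it assumes the file is tab-separated and that the string id-retired appears somewhere in the affected rows (the column layout is not described here, so no particular column is assumed). If the file has a header row, it is counted too. The threshold of 2000 rows comes from the text; the default of 50 for id-retired rows is an arbitrary stand-in for "way more than about 20".

```python
import csv


def summarize_deprecated(rows):
    """Return (total_rows, id_retired_rows) for an iterable of TSV rows."""
    total = 0
    retired = 0
    for row in rows:
        total += 1
        # Match the marker in any field, since the column layout is an unknown here.
        if any("id-retired" in field for field in row):
            retired += 1
    return total, retired


def check_thresholds(total, retired, max_rows=2000, max_retired=50):
    """Return a list of human-readable warnings; an empty list means nothing stood out."""
    warnings = []
    if total > max_rows:
        warnings.append(f"{total} rows in deprecated.tsv (more than {max_rows})")
    if retired > max_retired:
        warnings.append(f"{retired} id-retired rows (more than {max_retired})")
    return warnings


def summarize_file(path="tax/ott/deprecated.tsv"):
    """Read the TSV at `path` and return (total_rows, id_retired_rows)."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return summarize_deprecated(reader)
```

A clean result from this check does not replace reading the id-retired rows themselves, since each one may still point to a missing synonym or similar error.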

Get final OTT ids

If a block of new ids has not (yet) been reserved, the number of new ids required is stored in new_taxa/need_ids.json.

To allocate a range of ids (automation needed):

stop the production phylesystem-api webapp (it suffices to stop apache: sudo apache2ctl stop)
clone and/or refresh the amendments-1 repo
manually edit next_ott_id.json (add at least the number in new_taxa/need_ids.json; add another 10% to the number of needed ids just for good measure)
commit and push to github
relaunch the production phylesystem-api webapp using push.sh -c ... phylesystem-api (a new deployment is required in order to refresh the clones that drive the phylesystem-api).
store the id range in file new_taxa/range.json.

After smasher runs and the new ids get used, the id assignments will be stored in file new_taxa/addition-*-*.json.
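Until the allocation is automated, the arithmetic for the manual edit can be sketched as below. This is a hedged illustration: it assumes next_ott_id.json holds the next unused id, and it works on plain integers because the JSON schemas of need_ids.json, next_ott_id.json, and range.json are not shown here. The function names and the dictionary keys in the result are hypothetical.

```python
def ids_to_reserve(needed, margin_percent=10):
    """Return the needed count plus a safety margin, rounding the margin up."""
    if needed < 0:
        raise ValueError("needed must be non-negative")
    # Integer ceiling division avoids floating-point surprises.
    extra = (needed * margin_percent + 99) // 100
    return needed + extra


def reserve_range(next_id, needed, margin_percent=10):
    """Compute the reserved id block and the value next_ott_id should advance to.

    The keys of the returned dict are illustrative, not the range.json schema.
    """
    count = ids_to_reserve(needed, margin_percent)
    first = next_id
    last = next_id + count - 1
    return {"first": first, "last": last, "new_next_id": last + 1}
```

For example, if 250 ids are needed and the next unused id is 5000000, this reserves 275 ids: 5000000 through 5000274, with the next unused id becoming 5000275.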

Generate taxonomy with final OTT ids

make works

If make works generates an addition request (need_ids.json), something is wrong; you'll need to repeat the "get final OTT ids" and "update additions" steps. But this shouldn't happen.

Put taxonomy on files.opentreeoflife.org

This supposes version 2.10 and draft 6; substitute as needed. If the directory ott2.10 doesn't exist on files.opentreeoflife.org, you'll have to create it.

time scp -p tarballs/ott2.10draft6.tgz files:files.opentreeoflife.org/ott/ott2.10/

(about a minute.)
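Substituting the version and draft by hand is easy to get wrong, so the command can be assembled from the two values. This is only a sketch with a hypothetical helper name; it follows the naming pattern of the command above. Note that the version has to be passed as a string, since the number 2.10 would be printed as 2.1.

```python
def scp_command(version, draft):
    """Build the scp argument list for a draft tarball, e.g. version "2.10", draft 6."""
    if not isinstance(version, str):
        raise TypeError('pass the version as a string, e.g. "2.10"')
    tarball = f"tarballs/ott{version}draft{draft}.tgz"
    destination = f"files:files.opentreeoflife.org/ott/ott{version}/"
    return ["scp", "-p", tarball, destination]


# Possible usage:
#   import subprocess
#   subprocess.run(scp_command("2.10", 6), check=True)
```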

Create taxomachine neo4j database

Recommended: do this on varela.csail.mit.edu. ssh varela to log in to your own account, not the 'opentree' account. The reason to run on varela is that transferring the database file to AWS (devapi, api) is probably faster than transferring from wherever you were thinking of transferring from.

The database creation script (in the taxomachine repository's Makefile) looks for the taxonomy in the ott directory (or symlink) in the taxomachine repo. The following command sets up an ott directory; run it from a taxomachine clone:

cd {.../}taxomachine
time tar xvf ~opentree/files.opentreeoflife.org/ott/ott2.10/ott2.10draft6.tgz

(Takes about a minute.)

Now create the neo4j graph database for taxomachine.

git pull
time make db

(make db takes about an hour and fifteen minutes on varela.)

Create tarball of the graph db (about 25 minutes):

time make tarball

Transfer database tarball to devapi.opentreeoflife.org

Today's date will be in the name of the tarball file, so use that in an scp command, e.g.

  time scp -p taxomachine-20160827.db.tgz devapi:downloads/

(This took 20 minutes on a recent attempt.)
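The dated file name can be derived rather than typed. The sketch below assumes the tarball name follows the pattern shown in the example above (taxomachine-YYYYMMDD.db.tgz); the helper name is made up. If make tarball ran on an earlier day, pass that date explicitly, or simply check the directory listing.

```python
import datetime


def db_tarball_name(day=None):
    """Return the expected tarball name for `day` (defaults to today's date)."""
    if day is None:
        day = datetime.date.today()
    return day.strftime("taxomachine-%Y%m%d.db.tgz")
```

The same name is needed again in the install-db commands later on, so it is worth computing once and reusing.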

Extract the database and restart taxomachine services

Back to your local machine now (not varela):

cd ../germinator/deploy
./push.sh -c ../../deployed-systems/development/devapi.config install-db downloads/taxomachine-20160827.db.tgz taxomachine

(about 11 minutes)

Sanity check: inspect using taxonomy browser

Visit:

https://devtree.opentreeoflife.org/taxonomy/browse?id=304358

and explore a little bit, watching out for anything odd-looking.

Reindex oti

./push.sh -c ../../deployed-systems/development/devapi.config install-db downloads/taxomachine-20160827.db.tgz oti
./push.sh -c ../../deployed-systems/development/devapi.config index-db

again substituting today's date for 20160827.

(An hour?)

Configure conflict service and restart it

The conflict service looks in ~opentree/repo/reference-taxonomy/service/ for the taxonomy and synthetic tree. The taxonomy is expected to be called taxonomy and the synthetic tree synth.tre. These can be symbolic links.

Since the taxonomy doesn't otherwise reside on devapi (or api), it's necessary to download it from http://files.opentreeoflife.org/ott/ott2.10/ott2.10.tgz (or wherever), and unpack it. Since it unpacks as ott, do ln -s ott taxonomy.
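Before restarting, it may help to confirm that both expected names resolve. The following is a minimal sketch with a hypothetical helper name; it only checks that taxonomy and synth.tre exist in the given directory (following symbolic links, so a dangling link counts as missing), not that their contents are valid.

```python
import os

# Names the conflict service expects, per the description above.
EXPECTED_NAMES = ("taxonomy", "synth.tre")


def missing_entries(service_dir):
    """Return the expected names that do not resolve to an existing file or directory."""
    return [
        name
        for name in EXPECTED_NAMES
        if not os.path.exists(os.path.join(service_dir, name))
    ]


# Possible usage on the server:
#   print(missing_entries(os.path.expanduser("~opentree/repo/reference-taxonomy/service")))
```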

After the links are in place, restart with

./push.sh ... smasher

Create a new synthetic tree and install it on devapi

At this point it is highly desirable to make a new synthetic tree based on the new taxonomy.

Notes elsewhere.

Flush the web2py cache

Certain pages are cached by web2py. To flush the cache and force new versions to be seen, simply restart apache.

./push.sh -c ../../deployed-systems/development/devapi.config apache

Try it out

Start the synthetic tree browser

https://devtree.opentreeoflife.org/

and check for funniness.

Do a release

Prepare taxonomy release notes

The release notes live in the germinator repo.

Update statistics

Edit the OTT statistics file in the opentree repo, and re-deploy the webapp so that this file is served.

Install everything on api.opentreeoflife.org

The copy and push.sh commands are the same as above, but with 'api' substituted for 'devapi'.

Ensure that all taxonomy sources are archived

ssh files mkdir files.opentreeoflife.org/ncbi/ncbi-YYYYMMDD
scp -p feed/ncbi/in/taxdump.tgz files:files.opentreeoflife.org/ncbi/ncbi-YYYYMMDD/ncbi-YYYYMMDD.tgz

Update NCBI_URL in the OTT Makefile to match.

(If the files server moves to Amazon S3, these commands will need to change, as ssh and scp don't work with S3. See https://github.com/OpenTreeOfLife/germinator/pull/127 .)

Commit and push changes to reference-taxonomy repository, and create a tag

Make the new taxonomy current

Change the ott/current symbolic link so that it points to the new taxonomy instead of the old one.

ssh files ln -sfn ottX.X files.opentreeoflife.org/ott/current

Remove the draft number from the name of the final version, e.g. rename ott2.10draft6.tgz to ott2.10.tgz
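The final file name can be derived from the draft name mechanically. This sketch assumes draft tarballs are always named like ott2.10draft6.tgz; the helper name is hypothetical, and it refuses any name that does not contain exactly one draft suffix so that an already-final file is not renamed by accident.

```python
import re


def final_name(draft_name):
    """Strip the 'draftN' part: 'ott2.10draft6.tgz' -> 'ott2.10.tgz'."""
    new_name, replaced = re.subn(r"draft\d+(?=\.tgz$)", "", draft_name)
    if replaced != 1:
        raise ValueError(f"not a draft tarball name: {draft_name!r}")
    return new_name
```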

Updating the synthetic tree

While the system will work fine with taxonomy version N deployed in taxomachine and oti (or otindex) and a synthetic tree built using taxonomy version N-1 deployed in treemachine, users may occasionally come across certain anomalies that may be confusing. It is therefore a good idea to make a new synthetic tree whenever the taxonomy changes. The procedure for doing so is described here.
