Deploying a new taxonomy version
- Try out a release candidate
- Do a release
In the Makefile, update WHICH to name the new draft; for a brand new version, set it to draft 1 of that version. If updating to a brand new version, also update PREV_WHICH to name the previous released version.
For a new version, remove tax/prev_ott/ to force retrieval of the correct previous version.
rm -rf tax/prev_ott
(Optional) In your reference-taxonomy clone:
make refresh-ncbi
(Other updates - GBIF, SILVA, etc. - are manual)
Get all additions
make refresh-amendments
make ott
A lot of information is written to standard output, and also captured in tax/ott/transcript.out. A few standard prefixes indicate the severity of a message:
- "|" Informational. Could be of interest for troubleshooting purposes.
- "#" Debugging.
- "*" A worrying situation, but probably not so serious that it demands attention.
- "**" An error of some kind that should probably be remedied before release.
For example, consider:
** No such taxon: Myrmecia in Microthamniales (gbif)
This comes from an alignment directive (.same) in adjustments.py (as is easily found using grep). Presumably this GBIF taxon existed in a past OTT build but doesn't now due to an update to GBIF. This particular error can be ignored, but ideally it would either be researched to make sure we aren't missing anything (maybe the GBIF taxon still exists and was renamed), or the directive should be commented out.
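A quick way to triage the transcript is to count messages by severity and list the "**" lines. The following is a minimal sketch, not part of the build; it assumes each severity prefix appears at the very start of a line, and the function names are hypothetical.

```python
from collections import Counter

# Longest prefix first, so that "**" is tested before "*".
PREFIXES = [("**", "error"), ("*", "warning"), ("|", "info"), ("#", "debug")]


def classify(line):
    """Return the severity label for one transcript line, or 'other'."""
    for prefix, label in PREFIXES:
        if line.startswith(prefix):
            return label
    return "other"


def summarize(lines):
    """Count lines by severity and collect the '**' lines for review."""
    counts = Counter()
    errors = []
    for line in lines:
        label = classify(line)
        counts[label] += 1
        if label == "error":
            errors.append(line.rstrip("\n"))
    return counts, errors


def summarize_file(path="tax/ott/transcript.out"):
    """Summarize a transcript file; undecodable bytes are replaced."""
    with open(path, errors="replace") as f:
        return summarize(f)
```

Calling summarize_file() from the reference-taxonomy clone returns the counts and the list of "**" messages, each of which can then be traced back to its directive with grep.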
Open tax/ott/deprecated.tsv and check for a few things:
- If it's very large - more than one or two thousand rows - that could signal a problem with the taxonomy inputs or method.
- Look at the lines that say id-retired. These represent OTUs in phylesystem that were formerly mapped, but aren't mapped now. There are 22 of these in OTT 2.10 and each one has a story; if all goes well there really is no appropriate node in the new taxonomy to map these ids to, but as often as not these reflect some kind of error, such as a missing synonym. If the number is way more than about 20, that could signal a problem with inputs or method (e.g. Mammals wrongly placed under Magnoliophyta). It is possible to debug these, but it requires knowledge of the smasher code.
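The two checks above can be roughed out in a few lines. This is a sketch under assumptions: the column layout of deprecated.tsv is not described here, so the code only counts non-blank lines (including any header line) and lines containing the string id-retired. The thresholds are the approximate figures from the text, and the function name is hypothetical.

```python
def check_deprecated(lines, max_rows=2000, max_retired=20):
    """Return (row count, id-retired count, list of warning strings)."""
    rows = [ln for ln in lines if ln.strip()]
    retired = [ln for ln in rows if "id-retired" in ln]
    warnings = []
    if len(rows) > max_rows:
        warnings.append("deprecated.tsv has %d rows (more than %d)"
                        % (len(rows), max_rows))
    if len(retired) > max_retired:
        warnings.append("%d id-retired lines (more than about %d)"
                        % (len(retired), max_retired))
    return len(rows), len(retired), warnings
```

For example, passing the open file object for tax/ott/deprecated.tsv as lines gives the two counts and any warnings. A warning is only a prompt to look more closely; a count slightly over the threshold is not necessarily a problem.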
If a block of new ids has not (yet) been reserved, the number of new ids required is stored in new_taxa/need_ids.json.
To allocate a range of ids (AUTOMATION NEEDED):
- Stop the production phylesystem-api webapp (it suffices to stop apache: sudo apache2ctl stop).
- Clone and/or refresh the amendments-1 repo.
- Manually edit next_ott_id.json (add at least the number in new_taxa/need_ids.json; add another 10% to the number of needed ids just for good measure).
- Commit and push to github.
- Relaunch the production phylesystem-api webapp using push.sh -c ... phylesystem-api (a new deployment is required in order to refresh the clones that drive the phylesystem-api).
- Store the id range in file new_taxa/range.json.
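The arithmetic for the manual edit can be sketched as follows. This is only an illustration: the JSON layouts of next_ott_id.json, need_ids.json, and range.json are not described here, so the sketch works on plain integers. It assumes that next_ott_id.json holds the next unassigned id and that the reserved block starts at that value; the function names are hypothetical.

```python
def ids_to_reserve(needed, margin_percent=10):
    """Needed ids plus a safety margin, rounded up (integer arithmetic)."""
    extra = -(-needed * margin_percent // 100)  # ceiling division
    return needed + extra


def allocate(next_id, needed, margin_percent=10):
    """Return (first id, last id, new next id) for the reserved block.

    Assumes next_id is the first id not yet handed out.
    """
    count = ids_to_reserve(needed, margin_percent)
    first = next_id
    last = next_id + count - 1
    return first, last, next_id + count
```

With made-up numbers, allocate(1000, 250) reserves 275 ids: the block runs from 1000 to 1274, and the new next id is 1275. The first and last values are what would be recorded as the range, and the new next id is what would be written back by hand.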
After smasher runs and the new ids get used, the id assignments will be stored in file new_taxa/addition-*-*.json.
make works
If make works generates an addition request (need_ids.json), something is wrong; you'll need to repeat the "get final OTT ids" and "update additions" steps. But this shouldn't happen.
This supposes version 2.10 and draft 6; substitute as needed. If the directory ott2.10 doesn't exist on files.opentreeoflife.org, you'll have to create it.
time scp -p tarballs/ott2.10draft6.tgz files:files.opentreeoflife.org/ott/ott2.10/
(about a minute.)
Recommended: do this on varela.csail.mit.edu. Use ssh varela to log in to your own account, not the 'opentree' account. The reason to run on varela is that transferring the database file from there to AWS (devapi, api) is probably faster than transferring it from elsewhere.
The database creation script (in the taxomachine repository's Makefile) looks for the taxonomy in the ott directory (or symlink) in the taxomachine repo. The following command sets up an ott directory; run it from a taxomachine clone:
cd {.../}taxomachine
time tar xvf ~opentree/files.opentreeoflife.org/ott/ott2.10/ott2.10draft6.tgz
(Takes about a minute.)
Now create the neo4j graph database for taxomachine.
git pull
time make db
(make db takes about an hour and fifteen minutes on varela.)
Create tarball of the graph db (about 25 minutes):
time make tarball
Today's date will be in the name of the tarball file, so use that in an scp command, e.g.
time scp -p taxomachine-20160827.db.tgz devapi:downloads/
(This took 20 minutes on a recent attempt.)
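Since the same dated filename is reused in several later commands, it can help to compute it once. The sketch below assumes the name always follows the pattern of the example above (taxomachine-YYYYMMDD.db.tgz) and that the date is the one on which make tarball ran; the function name is hypothetical.

```python
import datetime


def db_tarball_name(day=None):
    """Build the tarball name for the given date (default: today)."""
    if day is None:
        day = datetime.date.today()
    return "taxomachine-{}.db.tgz".format(day.strftime("%Y%m%d"))
```

If the tarball was built on an earlier day, pass that date explicitly, for example db_tarball_name(datetime.date(2016, 8, 27)), rather than relying on today's date.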
Back to your local machine now (not varela):
cd ../germinator/deploy
./push.sh -c ../../deployed-systems/development/devapi.config install-db downloads/taxomachine-20160827.db.tgz taxomachine
(about 11 minutes)
Visit:
https://devtree.opentreeoflife.org/taxonomy/browse?id=304358
and explore a little bit, watching out for anything odd-looking.
./push.sh -c ../../deployed-systems/development/devapi.config install-db downloads/taxomachine-20160827.db.tgz oti
./push.sh -c ../../deployed-systems/development/devapi.config index-db
again substituting today's date for 20160827.
(An hour?)
The conflict service looks in ~opentree/repo/reference-taxonomy/service/ for the taxonomy and synthetic tree. The taxonomy is expected to be called taxonomy and the synthetic tree synth.tre. These can be symbolic links.
Since the taxonomy doesn't otherwise reside on devapi (or api), it's necessary to download it from http://files.opentreeoflife.org/ott/ott2.10/ott2.10.tgz (or wherever) and unpack it. Since it unpacks as ott, do ln -s ott taxonomy.
After the links are in place, restart with
./push.sh ... smasher
At this point it is highly desirable to make a new synthetic tree based on the new taxonomy.
Notes elsewhere.
Certain pages are cached by web2py. To flush the cache and force new versions to be seen, simply restart apache.
./push.sh -c ../../deployed-systems/development/devapi.config apache
Start the synthetic tree browser
https://devtree.opentreeoflife.org/
and check for funniness.
In germinator repo.
Edit the OTT statistics file in the opentree repo, and re-deploy the webapp so that this file is served.
The copy and push.sh commands are the same as above, but with 'api' substituted for 'devapi'.
ssh files mkdir files.opentreeoflife.org/ncbi/ncbi-YYYYMMDD
scp -p feed/ncbi/in/taxdump.tgz files:files.opentreeoflife.org/ncbi/ncbi-YYYYMMDD/ncbi-YYYYMMDD.tgz
Update NCBI_URL in the OTT Makefile to match.
(If the files server moves to Amazon S3, these commands will need to change, as ssh and scp don't work with S3. See https://github.com/OpenTreeOfLife/germinator/pull/127 .)
Change the ott/current symbolic link so that it points to the new taxonomy instead of the old one.
ssh files ln -sf files.opentreeoflife.org/ott/ottX.X files.opentreeoflife.org/ott/current
Remove the draft number from the name of the final version, e.g. rename ott2.10draft6.tgz to ott2.10.tgz.
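The renaming rule can be written down as a small helper. This is a sketch that assumes draft tarballs are always named like the example (a version, then draft and a number, then .tgz); the function name is hypothetical, and the helper only computes the new name without touching any files.

```python
import re


def final_name(draft_name):
    """Strip the 'draftN' part from a draft tarball name."""
    new_name, n = re.subn(r"draft\d+(?=\.tgz$)", "", draft_name)
    if n != 1:
        raise ValueError("not a draft tarball name: %r" % draft_name)
    return new_name
```

For instance, final_name("ott2.10draft6.tgz") gives "ott2.10.tgz". A name with no draft number raises ValueError, which guards against renaming a file that is already final.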
While the system will work fine with taxonomy version N deployed in taxomachine and oti (or otindex) and a synthetic tree built using taxonomy version N-1 deployed in treemachine, users may occasionally come across certain anomalies that may be confusing. It is therefore a good idea to make a new synthetic tree whenever the taxonomy changes. The procedure for doing so is described here.