- Make constellation determination more robust with partial, case-insensitive matching
- Correct H3 motif by adding missing site (159)
- Fix fetch-unclassified-swine.rq to include GISAID strains
- Fix dependency handling
My final release
- Refactor with mypy
- Many bug fixes
- Fix yet another bug
- Fix bug in `prep unpublished`
- Add version argument
- Make pulling of tag data optional
- Write failed genbank parses to a log file
- Only skip if cached genbank ttl file has size greater than 0
- Bug fixes
- Extended documentation in README
- Do not include gisaid data by default (replace `--no-gisaid` flag in `octofludb pull` with `--include-gisaid`)
- Fix read error in `octofludb prep fasta`
- Remove the `octofludb make` subcommand
  - moved `octofludb make masterlist` to `octofludb report masterlist`
  - removed `const` and `subtypes`, just run SPARQL queries instead
- moved
- Add motif generation to `pull`
- Add options to limit what data is updated with `pull`
- Add deletion commands
- Bug fixes in octoFLU wrapper
- Move imports to location of use (speeds up usage messages)
- Remove unused `clean` command
- Bug fixes
- Remove `script` folder and all bash scripting
- Add `pull` subcommand for updating the database
- Add `classify` subcommand as a wrapper around octoFLU
- Add a config file that contains links to all data and the octoFLU reference
- Fix subtype generation
- Fix bugs in init
- Add `report monthly --context` to pull sequences for comparisons
- Add support for the SPARQL CONSTRUCT command
- Do not generate "unknown" triples
- Add monthly report generator (used for WGS selections)
- Standardize subtype handling (see runtest `get_subtypes` tests)
- Replace "VTX98" with "LAIV" in internal gene parser
- Add header to `octofludb make const` output
- Add handlers for working with unpublished data
- Convert internal gene clades to uppercase
- Add options for updating a certain number of months to `update_gb`
- Add maxyear option to `update_gb`
- Fix bug in `update_gb` that prevented updating of pre-2000 data
- Fix handling of segment subtypes
- Determine strain subtype using octoFLU info
- In `update_gb`, work backwards through months, not just years
- new subcommands:
  - `update_gb`: add missing genbank entries
  - `const`: generate constellations for all swine strains
  - `masterlist`: generate the A0 masterlist used in NADC quarterly/annual reports and octoflushow
- cleaner strain name parsing:
  - require two forward slashes
  - remove parenthesis/bracket terms, for example: "A/wherever/2020 (H1N1)" --> "A/wherever/2020"
  - replace spaces with underscores, for example: "A/South Dakota/2020" --> "A/South_Dakota/2020"
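
The three strain name rules above can be sketched as follows. This is an illustration only, not the octofludb implementation: `clean_strain_name` is a hypothetical helper, and reading "require two forward slashes" as "at least two" is an assumption.

```python
import re
from typing import Optional


def clean_strain_name(raw: str) -> Optional[str]:
    """Hypothetical helper illustrating the three rules; not octofludb's code."""
    # Rule: remove parenthesis/bracket terms such as "(H1N1)" or "[H3N2]"
    name = re.sub(r"\s*[\(\[][^\)\]]*[\)\]]", "", raw).strip()
    # Rule: require two forward slashes (assumed here to mean "at least two")
    if name.count("/") < 2:
        return None
    # Rule: replace spaces with underscores
    return name.replace(" ", "_")


print(clean_strain_name("A/wherever/2020 (H1N1)"))
print(clean_strain_name("A/South Dakota/2020"))
```

Names that fail the slash check return `None` in this sketch, so a caller can decide whether to skip or log them.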
- improved data extraction from genbank records
  - link parental strains to genbank segment records
  - link strain info to the parental strain, including:
    - host - with new cleaning
    - country - with new cleaning
    - A0 numbers - for USA strains
    - states - for USA strains
    - collection date - as string literal
  - fix incorrect (s, length, locus) link
  - convert `create_date` from string literal to date
  - convert `update_date` from string literal to date
  - convert `length` from string literal to integer
- Update patterns for global clades
  - allow "Other-Equine" and such
  - allow lowercase letters after "3.XXXX."
- Add first recipe - a subcommand for getting all constellations
- Allow the constellation MIXED and allow X for unknown clades
- Change default repo name to "octofludb"
- Replace docopt with argparse
- Renamed all `load_*` commands to `mk_*`. The commands are not "loading" data into the database but only creating turtle files
- Change name to octofludb
- Add query and upload subcommands
- Remove `sameAs` relationship between `Strain` tokens. This relationship was equating strain names (e.g., `A/Michigan/288/2019`) with epiflu isolate ids (e.g., `EPI_ISL_381463`). However, one strain name may be shared by multiple epiflu isolate ids, so the sameAs relationship is incorrect.
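
A minimal sketch of why the `sameAs` link was a problem, assuming it carries the usual symmetric and transitive meaning of `owl:sameAs`. The second isolate id below is made up for illustration, and `same_as_closure` is a hypothetical helper, not part of octofludb.

```python
def same_as_closure(pairs):
    """Group identifiers into the equivalence classes implied by sameAs pairs."""
    groups = []
    for a, b in pairs:
        merged = {a, b}
        rest = []
        for group in groups:
            if group & merged:
                # Shares a member with the new pair, so transitivity merges them
                merged |= group
            else:
                rest.append(group)
        groups = rest + [merged]
    return groups


pairs = [
    ("A/Michigan/288/2019", "EPI_ISL_381463"),
    ("A/Michigan/288/2019", "EPI_ISL_000000"),  # hypothetical second isolate id
]
groups = same_as_closure(pairs)
print(groups)
```

Both isolate ids end up in the same group as the strain name, so two distinct isolates would be treated as one entity.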
Everything up to the point where the name changed from d79 to octofludb