v2.1.19.0
Added new workflow: sarscov2_sra_to_genbank
-- this takes sequencing reads from INSDC (via NCBI SRA), assembles, annotates, and QCs genomes, and produces Genbank and GISAID submission bundles based on the metadata in NCBI (SRA and BioSample). The Genbank submission will be tied to the same source BioProject and BioSamples that the reads were linked to in SRA. This workflow is able to merge together multiple read sets (SRA records) from the same BioSample and produce one assembly per BioSample. It will automatically detect sequencing platform (only Illumina and Oxford Nanopore currently supported) as well as amplicon vs metagenomic library designs based on the SRA metadata, and assemble appropriately. This has been tested on Illumina reads, ONT reads, amplicon libraries, metagenomic libraries, reads submitted to NCBI SRA, and reads originally submitted to ENA and synced with NCBI. [#197, #200]
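The grouping and detection logic described above can be sketched roughly as follows. This is an illustrative sketch only, not the workflow's actual code. The record keys (`run`, `biosample`, `platform`, `library_strategy`), the function name `plan_assemblies`, the choice to treat every non-AMPLICON library as metagenomic, and the choice to skip samples with mixed or unsupported platforms are all assumptions made for the example.

```python
from collections import defaultdict

# Platform labels assumed to follow SRA-style metadata values.
SUPPORTED_PLATFORMS = {"ILLUMINA", "OXFORD_NANOPORE"}


def plan_assemblies(sra_records):
    """Group SRA run records by BioSample and pick an assembly mode per group.

    Each record is a dict with the hypothetical keys
    'run', 'biosample', 'platform', and 'library_strategy'.
    """
    # Collect all read sets (SRA runs) that belong to the same BioSample.
    by_sample = defaultdict(list)
    for rec in sra_records:
        by_sample[rec["biosample"]].append(rec)

    plans = {}
    for biosample, recs in by_sample.items():
        platforms = {r["platform"].upper() for r in recs}
        # Assumed policy: skip samples with mixed or unsupported platforms.
        if len(platforms) != 1 or not platforms <= SUPPORTED_PLATFORMS:
            continue
        strategies = {r["library_strategy"].upper() for r in recs}
        # Assumed rule: only an all-AMPLICON sample is treated as amplicon.
        design = "amplicon" if strategies == {"AMPLICON"} else "metagenomic"
        plans[biosample] = {
            "runs": sorted(r["run"] for r in recs),
            "platform": platforms.pop(),
            "design": design,
        }
    return plans
```

Each entry in the returned dict corresponds to one assembly per BioSample, with all of that sample's runs merged.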
Minor changes and fixes to sarscov2_illumina_full:
- filter genbank/gisaid submission packages to only sequences present in biosample attributes file [#200]
- relax minimum genome unambig bp cutoff from 20kb to 15kb [#200]
- allow for merging multiple biosample attributes tsvs together in sarscov2_illumina_full [#200]
- add "Sequencing Technology" column to both genbank and gisaid submission packages [#200]
- greatly simplify the final assembly metrics metadata output from both workflows (single tsv instead of compound array structures) [#200]
- make output filenames a bit more organized [#200]
- expose cleaned_bam_uris text file output for easy SRA submission [#200]
- replace the first several steps with an invocation of demux_deplete as a subworkflow to reduce code duplication [#197]
Other minor changes:
- sarscov2_lineages and sarscov2_illumina_full: rename output variable pangolin_clade to pango_lineage to stay in line with the nomenclature of the PANGOLIN authors. [#197]
- increase default RAM for GATK UG consensus calling in assemble_refbased from 7GB to 15GB. [#200]
- bump nextclade image and pangoLEARN database to latest [#198]. nextclade update improves deletion variant naming. pangolin update keeps up with latest lineage assignments.
- bump viral-core docker 2.1.18 to 2.1.19 to fix demux scenario with single-index/paired-reads [#199]