Releases: mozack/abra2
v2.14
v2.13
- Improved handling of known indel sites in RNA
- Minor SAM spec corrections for STAR bams
- Support reading observed junctions directly from bam input
- Do not attempt to use reads with length of zero (from Lenbok)
- Added --gc option. Skips remapping if eligible regions do not contain at least one contig with an indel or splice junction (experimental)
- Added option to not remap reads with new cigar containing deletion bracketed by introns (experimental)
v2.12
v2.11
v2.10
v2.09
Note: Previous releases were compiled on CentOS 6. This release is compiled on CentOS 7.
- Don't allow read to map beyond the end of a chromosome
- Default GKL to off. Appears to cause stability issues in some cases. (Re-enable using --gkl)
- Implemented disk backed sort / mate fix. Should address out of memory errors during sort phase.
- Added additional logging around read buffer problems causing potential crashes
- Update edit distance using STAR's nM tag when appropriate.
- Improved handling of exon skipping junctions
- Cadabra: Improved handling of reference bias when calling at repeat locations
Known issues:
- Still seeing a couple of reads with mate not updated properly in WGS test.
- A small number of reads are being dropped in RNA testing.
v2.08
Speed and accuracy are generally improved in this release (particularly for RNA).
- Merge and error correct overlapping reads for use in assembly / contig generation.
- Improvements to assembly triggers. Assembly is much less frequent now.
- Avoid adapter read-through triggering unnecessary assemblies.
- More sensitive assembly in place (less aggressive pruning / downsampling)
- Improved handling of unannotated exon skipping junctions.
- Default --dist to 1000 for DNA (Improves RAM usage when sorting large files). Recommend using 500000 for RNA to allow movement across large introns.
- Handle STAR's nM tag
- Improved options for Cadabra
- Skip decoy chromosomes / unplaced contigs, etc. by default.
- Identify smallest repeat unit when calculating repeat period
- RNA speed optimization: limit the number of contigs when there is a large number of junction permutations
- Improvement to ISPAN annotation for inserts
- Fixed bug where RNA regions were being skipped due to large splice junctions
- Added option to downgrade mapping quality when a read maps equally well to reference and a contig (potentially useful for downstream callers that handle STRs / repeats naively).
- Treat deletion adjacent to intron as a potential alternative intron.
- Avoid realigning very noisy reads
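
To illustrate the "smallest repeat unit" item above: an inserted or deleted sequence such as ATATAT is better described as a repeat of period 2 (unit AT) than of period 6. The following is a minimal Python sketch of that idea, not ABRA2's actual implementation; the function name `smallest_repeat_unit` is hypothetical.

```python
def smallest_repeat_unit(seq: str) -> str:
    """Return the shortest prefix that reproduces seq exactly when repeated.

    Illustrative only: a hypothetical helper, not code from ABRA2.
    """
    n = len(seq)
    # Try candidate periods from shortest to longest; the first hit is the smallest unit.
    for period in range(1, n + 1):
        if n % period == 0 and seq[:period] * (n // period) == seq:
            return seq[:period]
    # Only reached for the empty string.
    return seq


unit = smallest_repeat_unit("ATATAT")
print(unit, len(unit))  # the unit and its period
```

A sequence with no internal repetition, such as ACGT, is returned unchanged, so its period is its full length.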
v2.07
- Allow remap of unclipped portion of soft clipped reads when entire read does not realign
- Update read mate info properly (output BAM files should now pass Picard's ValidateSamFile)
- Align non-assembled contigs to all junction combos
- Default mapq filter to 20
- Don't allow contig to exceed max buffer size (was causing occasional segmentation faults)
- Added --nosort option. Use to disable final sorting (and mate fixing) step.
- Added --dist option. Use to shrink max distance a read can be moved (also impacts sorting)
v2.06
- Now using more conservative assembly triggers
- Added trigger for more sensitive assembly (enables modest improvement for some longer inserts)
- Accommodate reads spanning multiple indels for observed indel contig generation
- Correction to affine gap penalty calculations in contig alignment
- Contigs now left aligned during semi-global alignment instead of post-hoc
- Ported semi-global aligner to C (for speed improvement)
- Parameterized final BAM compression level (consider using 1 for intermediate files that are to be deleted)
- Parameterize anchor length / mismatches (consider using this along with --cons for amplicon data)
- Parameterize max reads per region
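
For readers unfamiliar with the term, "left aligned" in the list above refers to placing an indel at the leftmost equivalent position within a repeat. The sketch below shows the general idea for a deletion, applied as a separate step after alignment; ABRA2 as of this release does it during semi-global alignment, and this is not its code. The function name and the 0-based coordinate convention are assumptions made for illustration.

```python
def left_align_deletion(ref: str, pos: int, length: int) -> int:
    """Return the leftmost 0-based start of a deletion equivalent to ref[pos:pos+length].

    Illustrative only: a hypothetical helper, not code from ABRA2.
    """
    # Shifting the deleted window one base left yields the same resulting sequence
    # exactly when the base entering the window on the left matches the base
    # leaving it on the right.
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos


ref = "GCACACAT"
new_pos = left_align_deletion(ref, 5, 2)  # delete "CA" at the right end of the CA repeat
# Both placements produce the same sequence after the deletion.
assert ref[:5] + ref[7:] == ref[:new_pos] + ref[new_pos + 2:]
print(new_pos)
```

Reporting every equivalent deletion at the same position lets reads that support the same event agree on its coordinates.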
v2.05
- Enable usage of high quality soft clipping and observed indels to generate putative contigs by default
** Note: parameters around this behavior have changed
- Remap reads equal to min mapq
- Output unmapped read pairs in final BAM
- Added deBruijn graph simplification
- Cadabra - only use primary alignments for calling
- Various speed optimizations
** More conservative assembly triggers
** Optimization of read to contig mapping
** sparse_hash --> dense_hash
** Introduce smaller chromosome "chunks" to improve parallelization
** Parallelize final BAM sorting/writing in multi-BAM cases
** More selective permuting of regional splice junctions combinations
** Discard low scoring contigs during assembly
** Cadabra - added multi-threading