Comparison with samtools
lomereiter edited this page Aug 16, 2012 · 9 revisions
Feature | sambamba view | samtools view | Notes |
---|---|---|---|
BAM support | Full | Full | |
SAM support | Full | Full | sambamba skips (syntactically) invalid tags and sets invalid fields to default values |
Error messages | Descriptive | Incomplete | Where samtools reports only 'truncated file', sambamba prints a detailed error message describing what is wrong with the BAM file |
Multithreaded BAM decompression | Yes | No | |
Non-seekable file stream support | Yes | Yes | |
Skipping invalid reads | Optional | No | The sambamba library also includes a module for creating custom validation tools |
Filtering | Powerful | Limited | sambamba view comes with a simple query language for filtering alignments |
JSON output | Yes | No | useful for interacting with scripting languages |
Progress bar | Optional | No | |
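
As an illustration of why JSON output is convenient for scripting languages, here is a minimal Python sketch that counts alignments with a mapping quality of at least 50. It assumes one JSON object per line and a field named `mapq`; both are assumptions about the output layout, not something this page states, and the sample records are made up.

```python
import json

def count_high_quality(lines, min_mapq=50, key="mapq"):
    """Count JSON records whose mapping quality is at least min_mapq.

    `key` is an assumed field name; adjust it to the actual output.
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get(key, 0) >= min_mapq:
            count += 1
    return count

# Hypothetical records standing in for real sambamba output.
sample = [
    '{"qname": "read1", "mapq": 60}',
    '{"qname": "read2", "mapq": 37}',
    '{"qname": "read3", "mapq": 50}',
]
print(count_high_quality(sample))  # prints 2
```

In practice the function could read from `sys.stdin`, with sambamba view piped into the script. Selecting JSON with `-f json`, by analogy with the `-f bam` used in the benchmarks on this page, is likewise an assumption.
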
Feature | sambamba | samtools |
---|---|---|
Indexing | Yes, multithreaded | Yes, single-threaded |
Merging BAM files | Yes, multithreaded decompression and compression | Yes, compression is multithreaded |
Automatic SAM header merging | Yes | No |
Multithreaded BAM file external sort | Yes | Yes |
Flag statistics | Yes, multithreaded | Yes, single-threaded |

Other utilities available in samtools are not implemented in sambamba.
Here are some benchmarks on two configurations:
- Intel Atom N450 @ 1.66GHz (1 core with hyperthreading), 1GB of RAM
- 2x Intel Xeon E5310 @ 1.60GHz (8 cores without hyperthreading), 8GB of RAM
On both machines, sambamba was built with the GDC compiler (which is used for building the Debian packages), and samtools was built with its default makefile using gcc -O2.
Tools were tested on HG00125.chrom20.ILLUMINA.bwa.GBR.low_coverage.20111114.bam (denoted by $FILENAME in command lines), 301MB in size.
Indexing BAM file (empty file cache)

- sambamba: sambamba index $FILENAME
- samtools: samtools index $FILENAME

Tool | Configuration | Time | Memory usage | CPU load |
---|---|---|---|---|
sambamba | Intel Atom N450 | 12.29s | 32MB | 147% |
samtools | Intel Atom N450 | 13.6s | 1.4MB | 92% |
sambamba | 2x Intel Xeon E5310 | 6.96s | 32MB | 139% |
samtools | 2x Intel Xeon E5310 | 8.73s | 1.4MB | 93% |

Indexing BAM file (file fully cached into RAM)

- sambamba: sambamba index $FILENAME
- samtools: samtools index $FILENAME

Tool | Configuration | Time | Memory usage | CPU load |
---|---|---|---|---|
sambamba | Intel Atom N450 | 9.43s | 32MB | 188% |
samtools | Intel Atom N450 | 12.08s | 1.4MB | 99% |
sambamba | 2x Intel Xeon E5310 | 2.21s | 32MB | 433% |
samtools | 2x Intel Xeon E5310 | 7.98s | 1.4MB | 100% |

Filtering reads from a region, with BAM output (empty file cache)

- sambamba: sambamba view -f bam $FILENAME 20:10,000,000-20,000,000 -F "mapping_quality >= 50" -o test.bam
- samtools: samtools view -b $FILENAME 20:10,000,000-20,000,000 -q50 -o test.bam

Tool | Configuration | Time | Memory usage | CPU load |
---|---|---|---|---|
sambamba | Intel Atom N450 | 22.96s | 90MB | 98% |
samtools | Intel Atom N450 | 23.16s | 1.8MB | 96% |
sambamba | 2x Intel Xeon E5310 | 5.24s | 90MB | 250% |
samtools | 2x Intel Xeon E5310 | 10.83s | 1.8MB | 98% |

Counting reads from a region (file fully cached into RAM)

- sambamba: sambamba view $FILENAME -c -F "[RG] == 'ERR016156' and proper_pair and first_of_pair and not duplicate" 20:1000000-3000000
- samtools: samtools view $FILENAME -c -r 'ERR016156' -f66 -F1024 20:1000000-3000000

Tool | Configuration | Time | Memory usage | CPU load |
---|---|---|---|---|
sambamba | Intel Atom N450 | 0.53s | 50MB | 144% |
samtools | Intel Atom N450 | 0.42s | 1.3MB | 99% |
sambamba | 2x Intel Xeon E5310 | 0.20s | 50MB | 208% |
samtools | 2x Intel Xeon E5310 | 0.27s | 1.5MB | 100% |
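
The two counting commands select the same reads: the samtools options -f66 and -F1024 are bit masks over the SAM FLAG field, while the sambamba filter spells out the same conditions by name. The Python sketch below shows the correspondence. The bit values come from the SAM specification; the helper names `decode` and `passes` are only illustrative.

```python
# SAM FLAG bits relevant to the counting benchmark (values per the SAM spec).
# The names follow the sambamba filter expression used above.
FLAG_BITS = {
    "proper_pair": 0x2,
    "first_of_pair": 0x40,
    "duplicate": 0x400,
}

def decode(mask):
    """Return the names of the known bits set in mask."""
    return sorted(name for name, bit in FLAG_BITS.items() if mask & bit)

def passes(flag, required=66, excluded=1024):
    """Mimic samtools -f/-F: all required bits set, no excluded bit set."""
    return (flag & required) == required and (flag & excluded) == 0

print(decode(66))    # ['first_of_pair', 'proper_pair']
print(decode(1024))  # ['duplicate']
print(passes(99))    # True: flag 99 is a properly paired first mate
```
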
As you can see, sambamba exploits parallelism where samtools does not. The faster the storage, the greater the speedup (see the indexing results).
However, there are some drawbacks at the moment. Memory usage is higher due to extensive use of various buffers, and region queries are slower in some cases (though not by much, about 10-20%).
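
To make the comparison concrete, the speedup for each benchmark can be computed as the samtools time divided by the sambamba time. The short Python sketch below does this for the timings reported above; a value below 1 means sambamba was slower. The dictionary layout and the function name are just illustrative.

```python
# (sambamba seconds, samtools seconds), copied from the benchmark tables above.
TIMES = {
    ("index, empty cache", "Atom N450"): (12.29, 13.6),
    ("index, empty cache", "2x Xeon E5310"): (6.96, 8.73),
    ("index, cached", "Atom N450"): (9.43, 12.08),
    ("index, cached", "2x Xeon E5310"): (2.21, 7.98),
    ("filter region, empty cache", "Atom N450"): (22.96, 23.16),
    ("filter region, empty cache", "2x Xeon E5310"): (5.24, 10.83),
    ("count region, cached", "Atom N450"): (0.53, 0.42),
    ("count region, cached", "2x Xeon E5310"): (0.20, 0.27),
}

def speedup(sambamba_s, samtools_s):
    """Ratio of samtools time to sambamba time (>1 means sambamba is faster)."""
    return samtools_s / sambamba_s

for (bench, machine), (sambamba_s, samtools_s) in TIMES.items():
    print(f"{bench:28s} {machine:14s} {speedup(sambamba_s, samtools_s):.2f}x")
```
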