KMC3, stxxl, sdsl-lite.
# Download dependencies
git submodule init
git submodule update
# Build KMC
cd KMC
make
cd ..
# Build sdsl-lite
cd sdsl-lite
sh install.sh
cd ..
# Build stxxl
cd stxxl
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DUSE_GNU_PARALLEL=ON -DCMAKE_INSTALL_PREFIX=./install
make
make install
cd ../..
# Build bufBOSS
make all
There are three programs: bufboss_build
, bufboss_update
and bufboss_query
. They will be compiled to the directory ./bin
.
We recommend building the index out of a KMC database. For example:
MY_INPUT=data/reads.fna
K=31
./KMC/bin/kmc -v -k$K -m1 -ci1 -cs1 -fm $MY_INPUT temp/kmc_db temp
./bin/bufboss_build -o my_index -t temp --KMC temp/kmc_db
To update the index by adding the k-mers in a file and deleting the k-mers in another file, run the following:
MY_ADDITIONS=data/additions.fna
MY_DELETIONS=data/deletions.fna
./bin/bufboss_update -i my_index/ -o my_index --add $MY_ADDITIONS --del $MY_DELETIONS --buffer-fraction 0.1
This overwrites the previous index. The value of --buffer-fraction determines how often the buffer is flushed. If the buffer fraction is t, then the buffer is flushed when it has 10% of the edges compared to the static part of the data structure. To query sequences agaisnt the index, run the following:
MY_QUERIES=data/queries.fna
./bin/bufboss_query -i my_index -q $MY_QUERIES
This prints bits 1 and 0 in ASCII such that for every input sequence R, we print one line L consisting of characters '0' and '1' such L[i] == '1' iff (k+1)-mer R[i..i+k] is found in the index.
Usage:
bufboss_build [OPTION...]
-k arg Order of the de Bruijn graph. Node are k-mers, edges
(k+1)-mers. If building from KMC, don't give this.
(default: 0)
-o, --out arg Output directory. (default: "")
-a, --add arg Path to a fasta-file. Adds all (k+1)-mers of the
fasta-file to the index. If building from KMC, don't give
this. (default: "")
--add-files arg Path to a list of fasta-files, one per line. Adds all
(k+1)-mers in all the files to the index. If building
from KMC, don't give this. (default: "")
-d, --KMC arg Build from KMC database (path to a KMC database). The
KMC database consists of two files: xxx.kmc_pre and
xxx.kmc_suf. You should give only the xxx part here. The
database should be built from canonical k-mers (the
default behaviour of KMC) (default: "")
-r, --revcomp Include reverse complemented k-mers. If building from
KMC, don't give this.
-c, --rrr Use rrr compression on bit vectors.
-h, --help Print instructions.
-t, --tempdir arg Directory for temporary working space. (default: "")
Usage:
bufboss_update [OPTION...]
-i, --index arg The directory of the BOSS index. If not given, a
new BOSS is built. (default: "")
-k arg If an input index is not given, a new BOSS is
built with this k. Otherwise, this k is ignored.
(default: 0)
-o, --out arg Output directory. (default: "")
-a, --add arg Path to a fasta-file. Adds all (k+1)-mers of the
fasta-file to the index. (default: "")
--add-files arg Path to a list of fasta-files, one per line.
Adds all (k+1)-mers in all the files to the index
(default: "")
--add-before-del If both additions and deletions are given, the
deletions are executed first by default. If you
want to execute additions first, give this flag.
-d, --del arg Path to a fasta-file. Deletes all (k+1)-mers of
the fasta-file from the index. (default: "")
--del-files arg Path to a list of fasta-files, one per line.
Deletes all (k+1)-mers in all the files from the
index (default: "")
-r, --revcomp Include reverse complemented k-mers.
-c, --rrr Use rrr compression on bit vectors.
--end-flush Flush the buffer at the end before writing to
disk.
--count-dummies Count the number of dummy nodes after the update
-b, --buffer-fraction arg If this fraction is x and boss has n nodes, then
the buffer is flushed when it has max(n*x,10000)
k-mers. (default: 0.01)
-h, --help Print instructions.
For every input read R, prints to stdout one line L consisting of characters '0' and '1' such L[i] == '1' iff (k+1)-mer R[i..i+k] is found in the index.
Usage:
bufboss_query [OPTION...]
-i, --index arg Path to the directory of the index. (default: "")
-o, --out arg Output file. If not given, prints to stdout. (default: "")
-r, --revcomp Search reverse-complemented k-mers also.
-c, --rrr This option *must* be given if the index was built with
rrr compression.
-h, --help Print instructions.
-q, --query arg Query FASTA-file (default: "")
Currently we support only k less or equal to 31. The input files must be in (multi)fasta format.