Creates random reads from a genome sequence and tries to assemble them back into the original sequence.
git clone https://github.com/molnxx/genome_assembly.git
You need NumPy and a DNA sequence, preferably a circular one.
Create random reads with coverage n and a variable length of around l:
python de_bruijn_assembly.py create -i genome.fasta -n 5 -l 200 -o outfile
outfile is the FASTA file to which all reads are written.
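The read-creation step can be pictured with the following minimal sketch. It is not the code from de_bruijn_assembly.py: the function name `sample_reads`, the `spread` parameter, and the stopping rule (keep sampling until coverage times genome length bases have been drawn) are assumptions made for illustration.

```python
import random


def sample_reads(genome, coverage, mean_length, spread=0.1, seed=None):
    """Draw random reads from a circular genome until the target coverage is met.

    Hypothetical helper: `spread` (relative length variation) is an assumption,
    not an option of de_bruijn_assembly.py.
    """
    rng = random.Random(seed)
    target_bases = coverage * len(genome)
    # Doubling the sequence lets a read run past the end and wrap around,
    # which models a circular genome.
    doubled = genome + genome
    high = min(len(genome), max(1, int(mean_length * (1 + spread))))
    low = min(high, max(1, int(mean_length * (1 - spread))))
    reads = []
    sampled = 0
    while sampled < target_bases:
        length = rng.randint(low, high)      # variable length around mean_length
        start = rng.randrange(len(genome))   # uniform random start position
        reads.append(doubled[start:start + length])
        sampled += length
    return reads
```

For example, `sample_reads(genome, coverage=5, mean_length=200)` mirrors the `-n 5 -l 200` options above under these assumptions.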
Assemble the genome sequence with de Bruijn graphs:
python de_bruijn_assembly.py assemble -i reads.fasta -k 43 -p orig_genome.fasta -o result_file
k is the length of the k-mers. orig_genome.fasta is the original genome sequence, used to verify the results, which are stored in result_file.
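To illustrate the idea behind the assembly step, here is a minimal sketch of a de Bruijn graph assembly for a circular genome. It is not taken from de_bruijn_assembly.py, and all function names are hypothetical. It assumes error-free reads, that every k-mer occurs only once in the genome (repeated k-mers are collapsed by the set), and that the reads cover every k-mer of the circle. Nodes are (k-1)-mers, each distinct k-mer is an edge, and the genome is spelled by an Eulerian cycle found with Hierholzer's algorithm.

```python
from collections import defaultdict


def kmers(read, k):
    """Return all overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes following it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def eulerian_cycle(graph):
    """Hierholzer's algorithm; assumes every node has equal in- and out-degree."""
    remaining = {node: sorted(successors) for node, successors in graph.items()}
    start = next(iter(remaining))
    stack, cycle = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            # Follow an unused edge out of the current node.
            stack.append(remaining[node].pop())
        else:
            # Dead end: this node is finished, add it to the cycle.
            cycle.append(stack.pop())
    return cycle[::-1]


def spell_circular(cycle):
    """Spell the circular sequence from a closed walk of (k-1)-mers."""
    # The last node repeats the first, so it is skipped.
    return "".join(node[0] for node in cycle[:-1])
```

A graph can contain more than one Eulerian cycle, so the spelled sequence is only guaranteed to contain the same k-mers as the genome. Larger values of k reduce this ambiguity as long as the reads still cover every k-mer.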
Included are the genome of pUC19, a well-known vector, and the example from the paper referenced below.
python de_bruijn_assembly.py create -i pUC19.fasta -n 6 -l 150 -o pUC19_6_150.fasta
python de_bruijn_assembly.py assemble -i pUC19_6_150.fasta -k 43 -p pUC19.fasta -o results.txt
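When checking an assembly of a circular sequence such as pUC19 against the original by hand, keep in mind that the assembled sequence may start at a different position. The sketch below shows one way to compare the two. The function `is_rotation` is a hypothetical helper and not necessarily how the `-p` option performs its check.

```python
def is_rotation(assembled, reference):
    """Return True if `assembled` equals `reference` up to a circular shift."""
    # Every rotation of the reference is a substring of the doubled reference.
    return len(assembled) == len(reference) and assembled in reference + reference
```

This only compares the same strand. A reverse-complement match would need an additional check.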
Public Domain
- Phillip E C Compeau et al. for "How to apply de Bruijn graphs to genome assembly" (doi)
- Atom Community (https://github.com/atom/atom)
- NumPy devs (https://www.numpy.org/)