forked from tolkit/fasta_windows
Fixes tolkit#6 - optional description for output. Some code re-factoring.
1 parent 7df586e, commit b115bd8
Showing 8 changed files with 391 additions and 288 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[package] | ||
name = "fasta_windows" | ||
version = "0.2.1" | ||
version = "0.2.2" | ||
authors = ["Max Brown <[email protected]>"] | ||
edition = "2018" | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
# fasta_windows | ||
|
||
Fast statistics in windows over a genome in fasta format. | ||
Written for Darwin Tree of Life chromosomal level genome assemblies. The executable takes a fasta formatted file and calculates some statistics of interest: | ||
|
||
- GC content | ||
- GC proportion | ||
- GC skew | ||
|
@@ -11,25 +12,23 @@ Fast statistics in windows over a genome in fasta format. | |
|
||
## Usage | ||
|
||
The masked (-m) flag only affects the first four output options above - kmers are coerced to uppercase, and shannon entropy probably needs some attention on that. | ||
|
||
``` | ||
Fasta windows 0.2.1 | ||
Fasta windows 0.2.2 | ||
Max Brown <[email protected]> | ||
Quickly compute statistics over a fasta file in windows. | ||
USAGE: | ||
fasta_windows [FLAGS] [OPTIONS] --fasta <fasta> --output <output> | ||
FLAGS: | ||
-c, --canonical_kmers Should the canonical kmers be calculated? | ||
-h, --help Prints help information | ||
-m, --masked Consider only uppercase nucleotides in the calculations. | ||
-V, --version Prints version information | ||
-d, --description Add an extra column to _windows.tsv output with fasta header descriptions. | ||
-h, --help Prints help information | ||
-m, --masked Consider only uppercase nucleotides in the calculations. | ||
-V, --version Prints version information | ||
OPTIONS: | ||
-f, --fasta <fasta> The input fasta file. | ||
-o, --output <output> Output filename for the CSV (without extension). | ||
-o, --output <output> Output filename for the TSV's (without extension). | ||
-w, --window_size <window_size> Integer size of window for statistics to be computed over. [default: 1000] | ||
``` | ||
|
||
|
@@ -48,11 +47,7 @@ cargo build --release | |
|
||
The default window size is 1kb. | ||
|
||
## Output & benchmarks | ||
|
||
The only annoying overhead at the moment is sequence counting for the progress bar, which must be computed before the progress bar is initiated. | ||
|
||
### Output | ||
## Output | ||
|
||
Output is now a tsv with bed-like format in the first three columns: | ||
|
||
|
@@ -87,6 +82,10 @@ SUPER_1 8000 9000 114 67 43 68 63 86 52 | |
SUPER_1 9000 10000 97 97 44 63 72 95 50 67 46 44 33 46 85 49 42 69 | ||
``` | ||
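The bed-like layout above (ID, start, end, then statistic columns) can be sketched as follows. This is a minimal illustration, not the code in this commit: `window_coords` and `tsv_row` are hypothetical names, and truncating the final window at the sequence end is an assumption about how a trailing partial window might be handled.

```rust
/// Half-open, bed-like window coordinates over a sequence of `seq_len` bases.
/// Assumption: the last window is truncated at the sequence end; the real
/// tool may treat a trailing partial window differently.
fn window_coords(seq_len: usize, window_size: usize) -> Vec<(usize, usize)> {
    assert!(window_size > 0, "window size must be positive");
    (0..seq_len)
        .step_by(window_size)
        .map(|start| (start, usize::min(start + window_size, seq_len)))
        .collect()
}

/// Format one output row: sequence ID, start, end, then tab-separated statistics.
fn tsv_row(id: &str, start: usize, end: usize, stats: &[f64]) -> String {
    let mut row = format!("{}\t{}\t{}", id, start, end);
    for s in stats {
        row.push_str(&format!("\t{}", s));
    }
    row
}
```

With the default window size of 1000, `window_coords(2500, 1000)` yields the windows 0 to 1000, 1000 to 2000, and 2000 to 2500, each of which could be passed to `tsv_row` together with its per-window statistics.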
|
||
### Updates & bugs | ||
### Comments, updates & bugs | ||
|
||
As of version 0.2.2, I've removed canonical kmers as an option; it was really computationally expensive and I couldn't think of a way to efficiently add it in. End users who want this are pointed in the direction of <a href="https://github.com/tolkit/fw_group">fw_group</a>, which will at some point soon provide this functionality. | ||
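For context, a canonical kmer is commonly defined as the lexicographically smaller of a kmer and its reverse complement. The sketch below shows that standard definition only; it is not the implementation that was removed, and the function names are hypothetical. It does hint at one plausible source of the cost: every kmer needs an extra reverse-complement pass and a comparison.

```rust
/// Reverse complement of an uppercase DNA kmer; any non-ACGT byte maps to `N`.
fn reverse_complement(kmer: &[u8]) -> Vec<u8> {
    kmer.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            _ => b'N',
        })
        .collect()
}

/// Canonical form: the lexicographically smaller of the kmer and its
/// reverse complement (the usual convention, assumed here).
fn canonical(kmer: &[u8]) -> Vec<u8> {
    let rc = reverse_complement(kmer);
    if rc.as_slice() < kmer {
        rc
    } else {
        kmer.to_vec()
    }
}
```

Under this definition `GTT` and `AAC` collapse to the same canonical kmer, `AAC`, so counts from both strands are pooled.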
|
||
The masked (-m) flag only affects GC content, GC proportion, GC skew, proportion of G's, C's, A's, T's, N's. Kmers are coerced to uppercase automatically. Shannon index counts only uppercase nucleotides. | ||
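To make the effect of the masked flag concrete, here is a rough sketch of per-window base counting with and without masking. It is an illustration under assumptions, not the code in this commit: the names are hypothetical, GC skew uses the standard (G - C) / (G + C) definition, and dividing GC proportion by the full window length is an assumed choice of denominator.

```rust
/// Counts of each nucleotide class within one window.
#[derive(Debug, Default, PartialEq)]
struct BaseCounts {
    a: usize,
    c: usize,
    g: usize,
    t: usize,
    n: usize,
}

/// Count bases in a window. With `masked` set, lowercase (soft-masked)
/// bases are skipped; otherwise they are folded to uppercase first.
fn count_bases(window: &[u8], masked: bool) -> BaseCounts {
    let mut counts = BaseCounts::default();
    for &b in window {
        let base = if masked { b } else { b.to_ascii_uppercase() };
        match base {
            b'A' => counts.a += 1,
            b'C' => counts.c += 1,
            b'G' => counts.g += 1,
            b'T' => counts.t += 1,
            b'N' => counts.n += 1,
            _ => {} // lowercase bases when masked, or unexpected bytes
        }
    }
    counts
}

/// GC proportion relative to the window length (assumed denominator).
fn gc_proportion(counts: &BaseCounts, window_len: usize) -> f64 {
    if window_len == 0 {
        return 0.0;
    }
    (counts.g + counts.c) as f64 / window_len as f64
}

/// GC skew, (G - C) / (G + C); defined as 0.0 when the window has no G or C.
fn gc_skew(counts: &BaseCounts) -> f64 {
    let (g, c) = (counts.g as f64, counts.c as f64);
    if g + c == 0.0 {
        0.0
    } else {
        (g - c) / (g + c)
    }
}
```

For the window `GGCAatgc`, counting with `masked = true` sees only `GGCA` (two G, one C, one A), while `masked = false` also counts the lowercase `atgc`.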
|
||
Please use, test, and let me know if there are any bugs or features you want implemented. Either raise an issue, or email me (see email in usage). |