usage: run_tipp.py [-h] [-v] [-A N] [-P N] [-F N] [--distance DISTANCE]
[-M DIAMETER] [-S DECOMP] [-p DIR] [-o OUTPUT]
[-d OUTPUT_DIR] [-c CONFIG] [-t TREE] [-r RAXML] [-a ALIGN]
[-f FRAG] [-m MOLECULE] [-x N] [-cp CHCK_FILE] [-cpi N]
[-seed N] [-R N] [-at N] [-D] [-pt N] [-PD N]
[-tx TAXONOMY] [-txm MAPPING] [-adt TREE] [-C N]
This script runs the SEPP algorithm on an input tree, alignment, fragment
file, and RAxML info file.
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
DECOMPOSITION OPTIONS:
These options determine the alignment decomposition size and taxon
insertion size. If neither is given, both default to 10% of the total
number of taxa. The alignment decomposition size must be less than the
taxon insertion size.
-A N, --alignmentSize N
max alignment subset size of N [default: 10% of the
total number of taxa or the placement subset size if
given]
-P N, --placementSize N
max placement subset size of N [default: 10% of the
total number of taxa or the alignment subset size
(whichever is bigger)]
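The interaction between the -A and -P defaults is easier to follow as code. The sketch below is a hypothetical helper that follows the help text literally. It assumes "10% of the total number of taxa" is rounded down with a floor of 1, which the help text does not specify. It is not taken from the SEPP/TIPP source.

```python
# Hypothetical helper mirroring the documented -A / -P defaults.
# Assumption: "10% of the total number of taxa" is rounded down, minimum 1.
from typing import Optional, Tuple


def default_subset_sizes(
    n_taxa: int,
    alignment_size: Optional[int] = None,
    placement_size: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (alignment_size, placement_size) after applying the defaults."""
    ten_percent = max(1, n_taxa // 10)  # assumed rounding rule
    if alignment_size is None:
        # -A defaults to the placement size if -P was given, else 10% of taxa.
        alignment_size = placement_size if placement_size is not None else ten_percent
    if placement_size is None:
        # -P defaults to 10% of taxa or the alignment subset size, whichever is bigger.
        placement_size = max(ten_percent, alignment_size)
    return alignment_size, placement_size


print(default_subset_sizes(1000))                      # (100, 100)
print(default_subset_sizes(1000, alignment_size=250))  # (250, 250)
print(default_subset_sizes(1000, placement_size=50))   # (50, 50)
```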
-F N, --fragmentChunkSize N
maximum fragment chunk size of N. Helps control
memory usage. [default: 5000]
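Chunking bounds memory because only one batch of fragments has to be materialized at a time. This is a minimal sketch of that idea, assuming fragments arrive as a lazy iterator of (name, sequence) pairs. The function name `chunk_fragments` is hypothetical and does not come from TIPP's code.

```python
# Hypothetical sketch of -F: split fragments into chunks of at most N.
# When `fragments` is a lazy iterator, only one chunk is held in memory.
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Fragment = Tuple[str, str]  # (name, sequence)


def chunk_fragments(fragments: Iterable[Fragment],
                    chunk_size: int = 5000) -> Iterator[List[Fragment]]:
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    it = iter(fragments)
    while True:
        chunk = list(islice(it, chunk_size))  # take up to chunk_size items
        if not chunk:
            return
        yield chunk


frags = [(f"read{i}", "ACGT") for i in range(12)]
print([len(c) for c in chunk_fragments(frags, chunk_size=5)])  # [5, 5, 2]
```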
--distance DISTANCE minimum p-distance before stopping the
decomposition [default: 1]
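The p-distance between two aligned sequences is the fraction of compared sites at which they differ. The sketch below shows only that metric. It assumes that columns with a gap in either sequence are skipped, and it does not model how SEPP applies the --distance threshold during decomposition, which the help text leaves unspecified.

```python
# Sketch of the p-distance between two aligned sequences.
# Assumption: columns with a gap ('-') in either sequence are not compared;
# SEPP's own gap handling may differ.


def p_distance(a: str, b: str, gap: str = "-") -> float:
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x != y:
            differing += 1
    # With no comparable columns, report 0.0 rather than divide by zero.
    return differing / compared if compared else 0.0


print(p_distance("ACGT", "ACGA"))  # 0.25
```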
-M DIAMETER, --diameter DIAMETER
maximum tree diameter before stopping the
decomposition [default: None]
-S DECOMP, --decomp_strategy DECOMP
decomposition strategy [default: using tree branch
length]
OUTPUT OPTIONS:
These options control output.
-p DIR, --tempdir DIR
Temporary files will be written to DIR. A full path is
required. [default: /tmp/sepp]
-o OUTPUT, --output OUTPUT
output files with prefix OUTPUT. [default: output]
-d OUTPUT_DIR, --outdir OUTPUT_DIR
write output to the OUTPUT_DIR directory. A full path is required.
[default: .]
INPUT OPTIONS:
These options control input. To run SEPP, the following are required: a
backbone tree (in newick format); a RAxML_info file (the file generated
by RAxML while estimating the backbone tree, which pplacer uses to set
model parameters); a backbone alignment file (in fasta format); and a
fasta file containing the fragments. The input sequences are assumed to
be DNA unless specified otherwise.
-c CONFIG, --config CONFIG
A config file, including options used to run SEPP.
Options provided as command line arguments override
config file values for those options. [default: None]
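The precedence rule means that a value given on the command line wins and a config file value is used only when the option was not given. This is a minimal sketch of that rule. It assumes the config file has already been parsed into a dict and that an option counts as "not given" when its value is None. The option keys below are illustrative only.

```python
# Hypothetical sketch: command-line values override config-file values.
# Assumptions: config already parsed into a dict; None means "not given".
from typing import Any, Dict, Optional


def merge_options(config: Dict[str, Any],
                  cli: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    merged = dict(config)  # start from the config file values
    for key, value in cli.items():
        if value is not None:  # only options actually provided override
            merged[key] = value
    return merged


config = {"alignment": "backbone.fasta", "molecule": "dna", "cpu": 4}
cli = {"molecule": "amino", "cpu": None}
print(merge_options(config, cli))
# {'alignment': 'backbone.fasta', 'molecule': 'amino', 'cpu': 4}
```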
-t TREE, --tree TREE Input tree file (newick format) [default: None]
-r RAXML, --raxml RAXML
RAxML_info file including model parameters, generated
by RAxML. [default: None]
-a ALIGN, --alignment ALIGN
Aligned fasta file [default: None]
-f FRAG, --fragment FRAG
Fragment file (fasta format) [default: None]
-m MOLECULE, --molecule MOLECULE
Molecule type of sequences. Can be amino, dna, or rna
[default: dna]
OTHER OPTIONS:
These options control how SEPP is run.
-x N, --cpu N Use N CPUs [default: number of CPUs available on the
machine]
-cp CHCK_FILE, --checkpoint CHCK_FILE
Checkpoint file [default: no checkpointing]
-cpi N, --interval N Interval (in seconds) between checkpoint writes. Has
effect only when -cp is provided. [default: 3600]
-seed N, --randomseed N
Random seed number. [default: 297834]
TIPP OPTIONS:
These options control settings specific to TIPP.
-R N, --reference_pkg N
Use a pre-computed reference package [default: None]
-at N, --alignmentThreshold N
Enough alignment subsets are selected to reach a
cumulative probability of N. This should be a number
between 0 and 1 [default: 0.95]
-D, --dist Treat fragments as a distribution
-pt N, --placementThreshold N
Enough placements are selected to reach a cumulative
probability of N. This should be a number between 0
and 1 [default: 0.95]
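Both -at and -pt describe the same selection rule: keep adding candidates until their probabilities sum to at least the threshold. This is a minimal sketch of that rule. It assumes candidates are taken in decreasing order of probability and that at least one is always kept. Neither assumption is stated in the help text, and the function name is hypothetical.

```python
# Sketch of cumulative-probability selection (alignment subsets for -at,
# placements for -pt). Assumptions: highest-probability candidates are
# taken first, and at least one candidate is always selected.
from typing import List, Sequence, Tuple


def select_by_cumulative_probability(items: Sequence[Tuple[str, float]],
                                     threshold: float = 0.95) -> List[str]:
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    selected: List[str] = []
    total = 0.0
    for name, prob in sorted(items, key=lambda item: item[1], reverse=True):
        selected.append(name)
        total += prob
        if total >= threshold:  # stop once the threshold is reached
            break
    return selected


subsets = [("s1", 0.1), ("s2", 0.6), ("s3", 0.3)]
print(select_by_cumulative_probability(subsets, threshold=0.8))  # ['s2', 's3']
```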
-PD N, --push_down N Whether to classify based on children below or above
insertion point. [default: True]
-tx TAXONOMY, --taxonomy TAXONOMY
A file describing the taxonomy. This is a comma-
separated text file that has the following fields:
taxon_id,parent_id,taxon_name,rank. If there are other
columns, they are ignored. The first line is also
ignored.
-txm MAPPING, --taxonomyNameMapping MAPPING
A comma-separated text file mapping alignment sequence
names to taxonomic ids. Format (each line):
sequence_name,taxon_id. If there are other columns,
they are ignored. The first line is also ignored.
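Together, the -tx and -txm files let each backbone sequence be resolved to a full lineage. The sketch below reads both formats as described above: it skips the first line, ignores extra columns, and then walks parent_id links upward. The helper names are hypothetical. It also assumes the root is the taxon whose parent_id is empty, points to itself, or is not listed, and that malformed rows are skipped. The data are a toy example.

```python
# Hypothetical readers for the -tx (taxonomy) and -txm (mapping) files.
# Assumption: the root's parent_id is empty, equal to its own id, or absent.
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_taxonomy(handle: TextIO) -> Dict[str, Tuple[str, str, str]]:
    """Map taxon_id -> (parent_id, taxon_name, rank)."""
    reader = csv.reader(handle)
    next(reader, None)  # the first line is ignored
    taxonomy = {}
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or malformed lines (assumption)
        taxon_id, parent_id, name, rank = row[:4]  # extra columns ignored
        taxonomy[taxon_id] = (parent_id, name, rank)
    return taxonomy


def read_mapping(handle: TextIO) -> Dict[str, str]:
    """Map sequence_name -> taxon_id."""
    reader = csv.reader(handle)
    next(reader, None)  # the first line is ignored
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def lineage(taxon_id: str,
            taxonomy: Dict[str, Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Return [(rank, name), ...] from the given taxon up to the root."""
    path, seen = [], set()
    while taxon_id in taxonomy and taxon_id not in seen:
        seen.add(taxon_id)
        parent_id, name, rank = taxonomy[taxon_id]
        path.append((rank, name))
        taxon_id = parent_id
    return path


taxonomy = read_taxonomy(io.StringIO(
    "tax_id,parent_id,tax_name,rank\n"
    "1,1,root,no rank\n"
    "2,1,Bacteria,superkingdom\n"
    "1224,2,Proteobacteria,phylum\n"))
mapping = read_mapping(io.StringIO("seqname,tax_id\nseqA,1224\n"))
print(lineage(mapping["seqA"], taxonomy))
# [('phylum', 'Proteobacteria'), ('superkingdom', 'Bacteria'), ('no rank', 'root')]
```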
-adt TREE, --alignmentDecompositionTree TREE
A newick tree file used for decomposing taxa into
alignment subsets. [default: the backbone tree]
-C N, --cutoff N Minimum placement probability required for a placement
to count toward the distribution. This should be a
number between 0 and 1 [default: 0.0]
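One way to read -C is that low-probability placements are dropped before the remaining probabilities are summed per taxon. This is a minimal sketch under that reading. It assumes each placement is a (taxon, probability) pair and that "meets the requirement" means probability >= cutoff. TIPP's exact rule may differ, and the function name and data are hypothetical.

```python
# Sketch of -C: drop placements below the cutoff, then sum per taxon.
# Assumptions: placements are (taxon, probability) pairs and a placement
# counts when probability >= cutoff.
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def taxon_distribution(placements: Iterable[Tuple[str, float]],
                       cutoff: float = 0.0) -> Dict[str, float]:
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be between 0 and 1")
    dist: Dict[str, float] = defaultdict(float)
    for taxon, prob in placements:
        if prob >= cutoff:
            dist[taxon] += prob
    return dict(dist)


placements = [("Bacillus", 0.55), ("Bacillus", 0.25),
              ("Listeria", 0.15), ("Staphylococcus", 0.05)]
print(taxon_distribution(placements, cutoff=0.1))
```

With the default cutoff of 0.0, every placement counts toward the distribution.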