##########################
Extracting the Ramp Sequence from genes
ExtRamp.py
Created By: Logan Brase and Justin Miller
Email: [email protected] or [email protected]
##########################
ExtRamp is a tool to extract the Ramp sequence from the beginning of genes.
It uses the tAI values (user provided) or codon proportions to determine the speed of translation
and the appropriate cut off point for the Ramp sequence.
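As a rough illustration of the idea (not the code used by ExtRamp.py), the sketch below smooths per-codon efficiency values with a sliding window and a harmonic mean, mirroring the -w and -m options described below. The function name window_speeds and the example values are hypothetical, and statistics.harmonic_mean requires Python 3.6 or later.

```python
from statistics import harmonic_mean

def window_speeds(efficiencies, window=9):
    """Return one smoothed value per full window of consecutive codons.

    `efficiencies` is a list of positive per-codon values (for example tAI
    or relative adaptiveness). This is an illustrative helper, not part of
    ExtRamp.py.
    """
    if len(efficiencies) < window:
        return []
    return [
        harmonic_mean(efficiencies[i:i + window])
        for i in range(len(efficiencies) - window + 1)
    ]

# Hypothetical per-codon efficiencies (made-up numbers, not real tAI values).
codon_eff = [0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
speeds = window_speeds(codon_eff, window=3)
print(speeds)
```

Windows that cover the slow codons at the start give low values, which is the kind of early dip a ramp cutoff would look for.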
##########################
ARGUMENT OPTIONS:
ExtRamp requires one user input at runtime:
-i input INPUT fasta file containing the cds gene sequences of interest (.gz file extension required for gzipped files)
Optional arguments:
-a tAI INPUT file in csv format, contains the tAI values. Two formats are accepted as shown in the two included examples (S.cerevisiae_tAI_Values and Ecoli_tAI_Values)
-u rscu INPUT fasta file used to compute relative synonymous codon usage and relative adaptiveness of codons
-o ramp OUTPUT fasta file to write the ramp sequences to
-v verbose Flag to print progress to standard error
-l vals OUTPUT file in csv format, the sequence speed values are written here if provided
-p speeds OUTPUT speeds file to write tAI/relative adaptiveness values for each position in the sequence from the codon after the start codon to the codon before the stop codon. Format: Header newline list of values
-n noRamp OUTPUT Text file to write the gene names that contained no ramp sequence.
-z removedSequences OUTPUT Write the header lines that are removed (e.g., sequence not long enough or not divisible by 3) to output file
-x afterRamp OUTPUT Fasta file containing gene sequences after the identified ramp sequence
-t threads The number of threads used to run the program, default is 9
-w window The number of codons in the ribosome window, default is 9 codons
-s stdev The number of standard deviations below the mean the cutoff value will be. Default is not used.
-d stdevRampLength The number of standard deviations in the lengths of the ramp sequences. Default is not used.
-m middle The type of statistic used to measure the middle (consensus) efficiency. Options are 'hmean','mean', 'gmean', and 'median'. Default is 'hmean'
-r rna Flag for RNA sequences. Default is DNA.
-f determine_cutoff Flag to determine outlier percentages for mean cutoff based on species FASTA file. Default: local minimum in first 8 percent of gene
-c cutoff Cutoff for where the local minimum must occur in the gene for a ramp to be calculated. If --determine_cutoff (-f) is used, then this value may change. Is not used if standard deviations are set. Default:8
-e determine_cutoff_percent Cutoff for determining the percent of the gene that is in an outlier region. Used in conjunction with -f. Options are true outliers or a number from 0-99, which indicates the region of a box plot. For instance, 75 means the 75th percentile or above. Default: True Outliers
-q seqLength Minimum nucleotide sequence length. Default is 100 amino acids * 3 = 300 nucleotides
NOTE: Only the standard codon table is supported because many codon tables have ambiguous codons that encode more than one amino acid. Since we use the relative codon adaptiveness for each amino acid, we cannot account for ambiguous codons.
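To show why each codon must map to exactly one amino acid, here is a minimal sketch of relative adaptiveness under its common definition: a codon's count divided by the count of the most frequent synonymous codon for the same amino acid. Whether ExtRamp.py computes it in exactly this way is an assumption; the names CODON_TABLE and relative_adaptiveness are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard codon table, built in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def relative_adaptiveness(sequences):
    """Map each observed codon to count / max count among its synonyms."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Step through complete codons only; skip anything not in the table.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    # Highest count seen for each amino acid.
    best = defaultdict(int)
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best[aa], n)
    return {codon: n / best[CODON_TABLE[codon]] for codon, n in counts.items()}

# Tiny made-up sequence: ATG CTG CTG CTA AAA TAA
w = relative_adaptiveness(["ATGCTGCTGCTAAAATAA"])
print(w["CTG"], w["CTA"])
```

If a codon could encode two amino acids, the denominator in this ratio would not be well defined, which is the limitation the note above describes.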
##########################
REQUIREMENTS:
ExtRamp.py uses Python version 3.5 in a Linux environment
Python Libraries:
1. statistics
2. gzip
3. csv
4. argparse
5. sys
6. tqdm (optional)
7. multiprocessing
8. numpy
9. scipy
10. math
11. re
If any of those libraries is not currently in your Python Path, use the following command:
pip3 install --user [library_name]
to install the library to your path.
##########################
USAGE
With tAI file (RECOMMENDED):
If you do not have the tAI values, check stAIcalc at:
http://tau-tai.azurewebsites.net/
The site lists a large number of species with known tAI values. You must select a valid fasta
file and then click submit for the tAI values to be printed at the bottom. The fasta file
does not need to be from the correct species, but the site won't print the values until a file
is selected. The values can then be exported into a CSV file.
python ExtRamp.py -i path/to/SEQUENCES.fasta.gz -a path/to/tAI.csv -o path/to/OUTFILE.fasta
Without tAI file:
python ExtRamp.py -i path/to/SEQUENCES.fasta.gz -o path/to/OUTFILE.fasta -v
After running the command above (with the -v flag), these updates will be printed to standard error:
Reading Sequences...
Calculating Codon Speeds...
Calculating Sequence Speeds...
Consensus Codon Efficiency using hmean: [NUMBER]
Standard Deviation: [NUMBER]
Maximum Efficiency in Ramp Sequence [NUMBER]
Isolating Ramp Sequences...
[NUMBER] Ramp Sequences found out of [NUMBER] total sequences
##########################
EXAMPLE USAGE WITH TAI VALUES
Try running ExtRamp.py on the provided S_cerevisiae example files in the example_files folder.
python ExtRamp.py -i example_files/Saccharomyces_cerevisiae.gz -a example_files/S_cerevisiae_tAI_Values.csv -o outTest.fasta
The output should match the S_cerevisiae_output.fasta file in the example_files folder (note: due to multithreading, the order of the sequences might vary)
##########################
EXAMPLE USAGE WITHOUT TAI VALUES
python ExtRamp.py -i example_files/Homo_sapiens.gz -o output.fasta
For us, that command took approximately six minutes of user time. Using 16 cores, it took approximately 30 seconds.
output.fasta should match example_files/Homo_sapiens_output.fasta (note: the order of the sequences might vary)
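Because the order of the sequences might vary, a plain line-by-line diff can report differences even when the results agree. The sketch below compares two FASTA files by header and sequence, ignoring record order. It is a minimal helper and not part of ExtRamp.py; the function names are hypothetical, and it assumes header lines are unique within each file.

```python
import gzip

def read_fasta(path):
    """Read a FASTA file (optionally gzipped) into {header: sequence}."""
    opener = gzip.open if path.endswith(".gz") else open
    records = {}
    header = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                records[header] = ""
            elif header is not None:
                # Sequences may be wrapped over several lines; join them.
                records[header] += line
    return records

def same_records(path_a, path_b):
    """True if both files hold the same header/sequence pairs in any order."""
    return read_fasta(path_a) == read_fasta(path_b)

# Example call (paths taken from the commands above):
# same_records("output.fasta", "example_files/Homo_sapiens_output.fasta")
```

The same check applies to outTest.fasta and S_cerevisiae_output.fasta from the tAI example.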
##########################
Thank you, and happy researching!