-
Notifications
You must be signed in to change notification settings - Fork 7
Home
It is recommended to start with the Minimal Tutorial. The minimal tutorial walks through classifying the WGS nucleotide sequences for two strains, A/swine/Iowa/A02478541/2019
(H1N1) and A/swine/Minnesota/A01785989/2019
(H3N2).
nn_classifier.R: reads in a tree, identifies clade (and other info) based on nearest neighbor for query sequences. Reference sequences are annotated with metadata information (clade, segment, state) split by the |
character. (Dependencies: ape package)
The general format of the command is
RScript nn_classifier.R <treefile.tre> [metadata column number...]
For example, the following command reads in a PB2 tree containing a single unknown named "QUERY." The script identifies the nearest neighbor, and returns the name of the QUERY with the metadata extracted from the nearest neighbors defline at positions 3 and 1:
Rscript nn_classifier.R sample_data/pb2_sample.tre 3 1
# QUERY A/swine/Minnesota/A01785592/2018 TRIG
It is assumed that the query sequence has less |
than the first metadata column specified.
This R script relies on the ape
package.
pipeline.sh: creates the blast database from reference.fasta, identifies the segment (HA to NS) of the query fasta file and splits them out by segment, builds the segment trees, and runs nn_classifier.R
to pull out classification of each segment. (Dependencies: NCBI blast, smof, R with ape, mafft, FastTree)
bash pipeline.sh <query.fasta>
For example
bash pipeline.sh sample_data/query.fasta
# ==== Final results in query.fasta_Final_Output.txt
# alignment and tree files in the 'query.fasta_output' folder
# Tree files are listed below:
# -rw-r--r-- 1 macusers 15856 Jun 6 10:36 query.fasta_output/H1.tre
# -rw-r--r-- 1 macusers 18508 Jun 6 10:36 query.fasta_output/H3.tre
# -rw-r--r-- 1 macusers 10042 Jun 6 10:36 query.fasta_output/N1.tre
# -rw-r--r-- 1 macusers 17472 Jun 6 10:36 query.fasta_output/N2.tre
# -rw-r--r-- 1 macusers 14575 Jun 6 10:36 query.fasta_output/PB2.tre
# -rw-r--r-- 1 macusers 14014 Jun 6 10:37 query.fasta_output/PB1.tre
# -rw-r--r-- 1 macusers 15881 Jun 6 10:37 query.fasta_output/PA.tre
# -rw-r--r-- 1 macusers 16242 Jun 6 10:37 query.fasta_output/NP.tre
# -rw-r--r-- 1 macusers 17712 Jun 6 10:37 query.fasta_output/M.tre
# -rw-r--r-- 1 macusers 13355 Jun 6 10:38 query.fasta_output/NS.tre
head query.fasta_Final_Output.txt
# QUERY_MH540411_A/swine/Iowa/A02169143/2018 H1 pdm-vaccine 1A.3.3.2-vaccine
# QUERY_MH608109_A/swine/Minnesota/A01785575/2018 H1 gamma2-beta-like 1A.2-3-like
# QUERY_MH936448_A/swine/Minnesota/A01785608/2018 H1 gamma2-beta-like 1A.2-3-like
# QUERY_MK024151_A/swine/Minnesota/A01785613/2018 H1 gamma2-beta-like 1A.2-3-like
# QUERY_MH802648_A/swine/Iowa/A02254795/2018 H1 gamma2-beta-like 1A.2-3-like
# QUERY_MH608115_A/swine/Oklahoma/A01785571/2018 H1 gamma2-beta-like 1A.2-3-like
# QUERY_MH561753_A/swine/Oklahoma/A01785573/2018 H1 gamma2-beta-like 1A.2-3-like
# QUERY_MH551254_A/swine/South_Dakota/A02016893/2018 H1 beta 1A.2
# QUERY_MH595472_A/swine/Illinois/A02170163/2018 H1 alpha 1A.1.1
# QUERY_MH922878_A/swine/Ohio/18TOSU4522/2018 H1 alpha 1A.1.1
mergeGC.R: will attempt to merge segments into gene constellations. (Dependencies: reshape2 package)
General format of the command is:
Rscript merge.R <query_Final_Output.txt>
The output will be in Merge_Attempt.txt
. For example:
bash pipeline.sh sample_data/query.fasta
# ==== Final results in query.fasta_Final_Output.txt
# alignment and tree files in the 'query.fasta_output' folder
# Tree files are listed below:
# -rw-r--r-- 1 user macusers 15648 Mar 15 09:45 query.fasta_output/H1.tre
# -rw-r--r-- 1 user macusers 16997 Mar 15 09:45 query.fasta_output/H3.tre
# -rw-r--r-- 1 user macusers 10040 Mar 15 09:46 query.fasta_output/N1.tre
# -rw-r--r-- 1 user macusers 17585 Mar 15 09:46 query.fasta_output/N2.tre
# -rw-r--r-- 1 user macusers 13674 Mar 15 09:46 query.fasta_output/PB2.tre
# -rw-r--r-- 1 user macusers 11184 Mar 15 09:46 query.fasta_output/PB1.tre
# -rw-r--r-- 1 user macusers 15076 Mar 15 09:46 query.fasta_output/PA.tre
# -rw-r--r-- 1 user macusers 16300 Mar 15 09:46 query.fasta_output/NP.tre
# -rw-r--r-- 1 user macusers 18773 Mar 15 09:46 query.fasta_output/M.tre
# -rw-r--r-- 1 user macusers 13251 Mar 15 09:46 query.fasta_output/NS.tre
cat query.fasta_Final_Output.txt |awk -F'/' 'OFS="\t"{print $4,$0}' | awk -F'\t' 'OFS="\t" {print $1,$3,$4,$5}' > query_output.txt
Rscript mergeGC.R query_output.txt
head -5 Merge_Attempt.txt | column -t
# V1 H1 H3 N1 N2 PB2 PB1 PA NP M NS
# 18TOSU3605 - 2010-human_like|3.2010.1 - 2002_NA_N2 TRIG_avOrigin - TRIG_avOrigin TRIG_classicalOrigin pdm_EurasianSwOrigin TRIG_classicalOrigin
# 18TOSU3611 - 2010-human_like|3.2010.1 - 2002_NA_N2 TRIG_avOrigin - TRIG_avOrigin TRIG_classicalOrigin pdm_EurasianSwOrigin TRIG_classicalOrigin
# 18TOSU3614 - 2010-human_like|3.2010.1 - 2002_NA_N2 TRIG_avOrigin - TRIG_avOrigin TRIG_classicalOrigin pdm_EurasianSwOrigin TRIG_classicalOrigin
# 18TOSU4522 alpha|1A.1.1 - - 2002_NA_N2 TRIG_avOrigin TRIG_huOrigin pdm_TRIGavOrigin pdm_TRIGclassicalOrigin pdm_EurasianSwOrigin TRIG_classicalOrigin