1. Overview
The MCScanX package has two main components: 1) a modified version of the MCScan algorithm that lets users conveniently detect synteny and collinearity and clearly view multiple alignments of collinear blocks, and 2) a variety of tools to visualize and analyze the synteny and collinearity data generated by the modified MCScan algorithm.
All programs are run from the command line with options, on either Mac OS or Linux systems. Usage information is built into each program. To show it on the screen, simply run the program without any options:
"./program_name" for executable binary files;
"perl program_name.pl" for Perl scripts;
"java program_name" for Java classes.
All code is copiable, distributable, modifiable, and usable without any restrictions.
Contact: Yupeng Wang, [email protected]; Xu Tan, [email protected]
2. Installation
On Mac OS, Xcode (http://developer.apple.com/xcode/) should be installed before installing the MCScanX package. On Linux systems, the Java SE Development Kit (JDK) and "libpng" should be installed before installing the MCScanX package.
Then simply put MCscanX.zip into a directory and run:
"
unzip MCscanX.zip
cd MCScanX
make
"
The following is the list of executable programs:
Core programs (in the main folder):
MCScanX
MCScanX_h
duplicate_gene_classifier
Downstream analysis programs (in the downstream_analyses folder):
Tool 1. detect_collinear_tandem_arrays
Tool 2. dissect_multiple_alignment
Tool 3. dot_plotter.java
Tool 4. dual_synteny_plotter.java
Tool 5. circle_plotter.java
Tool 6. bar_plotter.java
Tool 7. add_ka_and_ks_to_collinearity.pl
Tool 8. group_collinear_genes.pl
Tool 9. detect_collinearity_within_gene_families.pl
Tool 10. origin_enrichment_analysis.pl
Tool 11. family_circle_plotter.java
Tool 12. family_tree_plotter.java/family_tree_plotter_show_length.java
Tool 13. family_tree_plotter_chr.java
3. Core programs
1) MCScanX
This program, implementing a modified MCScan algorithm, detects collinear blocks and progressively aligns multiple collinear blocks against reference chromosomes.
Usage:"./MCScanX dir/xyz"
MCScanX reads in two data files: xyz.blast and xyz.gff.
The xyz.blast file is simply the unmodified BLASTP output in m8 (tabular) format. The xyz.gff file holds gene positions in the following tab-delimited format:
"sp&chr_NO gene starting_position ending_position"
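To make the two input formats concrete, here is a minimal Python sketch of a line parser for each. It assumes xyz.gff has exactly four tab-separated columns, with the chromosome label written as a species abbreviation followed by the chromosome number (for example a hypothetical "at1"), and that xyz.blast is standard 12-column BLAST m8 output, in which the 11th and 12th columns are the e-value and bit score. The names GffRecord, BlastHit, parse_gff_line and parse_m8_line are hypothetical helpers, not part of MCScanX.

```python
from typing import NamedTuple


class GffRecord(NamedTuple):
    chrom: str  # species abbreviation + chromosome number, e.g. "at1" (assumed)
    gene: str
    start: int
    end: int


class BlastHit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_gff_line(line: str) -> GffRecord:
    """Parse one 'sp&chr_NO <tab> gene <tab> start <tab> end' line."""
    chrom, gene, start, end = line.rstrip("\n").split("\t")
    return GffRecord(chrom, gene, int(start), int(end))


def parse_m8_line(line: str) -> BlastHit:
    """Parse one BLASTP m8 (tabular) line, keeping only the fields used here."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected 12 m8 columns, got {len(fields)}")
    # 0-based columns 10 and 11 hold the e-value and the bit score.
    return BlastHit(fields[0], fields[1], float(fields[10]), float(fields[11]))
```

A sketch like this is only useful for sanity-checking input files before running MCScanX; MCScanX itself reads the files directly.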
MCScanX generates two plain text files "xyz.collinearity" and "xyz.tandem", which are also inputs of some downstream analyses.
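Several downstream analyses start from the collinear gene pairs in "xyz.collinearity". The following minimal sketch extracts those pairs. It assumes that header and "## Alignment" block lines begin with "#", and that each pair line is tab-separated with the two gene IDs in the second and third fields (for example "  0-  0:<tab>geneA<tab>geneB<tab>  1e-50"). The function name read_collinear_pairs is hypothetical.

```python
from typing import Iterable, List, Tuple


def read_collinear_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Collect (gene1, gene2) pairs from the lines of a .collinearity file."""
    pairs: List[Tuple[str, str]] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, the parameter header and block headers
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            pairs.append((fields[1].strip(), fields[2].strip()))
    return pairs
```

In practice it would be called as read_collinear_pairs(open("xyz.collinearity")).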
2) MCScanX_h
The BLASTP input of MCScanX can be replaced by a tab-delimited file containing pairwise homologous relationships detected by third-party software. In this case, users should use MCScanX_h instead. MCScanX_h is executed in the same way as MCScanX, except that the "xyz.blast" file is replaced by an "xyz.homology" file. Statistics on the numbers and percentages of collinear homolog pairs are shown at the bottom of the screen output.
For example, users can use the "ortholog.txt" file generated by OrthoMCL as the input ("xyz.homology") of MCScanX_h.
3) duplicate_gene_classifier
Users may use this program, which incorporates the MCScanX algorithm, to classify the origins of the duplicate genes of ONE genome into whole-genome/segmental (anchor/collinear genes in syntenic blocks), tandem (continuous repeats), proximal (in a nearby chromosomal region but not adjacent), or dispersed (modes other than segmental, tandem and proximal) duplications.
Usage:"./duplicate_gene_classifier dir/xyz"
The input of duplicate_gene_classifier is the same as that of MCScanX, except for an additional option defining the maximum distance (number of genes) between two proximal duplicates. This program generates a ".gene_type" file.
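The category definitions above can be illustrated with a simplified Python sketch that classifies a single duplicate pair. This is not the program's implementation: it assumes gene order is given by sorting start positions per chromosome, treats a pair as segmental only if it appears in a supplied set of collinear pairs, calls directly adjacent genes tandem, and calls same-chromosome pairs within max_gap genes proximal. The actual program assigns one class per gene, which this sketch does not attempt. gene_ranks, classify_pair and max_gap are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Position = Tuple[str, int]  # (chromosome label, rank of the gene along it)


def gene_ranks(records: Iterable[Tuple[str, str, int]]) -> Dict[str, Position]:
    """Map each gene to (chromosome, rank) from (chrom, gene, start) records."""
    by_chrom = defaultdict(list)
    for chrom, gene, start in records:
        by_chrom[chrom].append((start, gene))
    pos: Dict[str, Position] = {}
    for chrom, genes in by_chrom.items():
        for rank, (_, gene) in enumerate(sorted(genes)):
            pos[gene] = (chrom, rank)
    return pos


def classify_pair(a: str, b: str, pos: Dict[str, Position],
                  collinear: Set[FrozenSet[str]], max_gap: int) -> str:
    """Simplified origin of one duplicate pair (illustration only)."""
    if frozenset((a, b)) in collinear:
        return "segmental"
    (chr_a, rank_a), (chr_b, rank_b) = pos[a], pos[b]
    if chr_a == chr_b:
        gap = abs(rank_a - rank_b)
        if gap == 1:
            return "tandem"
        if gap <= max_gap:
            return "proximal"
    return "dispersed"
```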
4. Downstream analyses
1) detect_collinear_tandem_arrays
Tandem duplications often complicate collinearity detection. To increase the power of collinearity detection, the MCScan algorithms use the gene with the best BLASTP hit to represent a tandem array. This program converts matched genes in collinear blocks into the corresponding tandem arrays wherever tandem duplications exist.
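The idea of collapsing a tandem array to one representative gene can be sketched as follows. The sketch assumes a tandem array is a run of consecutive genes on one chromosome in which each gene has a homology hit to its immediate neighbour, and that "best BLASTP hit" means the highest bit score per gene. It only illustrates the grouping step; it is not the program's code, and tandem_arrays and representative are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Set


def tandem_arrays(ordered_genes: List[str],
                  homologs: Set[FrozenSet[str]]) -> List[List[str]]:
    """Group runs of neighbouring homologous genes (runs of 2+ are arrays)."""
    arrays: List[List[str]] = []
    run: List[str] = []
    for gene in ordered_genes:
        if run and frozenset((run[-1], gene)) in homologs:
            run.append(gene)  # neighbour is homologous: extend the current run
        else:
            if len(run) >= 2:
                arrays.append(run)
            run = [gene]  # start a new run at this gene
    if len(run) >= 2:
        arrays.append(run)
    return arrays


def representative(array: List[str], best_score: Dict[str, float]) -> str:
    """Pick the array member with the highest best-hit bit score."""
    return max(array, key=lambda g: best_score.get(g, 0.0))
```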
Usage:"./detect_collinear_tandem_arrays -g gff_file -b blast_file -c collinearity_file -o output_file"
2) dissect_multiple_alignment
This program dissects the number of collinear blocks at each gene locus of the reference chromosomes into the number of intra-species collinear blocks and the number of inter-species collinear blocks.
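The intra/inter-species split can be illustrated for a single reference locus. This minimal sketch assumes the species can be read from the alphabetic prefix of a chromosome label (for example "at" in a hypothetical "at1") and that the chromosomes of the blocks aligned at the locus are already known; species_of and dissect_counts are hypothetical names, not the program's interface.

```python
import re
from collections import Counter
from typing import Iterable, Tuple


def species_of(chrom: str) -> str:
    """Return the alphabetic species prefix of a label such as 'at1' (assumed)."""
    m = re.match(r"[A-Za-z]+", chrom)
    if m is None:
        raise ValueError(f"no species prefix in chromosome label {chrom!r}")
    return m.group(0)


def dissect_counts(ref_chrom: str, partner_chroms: Iterable[str]) -> Tuple[int, int]:
    """Split the blocks at one reference locus into (intra, inter) counts."""
    ref_sp = species_of(ref_chrom)
    counts = Counter("intra" if species_of(c) == ref_sp else "inter"
                     for c in partner_chroms)
    return counts["intra"], counts["inter"]
```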
Usage:"./dissect_multiple_alignment -g gff_file -c collinearity_file -o output_file"
3) dot_plotter.java
This Java program generates a dot plot of all the collinear blocks on two sets of chromosomes given by the user. Note that the JDK is needed to run Java programs.
Usage:"java dot_plotter -g gff_file -s collinearity_file -c control_file -o output_PNG_file"
4) dual_synteny_plotter.java
This Java program generates a dual synteny plot that links all the collinear blocks between two sets of chromosomes with straight lines.
Usage:"java dual_synteny_plotter -g gff_file -s collinearity_file -c control_file -o output_PNG_file"
5) circle_plotter.java
This Java program generates a circular plot that links all the collinear blocks with curved lines, between and within the chromosomes given by the user.
Usage:"java circle_plotter -g gff_file -s collinearity_file -c control_file -o output_PNG_file"
6) bar_plotter.java
This Java program generates a bar plot displaying chromosome rearrangements between the reference and target chromosome sets given by the user.
Usage:"java bar_plotter -g gff_file -s collinearity_file -c control_file -o output_PNG_file"
7) add_ka_and_ks_to_collinearity.pl
This program calculates the Ka and Ks values of each collinear gene pair in the MCScanX output ".collinearity" file. BLAST and BioPerl are needed to run this program.
Usage:"perl add_ka_and_ks_to_collinearity.pl -i collinearity_file -d cds_file -o output_file"
8) group_collinear_genes.pl
This program groups genes by connecting collinear genes until no gene in a group has a collinear partner outside that group. This analysis can be used to construct gene families based on collinear relationships.
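The grouping described above amounts to finding the connected components of the graph whose edges are collinear gene pairs. The following minimal sketch uses a union-find structure to do that; it illustrates the concept and is not the Perl script's code. group_collinear_genes here is a hypothetical Python function that takes an iterable of (gene1, gene2) pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_collinear_genes(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Return connected components of the collinearity graph, sorted."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two groups

    groups: Dict[str, List[str]] = defaultdict(list)
    for gene in list(parent):
        groups[find(gene)].append(gene)
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda members: members[0])
```

Because every collinear pair is merged, no gene can end up with a collinear partner outside its own group, which is the stopping condition stated above.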
Usage:"perl group_collinear_genes.pl -i collinearity_file -o output_file"
9) detect_collinearity_within_gene_families.pl
This program detects collinear gene pairs within gene families.
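Conceptually, this is a filter that keeps collinear pairs whose two genes belong to the same family. The sketch below assumes the families are already loaded as a mapping from family name to gene set, with each gene in at most one family; the input file format of the Perl script is not reproduced here, and collinear_within_families is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def collinear_within_families(families: Dict[str, Set[str]],
                              pairs: Iterable[Tuple[str, str]]
                              ) -> Dict[str, List[Tuple[str, str]]]:
    """Keep collinear pairs whose two genes fall in the same family."""
    gene_to_family: Dict[str, str] = {}
    for family, genes in families.items():
        for gene in genes:
            gene_to_family[gene] = family
    hits: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for a, b in pairs:
        fam = gene_to_family.get(a)
        if fam is not None and fam == gene_to_family.get(b):
            hits[fam].append((a, b))
    return dict(hits)
```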
Usage:"perl detect_collinearity_within_gene_families.pl -i gene_family_file -d collinearity_file -o output_file"
10) origin_enrichment_analysis.pl
This program identifies potential enrichment of duplicate gene origins in the input gene families, based on the output of duplicate_gene_classifier.
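One simple way to express "enrichment" is the fold change of each origin's share within a family relative to its genome-wide share. The sketch below computes only that ratio, with no significance test; the statistic actually used by origin_enrichment_analysis.pl is not described here. It assumes the ".gene_type" result has been loaded as a mapping from gene to an origin label, and origin_fold_enrichment is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable


def origin_fold_enrichment(family_genes: Iterable[str],
                           gene_type: Dict[str, str]) -> Dict[str, float]:
    """Fold enrichment of each origin in a family versus the whole genome."""
    background = Counter(gene_type.values())
    in_family = Counter(gene_type[g] for g in family_genes if g in gene_type)
    n_bg = sum(background.values())
    n_fam = sum(in_family.values())
    return {origin: (k / n_fam) / (background[origin] / n_bg)
            for origin, k in in_family.items()}
```

A value above 1 means the origin is over-represented in the family compared with the genome as a whole.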
Usage:"perl origin_enrichment_analysis.pl -i gene_family_file -d gene_type_file -o output_file"
11) family_circle_plotter.java
This Java program generates a circular plot that links all collinear genes within a gene family with red curved lines, and places the gene family's collinearity against the genome-wide collinearity background.
Usage:"java family_circle_plotter -g gff_file -s collinearity_file -c control_file -f gene_family_file -o output_PNG_file"
12) family_tree_plotter.java/family_tree_plotter_show_length.java
This Java program displays a gene family tree on which collinear gene pairs and tandem gene groups are connected with red and blue curves, respectively. family_tree_plotter does not show branch lengths, while family_tree_plotter_show_length does.
Usage:"java family_tree_plotter -t tree_file -s collinearity_file -o output_PNG_file" (show collinear gene pairs only) or
"java family_tree_plotter -t tree_file -s collinearity_file -d tandem_pair_file -o output_PNG_file" (show both tandem and collinear gene pairs)
13) family_tree_plotter_chr.java
This Java program displays a gene family tree on which collinear gene pairs and tandem gene groups are connected with red and blue curves, respectively, and each gene in the tree is linked to its position on the chromosomes, whose synteny is also shown.
Usage:"java family_tree_plotter_chr -t tree_file -g gff_file -s collinearity_file -o output_PNG_file" (show collinear gene pairs only) or
"java family_tree_plotter_chr -t tree_file -g gff_file -s collinearity_file -d tandem_pair_file -o output_PNG_file" (show both tandem and collinear gene pairs)