-
Notifications
You must be signed in to change notification settings - Fork 0
colocalization search for genomic elements
License
yutaka-saito/cosearge
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
== Cosearge: colocalization search for genomic elements version 0.1 ==== Precompiled binaries 3DColocNormalize 3DColocSearch 3DColocTest 3DColocDiff 3DColocLook (Debian Linux 3.16.0-4-amd64 + GCC 4.9.2 + Boost 1.55.0) ==== Build yourself $ tar xzvf cosearge-xxx.tar.gz $ cd cosearge-xxx $ make You will find the executables in src directory. ==== Usage Cosearge consists of three steps. (1) normalization (2) search (3) test (1) normalize a HiC contact matrix $ 3DColocNormalize --proc n --verbose example/matrix/chr22 (2) search for the colocalization of genomic elements $ 3DColocSearch --suffix_smp n example/matrix/chr22 example/element/mRNA.chr22.bed example/element/gap.chr22.txt example/out/subset example/out/score (3) test the colocalization for detected subsets $ for i in `seq 0 9` ; do 3DColocTest --suffix n example/matrix/chr22 example/out/subset.$i example/element/gap.chr22.txt example/out/pz.$i /dev/null done For help, $ 3DColocNormalize --help $ 3DColocSearch --help $ 3DColocTest --help ==== Input files You need following input files: (1) HiC contact matrix (2) genomic elements (3) chromosome gaps (1) HiC contact matrix The HiC contact matrix is represented by several files included in one directry. See example/matrix/chr22 that contains the HiC contact matrix of the human chromosome 22. "binNumber" contains the number of bins. (1243 bins) "resolution" contains the length of each bin. (40000 bp) Note that "binNumber * resolution" is equal to the length of the human chromosome 22. (1243 * 40000) "positionIndex" contains the genomic position of each bin. (0, 40000, 80000, 120000, ...) "genomeIdxToLabel" contains the index given to each chromosome. In this example, the HiC contact matrix represents only one chromosome, the human chromosome 22, and thus the index "0" is given to the chromosome "22". "chromosomeIndex" contains the index of the chromosome to which each bin belongs. In this example, all bins belong to the chromosome "22" whose index is "0". "chromosomeStarts" contains the i-th bin from which each chromosome starts. In this example, there is only one chromosome, which starts from the 0-th bin. "heatmap" contains the contact intensities represented as the binNumber*binNumber matrix. After the normalization step by 3DColocNormalize, the two additional files are produced. "heatmap_n" contains the normalized contact intensities. "statistics_n" contains the summary statistics used in normalization. (2) genomic elements The genomic elements are represented by BED format. See example/element/mRNA.chr22.bed that contains all known mRNA transcripts in the human chromosome 22. (3) chromosome gaps The chromosome gaps are represented by gap table format used in the UCSC genome database. See example/element/gap.chr22.txt that contains gap positions in the human chromosome 22. ==== Output format After the search step by 3DColocSearch, two kinds of output files are produced. (1) colocalized subsets of genomic elements See example/out/subset.* (2) colocalization scores See example/out/score.* After the test step by 3DColocTest, additional output files are produced. (3) p-values and z-scores of colocalization scores See example/out/pz.* ==== History * version 0.1 - implemented the algorithms described in [Saito et al, submitted] ==== References Yutaka Saito, Toutai Mituyama, Cosearge: an exploratory approach to detect spatial clusters of genes beyond topologically associated domains submitted. ==== Contact Yutaka Saito yutaka.saito AT aist.go.jp
About
colocalization search for genomic elements
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published