Genome Structure Correction
ParkerLab/encodegsc
DEPENDENCIES:

The segmentation script includes plotting support if matplotlib is installed.
The block bootstrap code can plot the test distribution histogram and
Q-Q plots if rpy is installed.

INSTALL:

Just enter the directory and run the script, or move the directory to
site-packages to use it as a module.

TEST BLOCK BOOTSTRAP:

To test the block bootstrap code, run

    ./block_bootstrap.py \
        -1 test_data/conserved_sequences.bed -2 test_data/ENCODE_annotations.bed \
        -d test_data/ENCODE_region_lengths.txt -r 0.041 -n 20 -v

For a detailed explanation of the command line options, run:

    ./block_bootstrap.py --help

TEST SEGMENTATION:

NOTE: Segmentation can only decrease p-values. If you run the block bootstrap,
are interested only in testing, and find a very small p-value, there is
probably no reason to go back and segment the data: that p-value will only
get smaller after segmentation. However, if you are building confidence
intervals, segmentation may still be useful.

To test the segmentation code, first:

- generate test data

    ./segmentation.py -g

- test the segmentation on the generated data

    ./segmentation.py -1 sim_output_1.bed -2 sim_output_2.bed \
        -d sim_lengths.txt -m 5000 -s 4 -v -p

For a detailed explanation of the command line options, run:

    ./segmentation.py --help

INPUT FILE FORMATS:

Feature Input Files:

The feature input files must contain newline-delimited regions of the form:

    chrom ( whitespace ) chromStart ( whitespace ) chromEnd ( whitespace ) ( any other fields )\n

where the 3 necessary fields are:

    field       type    description
    chrom       string  Name of the chromosome
    chromStart  int     Starting position on the chromosome
    chromEnd    int     Ending position on the chromosome

This is compatible with all BED(>=3)+X formats.
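The feature file line format described above can be read with a few lines of
Python. The following is a minimal sketch for illustration only, not the
parser used by block_bootstrap.py or segmentation.py. The function name
parse_regions is hypothetical, and skipping blank lines and rejecting lines
with fewer than three fields are assumptions rather than documented behavior.

```python
from typing import Iterable, List, Tuple

# (chrom, chromStart, chromEnd)
Region = Tuple[str, int, int]


def parse_regions(lines: Iterable[str]) -> List[Region]:
    """Parse whitespace-delimited 'chrom chromStart chromEnd [other fields]' lines.

    Hypothetical helper: only the first three fields are read, so any
    BED(>=3)+X line is accepted and the extra fields are ignored.
    """
    regions: List[Region] = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()  # splits on any run of whitespace
        if not fields:
            continue  # assumption: blank lines are skipped
        if len(fields) < 3:
            raise ValueError(
                "line %d: expected at least 3 fields, got %d"
                % (line_number, len(fields))
            )
        chrom = fields[0]
        chrom_start = int(fields[1])
        chrom_end = int(fields[2])
        regions.append((chrom, chrom_start, chrom_end))
    return regions


if __name__ == "__main__":
    example = ["chr1\t100\t250\tpeak_a\t0.9\n", "chr2 10 20\n"]
    for region in parse_regions(example):
        print(region)
```

Because the function takes any iterable of strings, an open file object can be
passed directly, for example parse_regions(open("features.bed")), where
features.bed is a placeholder file name.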
More specifically, the feature file format is compatible with the following
ENCODE standard file formats, as described at

    http://genomewiki.cse.ucsc.edu/EncodeDCC/index.php/File_Formats

    narrowPeak: Narrow (or Point-Source) Peaks Format
    broadPeak: Broad Peaks (or Regions) Format
    gappedPeak: Gapped Peaks (or Regions) Format
    tagAlign: Tag Alignment Format
    pairedTagAlign: Tag Alignment Format for Paired Reads
    NRE Bed6 Format
    BiP Bed8 Format

Region Length Input Files:

The region lengths input file should consist solely of lines of the form

    chrom ( whitespace ) chromStart ( whitespace ) chromEnd \n

where the 3 necessary fields are:

    field       type    description
    chrom       string  Name of the chromosome
    chromStart  int     Starting position on the chromosome
    chromEnd    int     Ending position on the chromosome

The file format is described at

    http://genomewiki.cse.ucsc.edu/EncodeDCC/index.php/File_Formats

under the heading genomicCoverageFile.

CHANGELOG:

0.5.1
    initial release
    marginal and conditional basepair overlap
    single file segmentation

0.6.1
    added multiple regions segmentation

0.7.1
    changed the lengths file format
    optimized segmentation for large, binary feature regions
    several bug fixes ( see svn logs for more details )

0.7.2
    fixed a bug related to 64-bit platform compilation
        ( Thanks to Michiel de Hoon for the fix )
    fixed a bug with specific types of lengths files
        ( Thanks to Ian Durham for the report )
        This sometimes manifested itself as a ZeroDivision Exception,
        sometimes as a negative region length error.
    fixed a segmentation file write output bug
        ( Thanks to Mikhail Spivakov for the report )

0.7.3
    fixed a mac specific compile bug
        ( Thanks to Ian Durham for the report )

0.7.4
    fixed a python 2.3 64-bit compile bug
        ( Thanks to Ian Durham for the report )
    fixed a python 2.3 compatibility bug
        ( Thanks to Ian Durham for the report )

0.8.0
    Rewrite. Added new tests for continuous data, made it easier to add
    new tests, and removed some superfluous code.