GitHub - alcamerone/HiCCollectorsCurve: Takes the list of interactions deemed significant by SeqMonk and the raw SAM file generated from the HiC experiment, and generates a collector's curve to investigate sampling depth. NOTE: HIGHLY MEMORY INTENSIVE.

Works with SeqMonk output to produce a collector's curve of significant interactions hit vs. number of reads sampled. The curve shows how completely the significant interactions have been sampled at a given read depth.

Requires:

-A sorted probe list generated by SeqMonk, in the format "Probe(Chr:Start-End(length in kbp)) ChrNumber StartPos EndPos" (tab-separated)

-An interaction list generated by SeqMonk, in the format "Probe1 Chromosome1 Start End Probe2 Chromosome2 Start End" (tab-separated)

-A SAM-formatted file (can be generated from a BAM file using samtools), e.g. the one used by SeqMonk to generate the above files
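The two SeqMonk formats above can be read with a few lines of Python. The following is a minimal sketch, not the script's own parser: the function names are hypothetical, and it assumes the files have no header row, that probes on a chromosome do not overlap, and that Start and End are inclusive. Because the probe list is sorted, a read position can be matched to its probe by binary search.

```python
import bisect
from collections import defaultdict


def parse_probe_fields(fields):
    """Convert (name, chromosome, start, end) strings into a typed tuple."""
    name, chrom, start, end = fields
    return name, chrom, int(start), int(end)


def load_probes(lines):
    """Group probes by chromosome as parallel, start-sorted lists."""
    rows = [
        parse_probe_fields(line.rstrip("\n").split("\t")[:4])
        for line in lines
        if line.strip()  # skip blank lines
    ]
    starts, probes = defaultdict(list), defaultdict(list)
    # Re-sort defensively in case the input order differs from (chrom, start).
    for name, chrom, start, end in sorted(rows, key=lambda r: (r[1], r[2])):
        starts[chrom].append(start)
        probes[chrom].append((name, start, end))
    return starts, probes


def load_interactions(lines):
    """Return each interaction as an unordered pair of probe names."""
    pairs = set()
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns 0 and 4 hold Probe1 and Probe2 in the 8-column format.
        pairs.add(frozenset((fields[0], fields[4])))
    return pairs


def probe_at(starts, probes, chrom, pos):
    """Binary-search for the probe covering pos, or None if no probe does."""
    i = bisect.bisect_right(starts.get(chrom, []), pos) - 1
    if i < 0:
        return None
    name, start, end = probes[chrom][i]
    return name if start <= pos <= end else None
```

With open file handles, `load_probes(open(path))` and `load_interactions(open(path))` would work directly, since both functions only iterate over lines.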

Usage: python read_vs_sig_interaction_collectors_curve.py [OPTIONS]

Options:

"-p", "--probe_list_fp" The path to the probe list generated by SeqMonk

"-i", "--sig_ints_fp" The path to the interaction list generated by SeqMonk

"-s", "--sam_file_fp" The path to the SAM file used to generate SeqMonk results

"-z", "--step_size" The amount by which the proportion of reads sub-sampled increases at each step (default: 0.1)

"-n", "--num_iter" The number of subsamples to generate for each step (default: 100)

"-t", "--num_threads" The number of concurrent processes to start (default: 2)

"-f", "--save_sig_ints_hit_fp" Optional: save the proportions of significant interactions hit at each step to this file for plotting later
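For reference, the options above map onto a command-line parser roughly as follows. This is an illustrative sketch using `argparse`; the script's actual parser is not shown here and may differ, and the value types and the choice not to mark any option as required are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (types are assumed)."""
    parser = argparse.ArgumentParser(
        description="Collector's curve of significant interactions hit "
                    "vs. number of reads sampled."
    )
    parser.add_argument("-p", "--probe_list_fp",
                        help="Path to the probe list generated by SeqMonk")
    parser.add_argument("-i", "--sig_ints_fp",
                        help="Path to the interaction list generated by SeqMonk")
    parser.add_argument("-s", "--sam_file_fp",
                        help="Path to the SAM file used to generate SeqMonk results")
    parser.add_argument("-z", "--step_size", type=float, default=0.1,
                        help="Increase in the proportion of reads per step")
    parser.add_argument("-n", "--num_iter", type=int, default=100,
                        help="Number of subsamples to generate for each step")
    parser.add_argument("-t", "--num_threads", type=int, default=2,
                        help="Number of concurrent processes to start")
    parser.add_argument("-f", "--save_sig_ints_hit_fp", default=None,
                        help="Optional output file for the per-step results")
    return parser
```

Calling `build_parser().parse_args()` with only the three input paths leaves the step size, iteration count and process count at their documented defaults.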

Example:

To find the number of significant interactions hit at 10%, 20%, 30% ... 90% of reads, with 100 subsamples drawn at each step:

./read_vs_sig_interaction_collectors_curve.py --probe_list_fp seqmonk_probe_list.txt --sig_ints_fp seqmonk_output.txt --sam_file_fp reads.sam -z 0.1 -n 100
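The subsampling behind such a curve can be sketched as below. This is a simplified illustration under stated assumptions, not the script's implementation: `read_hits` is a hypothetical pre-computed list with one entry per read pair (the significant interaction it supports, or `None`), sampling is done without replacement, and the `seed` argument is added here only to make the sketch reproducible.

```python
import random
import statistics


def step_proportions(step_size):
    """Proportions step_size, 2 * step_size, ... strictly below 1.0."""
    proportions = []
    i = 1
    # Rounding guards against float drift such as 3 * 0.1 = 0.30000000000000004.
    while round(i * step_size, 10) < 1.0:
        proportions.append(round(i * step_size, 10))
        i += 1
    return proportions


def collectors_curve(read_hits, n_sig, step_size=0.1, num_iter=100, seed=None):
    """Return (proportion, mean fraction hit, std dev) for each step.

    read_hits: one entry per read pair, either the significant interaction
               it supports or None (assumed representation).
    n_sig:     total number of significant interactions.
    """
    rng = random.Random(seed)  # seed is illustrative, not a script option
    curve = []
    for p in step_proportions(step_size):
        k = round(p * len(read_hits))  # number of reads drawn at this step
        fractions = []
        for _ in range(num_iter):
            sample = rng.sample(read_hits, k)  # draw without replacement
            hit = {h for h in sample if h is not None}
            fractions.append(len(hit) / n_sig)
        curve.append((p, statistics.mean(fractions), statistics.pstdev(fractions)))
    return curve
```

Each tuple in the result gives one point on the curve together with its spread across the `num_iter` subsamples, which is the kind of data that would be drawn as a point with an error bar.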

NB: Very memory intensive, depending on the size of the input files. Intended to be run on a machine with a lot of RAM when processing large datasets.

Running the small example:

In the "test" directory are some small example files which can be used to run the code and generate output. To do this, run:

./read_vs_sig_interaction_collectors_curve.py -p test/Probe_List.small.test.txt -i test/Sig_Intrxns.small.test.txt -s test/SAM_File.small.test.sam -z 0.1 -n 100

The file "Sig_Interxns.small.test_collectors_curve.png" is an example of the output produced by the program. The large error in this case is due to the very small sample size. Results differ from run to run because the reads are sampled randomly at each step.

