-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Epic effective for #87
Comments
I would rather recommend that you do not compute epic-effective for hg38, it is included in epic. While the version included is calculated for UCSC, it is not going to make a difference :) Thanks for reporting :) Why do you want to calculate the value? |
I want to calculate the EFFECTIVE_GENOME_FRACTION for hg38 genome. As it is mentioned in the command -"Use a different effective genome fraction than the one included in epic. The default value depends on the genome and readlength, but is a number between 0 and 1." The value will change based on the read length. So for a particular read length I want to compute the effective genome fraction. |
I can guarantee you that it will not matter much for epic at least. The
difference between 36 and 100 bp for hg38 is < 10% which does not make a
difference to the statistics. And I cannot help you run it unfortunately.
Do you have a cluster/larger machine to run this on? The problem is that
the another piece of software (jellyfish2) uses too much memory.
…On Fri, Sep 7, 2018 at 2:39 PM parida007 ***@***.***> wrote:
I want to calculate the EFFECTIVE_GENOME_FRACTION for hg38 genome. As it
is mentioned in the command -"Use a different effective genome fraction
than the one included in epic. The default value depends on the genome and
readlength, but is a number between 0 and 1." The value will change based
on the read length. So for a particular read length I want to compute the
effective genome fraction.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AQ9I0r_wb62nHVtxJliQY72YiRMQUOk3ks5uYmj6gaJpZM4WeM-J>
.
|
Thanks for your information. No I don't have a large cluster to run it. I have run the command with default parameter setting and getting the enrichment result. Can you please help me to understand what does the following means: |
It is just output of the software. It says your readlength is 36 and
therefore it chooses a certain egf.
:)
…On Tue, Sep 11, 2018 at 7:39 AM parida007 ***@***.***> wrote:
Thanks for your information. No I don't have a large cluster to run it. I
have run the command with default parameter setting and getting the
enrichment result. Can you please help me to understand what does the
following means:
2553959578.0 effective_genome_fraction (File:
compute_background_probabilites, Log level: DEBUG, Time: Mon, 10 Sep 2018
17:12:44 )
What I have understood is effective genome fraction must be in between 0
and 1. Though initially it shows something like:
Used first 10000 reads of SRR524941.bam.bed to estimate a median read
length of 36.0
Mean readlength: 36.0004, max readlength: 40, min readlength: 32. (File:
find_readlength, Log level: INFO, Time: Mon, 10 Sep 2018 16:54:29 )
Using an effective genome fraction of 0.8269827491300733. (File: genomes,
Log level: INFO, Time: Mon, 10 Sep 2018 16:54:29 )
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AQ9I0uAk7dZKU3N0hH-7Ij526SS7-GVSks5uZ0yQgaJpZM4WeM-J>
.
|
That I understood.. but the thing is that can the egf be 2553959578.0 ?? |
No, that is the effective genome size. Good catch. Should update that :)
…On Tue, Sep 11, 2018 at 10:11 AM parida007 ***@***.***> wrote:
That I understood.. but the thing is that can the egf be 2553959578.0 ??
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AQ9I0on3UNYX6iZ0sGNzgh7xNrftWcwEks5uZ3BJgaJpZM4WeM-J>
.
|
So it should be 0.8269827491300733... If I am not wrong.. |
Yes, but the egs is egf * genome size. The code is correct, the logging
info not.
…On Tue, Sep 11, 2018 at 10:53 AM parida007 ***@***.***> wrote:
So it should be 0.8269827491300733... If I am not wrong..
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AQ9I0lNvY684ig89TcR8CqWzk94uLHVcks5uZ3oXgaJpZM4WeM-J>
.
|
epic -t SRR524944.bed SRR524945.bed SRR524946.bed -c ../GSM945859-input/SRR504936.bed ../GSM945859-input/SRR504937.bed --genome hg38 --false-discovery-rate-cutoff 0.01 -o enriched_regions_GSM970218.csv -bw bigwigs -sm matrix.gz # epic_version: 0.2.12, pandas_version: 0.23.4 (File: epic, Log level: INFO, Time: Tue, 11 Sep 2018 15:01:20 )cat: write error: Broken pipe Can you tell me whether this error is having an impact with the final output matrix |
No, it is not :)
…On Tue, Sep 11, 2018 at 11:29 AM parida007 ***@***.***> wrote:
epic -t SRR524944.bed SRR524945.bed SRR524946.bed -c
../GSM945859-input/SRR504936.bed ../GSM945859-input/SRR504937.bed --genome
hg38 --false-discovery-rate-cutoff 0.01 -o enriched_regions_GSM970218.csv
-bw bigwigs -sm matrix.gz # epic_version: 0.2.12, pandas_version: 0.23.4
(File: epic, Log level: INFO, Time: Tue, 11 Sep 2018 15:01:20 )
cat: write error: Broken pipe
Can you tell me whether this error is having an impact with the final
output matrix
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AQ9I0ssoBafu4G872Zd1VxHyNTeWDIRYks5uZ4KMgaJpZM4WeM-J>
.
|
I want to run the epic-effective for human genome hg38. I am using a 16 gb ram for it but after using the command it is showing memory error.
using the following command:
epic-effective -r 36 GRCh38.p10.genome.fa
getting the following error:
File analyzed: GRCh38.p10.genome.fa (File: effective_genome_size, Log level: INFO, Time: Thu, 06 Sep 2018 17:28:46 )
Genome length: 3236815040 (File: effective_genome_size, Log level: INFO, Time: Thu, 06 Sep 2018 17:28:46 )
File analyzed: GRCh38.p10.genome.fa
Genome length: 3236815040
terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::unbounded_array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
what(): Failed to allocate 20443042440 bytes of memory
Aborted (core dumped)
Failed to open input file '/tmp/GRCh38.p10.genome.fa.jf'
Traceback (most recent call last):
File "/usr/local/bin/epic-effective", line 38, in
effective_genome_size(fasta, read_length, nb_cpu, tmpdir)
File "/usr/local/lib/python2.7/dist-packages/epic/scripts/effective_genome_size.py", line 56, in effective_genome_size
shell=True)
File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'jellyfish stats /tmp/GRCh38.p10.genome.fa.jf' returned non-zero exit status 1
rm: cannot remove '/tmp/GRCh38.p10.genome.fa.jf': No such file or directory
But when I use a single chromosome it works fine.
epic-effective -r 36 gencode_chrY.fa
Temporary directory: /tmp/ (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:26 )
File analyzed: gencode_chrY.fa (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:26 )
Genome length: 57227415 (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:26 )
File analyzed: gencode_chrY.fa
Genome length: 57227415
Number unique 36-mers: 21608143 (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:39 )
Effective genome size: 0.377583768199 (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:39 )
Number unique 36-mers: 21608143
Effective genome size: 0.377583768199
So how can overcome this issue for effective genome size calculation for entire genome.
The text was updated successfully, but these errors were encountered: