Errors when running PAQR step2 #19

Sylarair · 2020-09-05T11:24:29Z

Hi,

Thanks for developing PAQR_KAPAC. I am facing a problem when running step2 of PAQR with the source code (error shown in the attached figure).

Does anyone have idea from which the error comes? Many Thanks!

Sylarair · 2020-09-05T11:40:24Z

By the way, The data I used in the code is from https://www.encodeproject.org/experiments/ENCSR000CWC/ (ENCFF391BFB.bam, ENCFF983OXY.bam). Also, the config.yaml setting is shown in the attached figure. In addition, I have put the problem into the github issue. Finally, the attached text file is the reference used in my config.yaml.

clusters.mm10.vM22.no_tsl.atlas2.canonical_chr.tandem.noOverlap_strand_specific.txt

full_transcripts.mm10.vM22.no_tsl.atlas2.canonical_chr.tandem.noOverlap_strand_specific.txt

koljaLanger · 2020-09-05T12:12:13Z

Hi

can you please post the output of cat HNRPC_KD/no_bias_samples.out. One possible reason would be that the samples have biased coverages and are excluded from processing. In this case, we highly recommend to do some sanity checks, e.g. visually inspect the coverage in a genome browser to ensure that the sequencing data has trustworthy quality.

If this is really the reason for the error, you could mitigate it by lowering the "bias.median.cutoff" parameter in the config file.

Rgds
Ralf

Sylarair · 2020-09-05T12:26:18Z

Thanks for your reply. The file is empty.

Sylarair · 2020-09-05T12:29:43Z

Also, when I tried to rerun, it shows such logs. What should I do with this? Thanks!

Sylarair · 2020-09-05T12:37:29Z

Does this mean there is no valid sample? Thanks.

Sylarair · 2020-09-05T12:39:19Z

Meanwhile, the two samples median TIN is 76 and 79. It seems ok (default bias.median.cutoff = 70).

koljaLanger · 2020-09-05T13:19:35Z

Ok,
I think a know the root cause:
We designed PAQR in a way that it provides the input for KAPAC which is why PAQR always works with control-treatment sample pairs.
In your case, both samples are labeled as "HNRNPC" in the config and so no valid pair of samples is inferred in the first part.

An easy workaround would be to adjust your config file as follows:

ctl_rep1: {bam: ENCFF391BFB, type: CNTRL}
HNRNPC_rep1: {bam: ENCFF983OXY, type: KD, control: ctl_rep1}

In this setting, running PAQR would result in the estimation of poly(A) site usage for both samples independently.

Hope this helps!
Ralf

Sylarair · 2020-09-05T13:46:07Z

Oh, thanks so much! Actually, The two bams are bio-replicates. Would this setting not affect results?

koljaLanger · 2020-09-05T13:58:46Z

It affects the results slightly, because PAQR would normally take advantage of the two samples being replicates. But it should not be an issue.

Please bear in mind, though, that running KAPAC requires the comparison across conditions - in your case you would need a reference condition to compare against.

Sylarair · 2020-09-06T08:47:55Z

Got it, thanks!

Sylarair · 2020-09-06T15:14:01Z

Thanks for your suggestions because I get the PAQR results successfully.

However, facing so many results, I am confused about the difference of 3 files (relative_usages.filtered.tsv, relative_usages.tsv, and relative_usages.relPos_per_pA.out) and wonder which one is the proximal polyA site usage. Could you please explain it to me for I have not found their interpretation in the manual. Many thanks!

Qin

koljaLanger · 2020-09-06T15:41:40Z

Hi Qin

there are two files which are worth looking into if you want to obtain the proximal poly(A) site usage:

relative_usages.filtered.tsv
tandem_pas_expressions.rpm.filtered.tsv

Both files contain one poly(A) site per line and poly(A) sites are grouped by exons and sorted proximal to distal -> so each line that has a new entry in the exon column indicates a proximal poly(A) site.
The last n columns (n = number of samples; so in your case, the last 2 columns) show the relative usage of the the poly(A) site (in the case of relative_usages.filtered.tsv) or the library size normalized expression (in the case of tandem_pas_expressions.rpm.filtered.tsv).

Hope that helps

Best
Ralf

Sylarair · 2020-09-07T05:06:36Z

Thanks for your answer which clears my doubts. There is another 2 question which interest me are: 1. what is used to filter 'relative_usages.tsv' to get 'relative_usages.filtered.tsv'? 2. When applying PAQR to analyze forebrain E13.5 day polyA RNA-Seq data, I find that relative usage of proximal polyA sites in most of genes in relative_usages.filtered.tsv are very high (70~90). This is wired and let me wonder whether I am wrong in some step. Could you please give some comments? Thanks!

PAQR_PPAU.pdf

koljaLanger · 2020-09-08T21:19:06Z

The filtering simply selects exons with multiple poly(A) sites which have read support in all samples - so this can be considered more of a data cleaning step.
Regarding your proximal-distal usage distribution: it is hard to identify a particular reason for your results without further details. In general, cell from the brain have extended 3' ends - but I don't know if this still holds for embryo samples.
A good starting point would be to get some global statistics: how many sites were quantified? How does the expression distribution looks like? Do you have an expected distribution?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Errors when running PAQR step2 #19

Errors when running PAQR step2 #19

Sylarair commented Sep 5, 2020

Sylarair commented Sep 5, 2020 •

edited

Loading

koljaLanger commented Sep 5, 2020

Sylarair commented Sep 5, 2020 •

edited

Loading

Sylarair commented Sep 5, 2020

Sylarair commented Sep 5, 2020

Sylarair commented Sep 5, 2020

koljaLanger commented Sep 5, 2020

Sylarair commented Sep 5, 2020

koljaLanger commented Sep 5, 2020

Sylarair commented Sep 6, 2020

Sylarair commented Sep 6, 2020 •

edited

Loading

koljaLanger commented Sep 6, 2020

Sylarair commented Sep 7, 2020 •

edited

Loading

koljaLanger commented Sep 8, 2020

Errors when running PAQR step2 #19

Errors when running PAQR step2 #19

Comments

Sylarair commented Sep 5, 2020

Sylarair commented Sep 5, 2020 • edited Loading

koljaLanger commented Sep 5, 2020

Sylarair commented Sep 5, 2020 • edited Loading

Sylarair commented Sep 5, 2020

Sylarair commented Sep 5, 2020

Sylarair commented Sep 5, 2020

koljaLanger commented Sep 5, 2020

Sylarair commented Sep 5, 2020

koljaLanger commented Sep 5, 2020

Sylarair commented Sep 6, 2020

Sylarair commented Sep 6, 2020 • edited Loading

koljaLanger commented Sep 6, 2020

Sylarair commented Sep 7, 2020 • edited Loading

koljaLanger commented Sep 8, 2020

Sylarair commented Sep 5, 2020 •

edited

Loading

Sylarair commented Sep 5, 2020 •

edited

Loading

Sylarair commented Sep 6, 2020 •

edited

Loading

Sylarair commented Sep 7, 2020 •

edited

Loading