Further question about how to properly set up the sample relationships #14

Open
aleighbrown opened this issue Nov 27, 2019 · 3 comments

Comments

@aleighbrown

A bit confused about the appropriate way to set up the samples in the config.yaml.

Currently, the config.yaml provided with the download looks like this:

#-------------------------------------------------------------------------------
# sample specific values:
# - name of samples per study
# - name of BAM file and condition per sample
#-------------------------------------------------------------------------------

HNRNPC_KD:
  samples: [ctl_rep1, ctl_rep2, HNRNPC_rep1, HNRNPC_rep2]

ctl_rep1: {bam: CTL_rep1, type: CNTRL}
ctl_rep2: {bam: CTL_rep2, type: CNTRL}
HNRNPC_rep1: {bam: KD_rep1, type: KD, control: ctl_rep1}
HNRNPC_rep2: {bam: KD_rep2, type: KD, control: ctl_rep2}

Is HNRNPC_rep1 being compared directly to ctl_rep1?
What if my samples don't have such a clear-cut "this control should be compared to this case" relationship? E.g., I've done 3 biological replicates in each condition, but they're not what I would call directly matched.

If my samples are MUT1, MUT2, MUT3, WT1, WT2, WT3, would it make a difference to the final analysis if I set up the relationship as

MUT1: {bam: MUT1, type: MUT, control: WT1}
MUT2: {bam: MUT2, type: MUT, control: WT2}

vs

MUT1: {bam: MUT1, type: MUT, control: WT2}
MUT2: {bam: MUT2, type: MUT, control: WT3}

What if my sample sizes weren't matched across conditions, for example 5 samples in one condition and 8 in the other?

Thanks!

@koljaLanger
Member

PAQR just runs condition-wise, so for the inference of poly(A) site usage it makes no difference which sample you put as control for the MUT samples.
However, the KAPAC step needs a reference sample to compare against, so results may change depending on which of the wild-type samples you use as control. That being said, it is not necessary to have matched treatment and control samples.

It would probably even be of interest to us if you changed the control samples in two independent runs and got completely different results. We'd expect the results to be stable under this type of alteration.

Hope this helps for now.

Best,
Ralf
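The suggestion above, rerunning with different control assignments to check that the KAPAC results are stable, can be sketched with a small helper that writes the sample lines for each run. This is only an illustration and not part of the pipeline: `assign_controls` and `config_lines` are hypothetical helpers, the `WT` type label is a placeholder, and BAM names are assumed to equal the sample names, as in the MUT/WT example in the question.

```python
from itertools import cycle


def assign_controls(treated, controls, offset=0):
    """Pair every treated sample with a control, cycling through the controls.

    Cycling means the two groups do not need to be the same size;
    `offset` rotates the control list to produce an alternative pairing.
    """
    if not controls:
        raise ValueError("need at least one control sample")
    k = offset % len(controls)
    rotated = controls[k:] + controls[:k]
    return dict(zip(treated, cycle(rotated)))


def config_lines(treated, controls, treated_type, control_type, offset=0):
    """Render config.yaml-style sample lines (assumes bam name == sample name)."""
    pairs = assign_controls(treated, controls, offset)
    lines = [f"{c}: {{bam: {c}, type: {control_type}}}" for c in controls]
    lines += [
        f"{t}: {{bam: {t}, type: {treated_type}, control: {pairs[t]}}}"
        for t in treated
    ]
    return lines


muts = ["MUT1", "MUT2", "MUT3"]
wts = ["WT1", "WT2", "WT3"]

run_a = assign_controls(muts, wts)            # MUT1 -> WT1, MUT2 -> WT2, ...
run_b = assign_controls(muts, wts, offset=1)  # MUT1 -> WT2, MUT2 -> WT3, ...

# Sample lines for the second run; paste into a copy of config.yaml.
print("\n".join(config_lines(muts, wts, "MUT", "WT", offset=1)))
```

The same helper covers unequal group sizes (for example 8 treated samples and 5 controls), because controls are simply reused in rotation. Comparing the KAPAC output of the two runs then shows whether the choice of control matters for a given data set.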

@SamBryce-Smith

Just to tag onto this issue, it appears that the sample relationships defined in the config can affect whether samples pass the mTIN > 70 filter in part_one.Snakefile. In the case below, only pairs in which both samples have mTIN > 70 are considered valid, even though many samples in my HOM condition have mTIN > 70 on their own.

As I've defined the sample relationships here, only the HOM-3 : WT-3 pairing passes the filter.

bias.TIN.median_per_sample.tsv
sample median_TIN
IP-WT-D14-1 60.078931
IP-WT-D14-2 63.014136
IP-WT-D14-3 72.905163
IP-WT-D14-4 70.372223
IP-HOM-D14-1 71.532313
IP-HOM-D14-2 70.307760
IP-HOM-D14-3 74.176115
IP-HOM-D14-4 68.654441
IP-HOM-D14-5 70.127562
IP-HOM-D14-6 70.768449

(config.yaml)
IP-WT-D14-1: {bam: IP-WT-D14-1_unique_rg_fixed, type: IP_D14_CNTRL}
IP-WT-D14-2: {bam: IP-WT-D14-2_unique_rg_fixed, type: IP_D14_CNTRL}
IP-WT-D14-3: {bam: IP-WT-D14-3_unique_rg_fixed, type: IP_D14_CNTRL}
IP-WT-D14-4: {bam: IP-WT-D14-4_unique_rg_fixed, type: IP_D14_CNTRL}
IP-HOM-D14-1: {bam: IP-HOM-D14-1_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-1}
IP-HOM-D14-2: {bam: IP-HOM-D14-2_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-2}
IP-HOM-D14-3: {bam: IP-HOM-D14-3_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-3}
IP-HOM-D14-4: {bam: IP-HOM-D14-4_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-4}
IP-HOM-D14-5: {bam: IP-HOM-D14-5_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-1}
IP-HOM-D14-6: {bam: IP-HOM-D14-6_unique_rg_fixed, type: IP_D14_HOM, control: IP-WT-D14-2}

At this stage, our main interest in this data-set is the inference of poly(A) site usage. Following on from what you've said, would you say it's acceptable to change the control samples for my HOM set so they point to the WT-3 & WT-4 samples (the WT samples are biological replicates)?

Thanks,
Sam
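To make the pairing effect described above concrete, here is a minimal sketch that applies a "both samples must have mTIN > 70" rule to the values in bias.TIN.median_per_sample.tsv. The rule is an assumption taken from the description in this comment; the actual logic in part_one.Snakefile may be implemented differently. `passing_pairs` is a hypothetical helper, and the `proposed` mapping is just one way of pointing the HOM samples at WT-3 and WT-4.

```python
MTIN_CUTOFF = 70.0  # threshold quoted in the comment above

# Median TIN per sample, copied from bias.TIN.median_per_sample.tsv
median_tin = {
    "IP-WT-D14-1": 60.078931,
    "IP-WT-D14-2": 63.014136,
    "IP-WT-D14-3": 72.905163,
    "IP-WT-D14-4": 70.372223,
    "IP-HOM-D14-1": 71.532313,
    "IP-HOM-D14-2": 70.307760,
    "IP-HOM-D14-3": 74.176115,
    "IP-HOM-D14-4": 68.654441,
    "IP-HOM-D14-5": 70.127562,
    "IP-HOM-D14-6": 70.768449,
}


def passing_pairs(controls, tin, cutoff=MTIN_CUTOFF):
    """Return (sample, control) pairs where BOTH members exceed the cutoff.

    Assumed rule, inferred from the observed behaviour, not from the pipeline code.
    """
    return [
        (sample, ctrl)
        for sample, ctrl in controls.items()
        if tin[sample] > cutoff and tin[ctrl] > cutoff
    ]


# Control assignment as in the config.yaml above
current = {
    "IP-HOM-D14-1": "IP-WT-D14-1",
    "IP-HOM-D14-2": "IP-WT-D14-2",
    "IP-HOM-D14-3": "IP-WT-D14-3",
    "IP-HOM-D14-4": "IP-WT-D14-4",
    "IP-HOM-D14-5": "IP-WT-D14-1",
    "IP-HOM-D14-6": "IP-WT-D14-2",
}

# Hypothetical alternative: use only the two WT samples with mTIN > 70
proposed = {
    "IP-HOM-D14-1": "IP-WT-D14-3",
    "IP-HOM-D14-2": "IP-WT-D14-4",
    "IP-HOM-D14-3": "IP-WT-D14-3",
    "IP-HOM-D14-4": "IP-WT-D14-4",
    "IP-HOM-D14-5": "IP-WT-D14-3",
    "IP-HOM-D14-6": "IP-WT-D14-4",
}

print("current :", passing_pairs(current, median_tin))
print("proposed:", passing_pairs(proposed, median_tin))
```

Under this assumed rule, the current mapping keeps only the HOM-3 : WT-3 pair, while the alternative mapping keeps every HOM sample except IP-HOM-D14-4, whose own median TIN is below 70.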

@koljaLanger
Member

Hi Sam,
yes, that is what I would suggest doing in this case.

Best
Ralf
