Get the CCF and # of SNVs in each subclone #45

philsteinberg · 2022-08-04T18:29:33Z

I would suggest that in addition to the number of subclones per patient, also get the CCF and # of SNVs in each subclone for later use. It's ok if you want to open an issue and do it in another PR though.

Originally posted by @lydiayliu in #44 (comment)

philsteinberg · 2022-08-08T06:31:35Z

I tried to extract SNV and CCF from the PyClone-VI ss output.
Output file: /hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-pyclone-vi/output/<date>pyclone_ss_ccf_snv_per_subclones.tsv

I have not yet extracted SNV and CCF from the PyClone-VI ms output.
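A minimal sketch of the per-cluster extraction, assuming the PyClone-VI results table has the columns `mutation_id`, `sample_id`, `cluster_id` and `cellular_prevalence`, and that `cellular_prevalence` is the cluster-level CCF. The function name and the example rows are hypothetical and are not taken from the script that produced the TSV above.

```python
import csv
import io
from collections import defaultdict


def summarise_pyclone_vi(handle):
    """Return {(sample_id, cluster_id): (ccf, n_snvs)} from a tab-separated table."""
    n_snvs = defaultdict(int)
    ccf = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["sample_id"], row["cluster_id"])
        n_snvs[key] += 1
        # Assumption: every mutation in a cluster carries the same cluster-level value.
        ccf[key] = float(row["cellular_prevalence"])
    return {key: (ccf[key], n_snvs[key]) for key in n_snvs}


# Hypothetical input, for illustration only.
example = (
    "mutation_id\tsample_id\tcluster_id\tcellular_prevalence\n"
    "m1\tS1\t0\t0.98\n"
    "m2\tS1\t0\t0.98\n"
    "m3\tS1\t1\t0.35\n"
)
summary = summarise_pyclone_vi(io.StringIO(example))
```

For a real run, the same function could be called on an opened results file instead of the in-memory example.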

In the documentation for DPClust it says that *_bestClusterInfo.txt contains the CCF, but I only see columns cluster.no, location and no.of.mutations? Additionally, that file's cluster.no does not correspond with the shared/unique id across samples (I think there is another file that does). How should I best address these issues?

Also, not sure about how to group/plot the data for snv and ccf. I think visually it would be most helpful to take one sample and plot the variability in num snv per cluster across seeds. That might not be as generalizable, but scaling that up to all samples could be kind of messy. Thoughts?

lydiayliu · 2022-08-08T16:39:14Z

Sorry I had meetings all morning

> I tried to extract SNV and CCF from the PyClone-VI ss output.

Looks good!

DPClust

cluster.no   location           no.of.mutations
1            1.99373776908023   3654
2            1.40195694716243   931
3            1.28923679060665   204
             0.95812133072407   153

So here there are 3 clusters, and the location is the CCF. Something is off with this reconstruction, because the CCF of all clusters is >1 (actually closer to 2). That means the ploidy of the sample is wrong. I have to look into this... But don't worry about it for now.
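As a rough illustration of that check, the sketch below parses the three complete rows quoted above and flags clusters whose location (CCF) exceeds 1. The whitespace-separated layout and the function name `parse_cluster_info` are assumptions, not DPClust's documented format.

```python
def parse_cluster_info(text):
    """Parse rows of cluster.no, location (CCF) and no.of.mutations."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows that do not have all three columns
        rows.append(
            {"cluster": int(fields[0]), "ccf": float(fields[1]), "n_snvs": int(fields[2])}
        )
    return rows


table = """
cluster.no location no.of.mutations
1 1.99373776908023 3654
2 1.40195694716243 931
3 1.28923679060665 204
"""
rows = parse_cluster_info(table)
# A CCF above 1 is not biologically meaningful, so these clusters need a second look.
suspicious = [row["cluster"] for row in rows if row["ccf"] > 1.0]
```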

> Additionally, that file's cluster.no does not correspond with the shared/unique id across samples (I think there is another file that does). How should I best address these issues?

Sorry I'm not sure what you mean here. DPClust cluster number is different across their files (that's why I usually renumber everything), but usually you can match the clusters by number of mutations. These are single sample runs, what do you mean by "shared / unique id across samples"?
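One way to sketch the "match by number of mutations" idea: build a mapping from the cluster numbers in one file to those in another by looking up identical mutation counts. The function name and the second file's numbering below are hypothetical, and the sketch deliberately fails when two clusters share a count, since the match would then be ambiguous.

```python
def match_by_mutation_count(counts_a, counts_b):
    """Map cluster ids in file A to ids in file B that have the same mutation count."""
    if len(set(counts_a.values())) != len(counts_a):
        raise ValueError("file A has clusters with identical mutation counts")
    ids_by_count = {}
    for cluster_id, n in counts_b.items():
        ids_by_count.setdefault(n, []).append(cluster_id)
    mapping = {}
    for cluster_id, n in counts_a.items():
        candidates = ids_by_count.get(n, [])
        if len(candidates) != 1:
            raise ValueError(f"no unique match for cluster {cluster_id} (n={n})")
        mapping[cluster_id] = candidates[0]
    return mapping


# Counts from the table above; the numbering in the second file is made up.
best_cluster_info = {1: 3654, 2: 931, 3: 204}
other_file = {7: 931, 2: 204, 5: 3654}
mapping = match_by_mutation_count(best_cluster_info, other_file)
```

With a mapping like this, every file can be renumbered to one consistent set of cluster ids before any comparison.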

> Also, not sure about how to group/plot the data for snv and ccf. I think visually it would be most helpful to take one sample and plot the variability in num snv per cluster across seeds. That might not be as generalizable, but scaling that up to all samples could be kind of messy. Thoughts?

I've also been thinking about this. I feel like the choice is between two options. The first is presenting data from every single seed for every sample (for example, CCF on the x-axis, seeds 1-10 on the y-axis, and dot size as # of SNVs).

The second is presenting summary figures for each sample across the seeds, though that is a little tricky with the differing number of subclones called across seeds. For example, for the number of SNVs in each cluster, each sample gets a plot with CCF on the x-axis and number of SNVs on the y-axis. Each "cluster mean" across seeds (the mean of SNV count and CCF of cluster 1, for example) is then a dot, with horizontal error bars for the CCF sd and vertical error bars for the number-of-SNVs sd.
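The summary option could be computed along these lines: a minimal sketch that takes one record per seed and cluster, then returns the mean and sd of CCF and SNV count for each cluster. It assumes clusters have already been matched across seeds, which is exactly the tricky part when seeds disagree on the number of subclones. The record layout and all values are hypothetical.

```python
import statistics
from collections import defaultdict


def summarise_across_seeds(records):
    """records: iterable of (seed, cluster, ccf, n_snvs) for one sample."""
    ccfs = defaultdict(list)
    snvs = defaultdict(list)
    for _seed, cluster, ccf, n_snvs in records:
        ccfs[cluster].append(ccf)
        snvs[cluster].append(n_snvs)

    def sd(values):
        # The sample sd needs at least two points; report 0.0 for a single seed.
        return statistics.stdev(values) if len(values) > 1 else 0.0

    return {
        cluster: {
            "ccf_mean": statistics.mean(ccfs[cluster]),
            "ccf_sd": sd(ccfs[cluster]),
            "snv_mean": statistics.mean(snvs[cluster]),
            "snv_sd": sd(snvs[cluster]),
            "n_seeds": len(ccfs[cluster]),
        }
        for cluster in ccfs
    }


# Made-up values for one sample: cluster 1 was called in two seeds, cluster 2 in three.
records = [
    (1, 1, 1.00, 100), (2, 1, 0.90, 110),
    (1, 2, 0.40, 50), (2, 2, 0.50, 70), (3, 2, 0.45, 60),
]
stats = summarise_across_seeds(records)
```

Each entry then supplies one dot (`ccf_mean`, `snv_mean`) with `ccf_sd` as the horizontal error bar and `snv_sd` as the vertical one. Showing `n_seeds` alongside makes it visible when a cluster was only called in some seeds.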

philsteinberg · 2022-08-08T17:54:08Z

Thanks for all the feedback and comments. Regarding Q2)

cluster.no 1,2,3,4 have a different id?:

I will try the things you suggested after I fix the other PR stuff.

lydiayliu · 2022-08-08T17:55:17Z

Right, ignore the optimainfo file? XD I don't think there's anything you need from it

philsteinberg added this to the Seeds! milestone Aug 4, 2022

This was referenced Aug 4, 2022

Visualization ideas for poster and presentation #43

Closed

Phils parse pipeline output #44

Merged

Phils get snv ccf #49

Closed

lydiayliu added the Task A step of progress label Aug 31, 2022
