Get the CCF and # of SNVs in each subclone #45

Open
philsteinberg opened this issue Aug 4, 2022 · 4 comments
Labels
Task A step of progress

Comments

@philsteinberg
Contributor

I would suggest that, in addition to the number of subclones per patient, you also get the CCF and # of SNVs in each subclone for later use. It's OK if you want to open an issue and do it in another PR, though.

Originally posted by @lydiayliu in #44 (comment)

@philsteinberg
Contributor Author

I tried to extract the SNV counts and CCFs from the PyClone-VI ss output.
Output file: /hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-pyclone-vi/output/<date>pyclone_ss_ccf_snv_per_subclones.tsv

I have not yet extracted the SNV counts and CCFs from the PyClone-VI ms output.

The DPClust documentation says that *_bestClusterInfo.txt contains the CCF, but I only see the columns cluster.no, location and no.of.mutations. Additionally, that file's cluster.no does not correspond with the shared/unique id across samples (I think there is another file that does). How should I best address these issues?

Also, I'm not sure how to group/plot the SNV and CCF data. I think visually it would be most helpful to take one sample and plot the variability in the number of SNVs per cluster across seeds. That might not be as generalizable, but scaling that up to all samples could get messy. Thoughts?

@lydiayliu
Collaborator

Sorry I had meetings all morning

> I tried to extract the SNV counts and CCFs from the PyClone-VI ss output.

Looks good!

DPClust

cluster.no      location        no.of.mutations
1       1.99373776908023        3654
2       1.40195694716243        931
3       1.28923679060665        204       0.95812133072407        153

So here there are 3 clusters, and the location is the CCF. Something is off with this reconstruction, because the CCF of all clusters is >1 (actually closer to 2). That means that the ploidy of the sample is wrong. I have to look into this... But don't worry about it for now.
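A minimal sketch of that check in Python, assuming the *_bestClusterInfo.txt file is tab-separated with the three columns shown. The rows are copied from the excerpt above (first three columns only). `read_cluster_info` and `clusters_above_ccf` are hypothetical helper names, and the 1.0 threshold is an assumption based on CCF being a fraction of cancer cells.

```python
import csv
import io

# Rows copied from the *_bestClusterInfo.txt excerpt (first three columns only).
# Assumption: the real file is tab-separated.
BEST_CLUSTER_INFO = """cluster.no\tlocation\tno.of.mutations
1\t1.99373776908023\t3654
2\t1.40195694716243\t931
3\t1.28923679060665\t204
"""


def read_cluster_info(handle):
    """Parse a tab-separated bestClusterInfo table into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "cluster": int(row["cluster.no"]),
            "ccf": float(row["location"]),  # 'location' is the CCF
            "n_snvs": int(row["no.of.mutations"]),
        }
        for row in reader
    ]


def clusters_above_ccf(clusters, threshold=1.0):
    """Return the cluster numbers whose CCF exceeds the threshold."""
    return [c["cluster"] for c in clusters if c["ccf"] > threshold]


clusters = read_cluster_info(io.StringIO(BEST_CLUSTER_INFO))
suspicious = clusters_above_ccf(clusters)
print(suspicious)
```

For a real run, `io.StringIO(...)` would be replaced by an open file handle. A non-empty result is only a prompt to re-check the ploidy, not a diagnosis in itself.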

> Additionally, that file's cluster.no does not correspond with the shared/unique id across samples (I think there is another file that does). How should I best address these issues?

Sorry, I'm not sure what you mean here. DPClust cluster numbers differ across its output files (that's why I usually renumber everything), but you can usually match the clusters by number of mutations. These are single-sample runs, so what do you mean by "shared / unique id across samples"?
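A rough sketch of matching clusters by number of mutations, in Python. The first dict uses the counts from the bestClusterInfo excerpt. The second dict, its cluster ids, and the function name `match_by_mutation_count` are made up for illustration. The sketch only pairs clusters whose mutation count is unique in both files, so ties are left unmatched rather than guessed.

```python
from collections import Counter


def match_by_mutation_count(file_a, file_b):
    """Map cluster ids in file_a to ids in file_b via identical mutation counts.

    Both arguments are dicts of cluster id -> number of mutations.
    Counts that occur more than once in either file are skipped as ambiguous.
    """
    counts_a = Counter(file_a.values())
    counts_b = Counter(file_b.values())
    # Index file_b by mutation count, keeping only unambiguous counts.
    b_by_count = {n: cid for cid, n in file_b.items() if counts_b[n] == 1}
    mapping = {}
    for cid, n in file_a.items():
        if counts_a[n] == 1 and n in b_by_count:
            mapping[cid] = b_by_count[n]
    return mapping


# cluster.no -> no.of.mutations, from the bestClusterInfo excerpt.
best_cluster_info = {1: 3654, 2: 931, 3: 204}
# Hypothetical numbering of the same clusters in another output file.
other_file = {7: 204, 5: 3654, 9: 931}

mapping = match_by_mutation_count(best_cluster_info, other_file)
print(mapping)
```

The resulting mapping can then be used to renumber everything consistently.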

> Also, I'm not sure how to group/plot the SNV and CCF data. I think visually it would be most helpful to take one sample and plot the variability in the number of SNVs per cluster across seeds. That might not be as generalizable, but scaling that up to all samples could get messy. Thoughts?

I've also been thinking about this. One option is to present data from every single seed for every sample (for example, CCF on the x-axis, seeds 1-10 on the y-axis, and the size of the dot as # of SNVs).
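The per-seed layout could be prepared roughly like this in Python. This is only a sketch: the input structure (seed -> list of (CCF, # of SNVs) pairs), the function name `per_seed_points`, the marker scaling, and the toy values are all assumptions, not real results.

```python
def per_seed_points(runs, max_marker_area=300.0):
    """Flatten per-seed clusters into x (CCF), y (seed) and marker-size lists.

    runs: dict of seed -> list of (ccf, n_snvs) tuples.
    Marker sizes are scaled so the largest cluster gets max_marker_area.
    """
    largest = max(n for clusters in runs.values() for _, n in clusters)
    xs, ys, sizes = [], [], []
    for seed in sorted(runs):
        for ccf, n_snvs in runs[seed]:
            xs.append(ccf)
            ys.append(seed)
            sizes.append(max_marker_area * n_snvs / largest)
    return xs, ys, sizes


# Toy values for illustration only, not taken from any real run.
toy_runs = {1: [(1.0, 400), (0.5, 100)], 2: [(0.9, 300)]}
xs, ys, sizes = per_seed_points(toy_runs)
print(xs, ys, sizes)
```

The three lists are in the shape a scatter call expects, for example matplotlib's `scatter(xs, ys, s=sizes)`.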

Alternatively, we can present summary figures for each sample across the seeds, though that is a little tricky with the differing number of subclones called across seeds. For example, for the number of SNVs in each cluster, each sample gets a plot with CCF on the x-axis and number of SNVs on the y-axis. Each "cluster mean" across seeds (the mean # of SNVs and mean CCF of cluster 1, for example) is then a dot, with horizontal error bars for the CCF sd and vertical error bars for the # of SNVs sd.
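A sketch of the per-cluster summary in Python, using only the standard library. One big assumption: clusters are aligned across seeds by ranking them on CCF (highest first) within each seed. When seeds call different numbers of subclones, that alignment can mix different subclones, so each summary row also reports how many seeds contributed. The function name `summarise_by_rank` and the toy values are made up for illustration.

```python
from statistics import mean, stdev


def summarise_by_rank(runs):
    """Mean and sd of CCF and # of SNVs per cluster rank across seeds.

    runs: dict of seed -> list of (ccf, n_snvs) tuples.
    Assumption: rank 1 is the highest-CCF cluster within each seed.
    """
    by_rank = {}
    for clusters in runs.values():
        ordered = sorted(clusters, key=lambda c: c[0], reverse=True)
        for rank, (ccf, n_snvs) in enumerate(ordered, start=1):
            by_rank.setdefault(rank, []).append((ccf, n_snvs))

    summary = {}
    for rank, values in sorted(by_rank.items()):
        ccfs = [v[0] for v in values]
        snvs = [v[1] for v in values]
        several = len(values) > 1  # sd is undefined for a single seed
        summary[rank] = {
            "n_seeds": len(values),
            "ccf_mean": mean(ccfs),
            "ccf_sd": stdev(ccfs) if several else None,
            "snv_mean": mean(snvs),
            "snv_sd": stdev(snvs) if several else None,
        }
    return summary


# Toy values for illustration only; seed 3 calls one subclone fewer.
toy_runs = {
    1: [(1.0, 400), (0.5, 100)],
    2: [(0.8, 300), (0.5, 200)],
    3: [(0.9, 350)],
}
summary = summarise_by_rank(toy_runs)
for rank, row in summary.items():
    print(rank, row)
```

Each row maps onto one dot in the summary figure: the means give the position and the two sd values give the horizontal and vertical error bars.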

@philsteinberg
Contributor Author

philsteinberg commented Aug 8, 2022

Thanks for all the feedback and comments. Regarding Q2)

Do cluster.no 1, 2, 3 and 4 have a different id here?

[Image: Screen Shot 2022-08-08 at 10 54 23 AM]

[Image: Screen Shot 2022-08-08 at 10 52 17 AM]

I will try the things you suggested after I fix the other PR stuff.

@lydiayliu
Collaborator

Right, ignore the optimainfo file? XD I don't think there's anything you need from it

@lydiayliu lydiayliu added the Task A step of progress label Aug 31, 2022