DIA_SpecLib_Quant_Phospho_diaPASEF Sitereport #1860

Open
dh2305 opened this issue Nov 3, 2024 · 17 comments

@dh2305

dh2305 commented Nov 3, 2024

Hi,

Is there any way to get a site report table analogous to the one from the LFQ-phospho workflow in FragPipe, or is this not possible because quantification is carried out by DIA-NN rather than IonQuant?

Thank you

@fcyu
Member

fcyu commented Nov 3, 2024

We are working on it.

Stay tuned.

Best,

Fengchao

@fcyu fcyu transferred this issue from Nesvilab/diaTracer Nov 3, 2024
@fcyu fcyu self-assigned this Nov 3, 2024
@dh2305
Author

dh2305 commented Nov 3, 2024

Thank you, Fengchao!

I have found 2 workarounds so far:
I) By using the library.tsv created by the FragPipe workflow in standalone DIA-NN 1.9.2, you get the phospho90 and phospho99 reports as usual for DIA-NN (problem: these cutoffs are not the same as the DDA cutoff and cannot be changed, e.g. to 75%).
However, phosphopeptide ID numbers (mean over my replicates) suffer a little (14,400 compared to 17,000), probably because of the increased stringency of newer DIA-NN versions and because you cannot replicate all parameters of the FragPipe workflow in DIA-NN exactly.
With this combined workflow, the sites90 table contains 8,100 sites (mean over my replicates), compared to 7,200 with DIA-NN 1.9.2 alone. However, with DIA-NN alone I only used M(Ox) as a variable modification besides phospho (ST), while the FragPipe workflow also uses N-terminal acetylation.

II) By using the mssitereport Python script, you can extract site reports from the DIA-NN report.tsv.
However, installing this package was time-consuming due to various Linux dependencies you have to mimic in Windows. Furthermore, the outputs lack many descriptions from the FASTA, so you have to match the protein IDs to get them back (not a big problem).
mssitereports delivered 13,500 sites (mean across replicates) with a PTM confidence threshold of 1% (I do not know whether the diaTracer workflow uses PTMProphet to report the site confidence or whether DIA-NN does it during quantification). Although 1% in DIA-NN 1.8.2 seems to be equivalent to 75% in MSFragger according to a benchmarking study, I also applied a cutoff of 75%, which gives 10,670 sites. However, mssitereports includes multiplicity, whereas DIA-NN and FragPipe do not. Restricted to multiplicity 1 (only one phosphorylation), there were 7,450 sites.
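The difference between a multiplicity-aware site table (MaxQuant-style _1/_2/_3 entries, as described for mssitereports above) and a single-site table can be sketched in Python as below. The record layout, protein ID, and positions are made up for illustration, and summing intensities per key is a simplification, not necessarily what any of these tools does.

```python
from collections import defaultdict

# Hypothetical precursor records: (protein, phosphosite positions on the protein, intensity).
# Real DIA-NN or FragPipe tables use different columns; this layout is only illustrative.
precursors = [
    ("P12345", (15,), 100.0),    # singly phosphorylated at position 15
    ("P12345", (15, 18), 40.0),  # doubly phosphorylated at 15 and 18
    ("P12345", (18,), 60.0),
    ("P12345", (15,), 20.0),     # e.g. another charge state of the first precursor
]


def multiplicity_sites(records, cap=3):
    """MaxQuant-style table: one row per (protein, site, multiplicity).

    Multiplicity is the number of phosphosites on the precursor, capped at `cap`
    (so 3 stands for "3 or more"). Intensities are simply summed here.
    """
    table = defaultdict(float)
    for protein, sites, intensity in records:
        multiplicity = min(len(sites), cap)
        for site in sites:
            table[(protein, site, multiplicity)] += intensity
    return dict(table)


def single_sites(records):
    """Single-site table: one row per (protein, site), ignoring multiplicity."""
    return sorted({(protein, site) for protein, sites, _ in records for site in sites})


print(len(multiplicity_sites(precursors)))  # 4 rows
print(len(single_sites(precursors)))        # 2 rows
```

The same two sites produce four rows once multiplicity is kept, which is one reason site counts from multiplicity-aware and single-site reports are not directly comparable.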

With DDA of the same samples and the FragPipe LFQ-phospho workflow (75% cutoff), I get 10,800 phosphosites.

So I am really not sure what to make of all these numbers.

Best,
Dominik

@fcyu
Member

fcyu commented Nov 3, 2024

This reminds me that there is another tool that can generate site reports for DIA-NN and Spectronaut: https://github.com/tvpham/msproteomics

Maybe you can also try this one.

Best,

Fengchao

@fcyu
Member

fcyu commented Nov 3, 2024

If you like, we can have a Zoom call to have a discussion to figure out the best approach to generate the site reports.

Best,

Fengchao

@dh2305
Author

dh2305 commented Nov 3, 2024

This is the tool I mentioned.

Best,
Dominik

@dh2305
Author

dh2305 commented Nov 3, 2024

I am on holiday next week, but I will get back to you. I would love to have your input on the state of the art of phosphoproteomic data processing.

Best,
Dominik

@anesvi
Collaborator

anesvi commented Nov 3, 2024

Please contact us directly by email; we would be happy to discuss.

@dh2305
Copy link
Author

dh2305 commented Nov 24, 2024

> Please contact us directly by email - would be happy to discuss

Dear all,

I will be happy to schedule a meeting soon, once I am fully back in the lab, and hopefully be of help by sharing my extensive testing of phospho workflows and software pipelines on our timsTOF machines.

A quick question regarding multiplicity:
The regular DDA phospho workflow outputs a site table (filtered at 75% best localization probability). Does this output in any way include the concept of multiplicity, and therefore multiple entries for the same site depending on the multiplicity (number of phosphorylations) of the corresponding phosphopeptide(s)? Or does the site table contain individual sites only (like DIA-NN), unlike MaxQuant with its _1, _2, _3 intensity versions?

And another quick one regarding Y:
What is your stance on limiting phospho searches to ST and not including Y, given its low natural occurrence and quite different biological/chemical properties, so as not to run into MBR problems?
In your experience, are you confident including STY?
There still seems to be no clear preference in the community.

Thank you so much!
Dominik

@dh2305
Copy link
Author

dh2305 commented Nov 24, 2024

And another question: how exactly did you obtain the number of site IDs (phospho class I) in your latest diaTracer preprint?

From the DIA-NN report.tsv or report_pr? And how, given that both are peptide-centric?

Thank you!

Dominik

@anesvi
Copy link
Collaborator

anesvi commented Nov 24, 2024

Site localization probabilities are not from DIA-NN; they are calculated by PTMProphet for each PSM while building the spectral library, and then propagated to the DIA-NN results, specifically to the precursor (_pr) file. That file was used for counting the sites in the diaTracer paper.

I am not sure I understand your multiplicity question fully.

@dh2305
Copy link
Author

dh2305 commented Nov 24, 2024

Thank you for your answer. How exactly did you derive the site numbers from these tables?

My prior two questions were:
I) Does the DDA LFQ-phospho FragPipe workflow, which provides a site table, account for multiplicity, i.e. do the rows contain multiple entries for the same site depending on multiplicity (akin to MaxQuant or mssitereport)? And are the ID numbers from the site table (e.g. 10,000 from a µphos experiment I did with ddaPASEF acquisition) fully unique sites, filtered at 0.75 site probability (probably the best site probability of the respective site) in the PTMProphet section?

II) Are you confident including Y in phospho searches compared to ST only, and what is your take on the potential FDR problem caused by expanding the search space with a modification of rarer occurrence and different biological/chemical behavior compared to ST?

Thank you!
Dominik

@anesvi
Copy link
Collaborator

anesvi commented Nov 24, 2024

This GitHub issue has a title related to the diaPASEF workflow. You are mixing many things in one post. Maybe you should split them into separate issues.

In the diaTracer paper, we counted the number of sites in the _pr file with a localization probability above 0.75.
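Counting sites from a precursor-level table, where the same site can appear in several rows (different charge states or co-occurring phosphorylations), comes down to filtering and then deduplicating. Below is a minimal Python sketch, assuming site positions are already in protein coordinates and that a site counts once if it passes the cutoff in at least one row. The row layout and the "at least one row" rule are assumptions; the actual diaTracer scripts may differ.

```python
# Hypothetical precursor-level rows: one row per precursor (peptide + charge),
# with per-site localization probabilities already mapped to protein positions.
# Column names and values are assumptions for illustration only.
rows = [
    {"protein": "P12345", "charge": 2, "site_probs": {15: 0.98, 18: 0.40}},
    {"protein": "P12345", "charge": 3, "site_probs": {15: 0.91}},  # same site, other charge
    {"protein": "P12345", "charge": 2, "site_probs": {18: 0.80}},
    {"protein": "Q67890", "charge": 2, "site_probs": {7: 0.60}},
]


def count_localized_sites(rows, threshold=0.75):
    """Count unique (protein, position) pairs whose localization probability
    exceeds `threshold` in at least one precursor row."""
    sites = set()
    for row in rows:
        for position, prob in row["site_probs"].items():
            if prob > threshold:
                sites.add((row["protein"], position))
    return len(sites)


print(count_localized_sites(rows))        # 2: P12345 positions 15 and 18
print(count_localized_sites(rows, 0.90))  # 1: only position 15
```

Changing `threshold` shows how the same table yields different site counts at a 0.75 cutoff versus a stricter 0.90 one.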

For DDA LFQ phospho reports: Fengchao, can you help or point to where we describe the site-level reports?

Regarding searching ST or STY, this has nothing to do with our tools. It is your choice regardless of the pipeline you use. Most people use STY, but yes, only a relatively small percentage of identified phosphorylations are on Y.

@anesvi
Copy link
Collaborator

anesvi commented Nov 24, 2024

And as far as FDR goes, yes, FDR for rare PTMs (or noncanonical sequences when searching custom databases) will be overestimated, for any tool. We have ways to deal with it in FragPipe (group FDR, two-pass search), but it is a separate issue and we should not mix it in here.
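Group FDR generally means estimating the target-decoy FDR separately for subsets of identifications (for example, peptides carrying a rare modification) instead of once over the pooled list. The Python sketch below is a generic illustration of that idea, not FragPipe's implementation; the grouping, the decoys/targets estimator, and the toy scores are assumptions. An unrealistically loose 50% FDR is used only so the tiny example shows a difference.

```python
from collections import defaultdict


def group_fdr_filter(psms, fdr=0.01):
    """Target-decoy filtering applied separately within each group.

    psms: iterable of (group, score, is_decoy); higher score is better.
    Returns, per group, the scores of accepted target PSMs. The FDR at each
    rank is estimated as decoys / targets among PSMs scoring at or above it,
    and the largest prefix with estimated FDR <= `fdr` is accepted.
    """
    by_group = defaultdict(list)
    for group, score, is_decoy in psms:
        by_group[group].append((score, is_decoy))

    accepted = {}
    for group, items in by_group.items():
        items.sort(key=lambda item: item[0], reverse=True)
        targets = decoys = cutoff = 0
        for rank, (_, is_decoy) in enumerate(items, start=1):
            if is_decoy:
                decoys += 1
            else:
                targets += 1
            if targets and decoys / targets <= fdr:
                cutoff = rank
        accepted[group] = [score for score, is_decoy in items[:cutoff] if not is_decoy]
    return accepted


# Toy data (made up): a common "ST" group and a rare "Y" group.
example_psms = [("ST", 10, False), ("ST", 9, False), ("ST", 8, True), ("ST", 7, False),
                ("Y", 6, True), ("Y", 5, False)]
print(group_fdr_filter(example_psms, fdr=0.5))
# Pooling everything into one group reproduces a single global FDR.
print(group_fdr_filter([("all", s, d) for _, s, d in example_psms], fdr=0.5))
```

In this toy case the pooled list accepts the Y-group target at score 5, whereas filtering within the Y group rejects it, because that group's own decoy-to-target ratio is much worse than the pooled one.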

@fcyu
Copy link
Member

fcyu commented Nov 24, 2024

I don't think we describe the DDA site report anywhere, but here are my answers to the questions.

> Does this output in any way or form include the concept of multiplicity and therefore multiple entries of the same site depending on the multiplicity/numeral phosphorylation of the corresponding phosphopeptide(s)? Or is the site table an output of individual sites only (such as DIA-NN) and not like MaxQuant (with the _1;_2;_3 versions of intensity)?

They are single-site reports. If multiple peptides have the same site, the intensity of the site comes from the top-N or MaxLFQ algorithm.
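A top-N site intensity can be sketched in Python as below: for each site in one sample, take the N most intense peptides carrying it and combine them. This is only an illustration under stated assumptions (the mean of the top N is used, and the site IDs are hypothetical strings); IonQuant's exact top-N rule and its MaxLFQ option are not reproduced here.

```python
from collections import defaultdict


def top_n_site_intensity(peptide_intensities, n=3):
    """Per-site intensity for one sample from the n most intense peptides.

    peptide_intensities: iterable of (site_id, intensity), one entry per
    peptide/precursor that carries the site. The site value is the mean of its
    top-n intensities (averaging rather than summing is an assumption here).
    """
    by_site = defaultdict(list)
    for site, intensity in peptide_intensities:
        by_site[site].append(intensity)

    result = {}
    for site, values in by_site.items():
        top = sorted(values, reverse=True)[:n]  # keep the n most intense peptides
        result[site] = sum(top) / len(top)
    return result


# Made-up intensities for two sites in one sample.
peptides = [("P12345_S15", 90.0), ("P12345_S15", 60.0), ("P12345_S15", 30.0),
            ("P12345_S15", 5.0), ("P12345_S18", 40.0)]
print(top_n_site_intensity(peptides))  # {'P12345_S15': 60.0, 'P12345_S18': 40.0}
```

Each site gets one value per sample however many peptides carry it, which is what keeps the report at one row per site.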

> I) if the DDAlfqphospho fragpipe workflow, which provides a site table, accounts for multiplicity and the entries (rows) contain multiple entries of the same site depending on multiplicity (akin to MaxQuant or mssitereport) or not. And that the ID numbers from the site ID table (10000 e.g. from a µphos experiment I did with DDAPASEF acquisition) are fully unique sites (filtered for the 75% value of site probability (probably best site probability of the respective site) in the PTMprophet section)?

They are single, unique sites.

Best,

Fengchao

@dh2305
Copy link
Author

dh2305 commented Nov 24, 2024

Thank you for your very helpful answers, both to you Fengchao and Alexey.

I agree, this is a complex topic with many connected levels.

I) Regarding the diaTracer preprint and also my diaTracer phospho results: how exactly did you count the number of sites if this is a table on the peptide level? The same site can turn up in multiple precursor variants (different numbers of phosphorylations and different charges). This does not seem straightforward, which is why tools like mssitereports, mentioned by Fengchao and myself, are needed. Or am I missing something terribly obvious here?
As mentioned by Fengchao, I am looking forward to site report output tables akin to the DDA workflow. Thank you for your continuous amazing work.

II) Regarding Y, again agreed. Although this topic is independent of the workflow, I highly appreciate the input here from academic leaders close to the FDR implementation. Indeed, STY is most common, but I do not believe there is a consensus yet regarding the impact of including Y. To conclude, would you, with caveats, include Y, since the phospho workflows in FragPipe can (partially) account for its inherent implications?

Again, thank you and kind regards.

@fcyu
Copy link
Member

fcyu commented Nov 24, 2024

> I) Regarding the DIAtracer preprint and also my diatracer phospho results: How exactly did you count the number of sites if this is a table on the peptide level?

Here are the scripts used to generate the figures and results for the diaTracer paper: https://github.com/Nesvilab/diaTracer-manuscript. I believe they contain the details of how we counted the sites.

Best,

Fengchao

@KaiLiCn
Copy link
Member

KaiLiCn commented Nov 24, 2024

We counted the phospho sites at the protein level. We have protein site information in our result files. The phospho sites were first filtered by localization score at the PSM level. We then counted the unique protein sites. Fengchao shared the scripts showing how we processed them. Please find more details in the scripts.
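The protein-level counting described above (filter at the PSM level, map each site to its protein position, then count unique protein sites) can be sketched in Python as follows. The input format, the strict "above threshold" comparison, and the toy sequence are assumptions; the diaTracer-manuscript scripts are the authoritative version.

```python
def unique_protein_sites(protein_id, protein_seq, psm_sites, threshold=0.75):
    """Map PSM-level phosphosites to unique protein-level sites.

    psm_sites: iterable of (peptide, offset, loc_prob), where `offset` is the
    0-based position of the phosphorylated residue within the peptide and
    `loc_prob` its localization probability. Sites at or below `threshold`
    are dropped before mapping. Returns labels such as "P1_S6" (1-based).
    """
    sites = set()
    for peptide, offset, loc_prob in psm_sites:
        if loc_prob <= threshold:
            continue  # PSM-level localization filter
        start = protein_seq.find(peptide)
        while start != -1:  # the peptide may occur more than once in the protein
            pos = start + offset  # 0-based position in the protein
            sites.add(f"{protein_id}_{protein_seq[pos]}{pos + 1}")
            start = protein_seq.find(peptide, start + 1)
    return sites


# Made-up protein and PSMs: two PSMs support the same site, one is poorly localized.
protein = "MKTAYSPRDSLEK"
example_psms = [("TAYSPR", 3, 0.95), ("TAYSPR", 3, 0.99), ("DSLEK", 1, 0.50)]
print(unique_protein_sites("P1", protein, example_psms))  # {'P1_S6'}
```

Because sites are stored as protein-level labels in a set, PSMs from different charge states or overlapping peptides that hit the same residue collapse to one site.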
