Ag-866/Ag-1227 Process SRM DE data #90
Conversation
…to AG-866_reprocess_srm_data
… data to gene_info, proteomics, and proteomics_distribution transforms
@jaclynbeck-sage We do actually use those ci values in the GCT proteomics circle overlay plot. If we can't get actual ci values for this data somehow, setting them to the l2fc value won't work.
Good catch. We don't have the raw data, but apparently the CI can be calculated from the p-value, so I'll do that! Sorry, I didn't look into that originally. I also just realized I did not add SRM to our proteomics tests, so I will add those as well.
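For anyone curious, back-calculating a CI from an effect size and its two-sided p-value is a standard trick (assuming an approximately normal test statistic): recover the standard error as `|l2fc| / z_p`, then build the interval. A minimal sketch, not the notebook's exact code, and the function name is hypothetical:

```python
from statistics import NormalDist

def ci_from_p(l2fc: float, p: float, conf: float = 0.95) -> tuple[float, float]:
    """Recover a confidence interval from a log2-fold-change and its
    two-sided p-value, assuming the test statistic is ~normal."""
    nd = NormalDist()
    z_p = nd.inv_cdf(1 - p / 2)              # z-score implied by the p-value
    se = abs(l2fc) / z_p                      # implied standard error
    z_conf = nd.inv_cdf(1 - (1 - conf) / 2)  # e.g. ~1.96 for a 95% CI
    return (l2fc - z_conf * se, l2fc + z_conf * se)

lo, hi = ci_from_p(l2fc=0.8, p=0.01)
```

Note this only works cleanly for a single two-group comparison; for an ANOVA across multiple groups the per-contrast standard errors aren't recoverable from one omnibus p-value, which is why the raw data had to be partially re-processed (see below).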
…tervals and add more documentation
…d instead added the mapping file to Synapse
@JessterB I am still working on getting SRM into the testing suite, but I've updated the SRM data to have confidence intervals. This turned out to be non-trivial for an ANOVA on multiple groups, and I ended up having to partially re-process the raw data. The log2-fold-change values and p-values are identical to the data in the original DE tables, but now there are confidence intervals too.

This brings up a thought: how much of this processing actually needs to happen in this pre-processing notebook, and how much should instead be in a separate repository or gist, like the LFQ and TMT processing? I don't know how much it matters that we're doing (partial) DE analysis inside a notebook in the ADT repository. The only part of this process that is specifically ADT pipeline-related and actually needs to be in a pre-processing notebook is the UniProt/Ensembl ID lookup (due to potential failure/hanging when making external API requests). The rest is just data rearrangement and math. I could see, for example, the UniProt -> Ensembl ID mapping being done in this notebook, a separate repo/gist re-processing the DE data, and then a transform combining the two. On the other hand, leaving it the way it currently works requires no extra work and no extra transforms.

For reference, a human-readable version of the notebook is here.
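To illustrate the "transform combines the two" option: the API-dependent artifact would just be a gene -> ID mapping, the gist would produce the DE statistics, and the transform reduces to a join. A toy sketch with made-up column names (the real transform would presumably use dataframes):

```python
# Hypothetical shape of the two inputs: one mapping artifact from the
# notebook (the only API-dependent step), one DE table from a gist/repo.
id_map = {"APOE": ("P02649", "ENSG00000130203")}  # gene -> (UniProt, Ensembl)
de_rows = [{"hgnc_symbol": "APOE", "log2_fc": 0.8, "ci_l": 0.19, "ci_r": 1.41}]

# The "combining" transform is then a simple keyed join.
combined = [
    {**row,
     "uniprotid": id_map[row["hgnc_symbol"]][0],
     "ensembl_gene_id": id_map[row["hgnc_symbol"]][1]}
    for row in de_rows
    if row["hgnc_symbol"] in id_map  # drop genes with no mapping
]
```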
@JessterB ready for re-review! SRM has been added to the integration testing, and confidence intervals are real values. As per our discussion yesterday, we'll be leaving all the data processing as-is in the notebook until/unless we need to restructure stuff.
lgtm!
This PR pre-processes the SRM DE data and adds it to the `gene_info` and `proteomics_distribution` transforms, and adds an entry for SRM data to be processed (with no custom transform needed) in the config, the same way LFQ and TMT data are processed.

For pre-processing: the notebook that does this pre-processing is here. The notebook combines two SRM proteomics DE files into one dataframe and reformats them to mimic the LFQ and TMT file formats. The input data only contains gene names, so UniProt and Ensembl IDs are queried for each gene and added. p-values in the data are corrected for multiple testing, then columns are renamed and re-ordered to match LFQ and TMT data. At that point, it was really simple to add this data to the transforms because it's identical in format to the other two proteomics files.
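On the multiple-testing step: the usual choice here is a Benjamini-Hochberg FDR adjustment (the notebook may use a different method; this is just an illustration). A self-contained sketch of the adjustment:

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg FDR adjustment (pure-Python sketch).

    Adjusted p_i = p_i * n / rank_i, made monotone by walking from the
    largest p-value down and carrying the running minimum."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * n
    prev = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        prev = min(prev, pvals[i] * n / rank)  # enforce monotonicity, cap at 1
        adjusted[i] = prev
    return adjusted

bh_adjust([0.01, 0.04, 0.03, 0.5])
```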
I confirmed that the new `proteomics_distribution_data.json` has a single JSON entry added for SRM data, and that the only differences in the new `gene_info.json` file are that several genes have flipped `is_any_protein_changed_in_ad_brain` and `protein_brain_change_studied` from `false` to `true` due to significance in the SRM data for that gene.