
04- Explore the mapping scores for Wilms tumor -06 #835

Merged
merged 24 commits into from
Oct 29, 2024

Conversation

maud-p
Contributor

@maud-p maud-p commented Oct 21, 2024

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

This PR is linked to a comment in PR #828 regarding the correct labelling of endothelial cells:

What is the goal of this pull request?

Here, I wanted to explore the mapping score of the label transfer of the predicted compartment from the fetal kidney reference.
I wanted to check how reliable the endothelial and immune annotations that we used as the normal reference for inferCNV are.

It seems that the majority of endothelial and immune cells map to the reference with a high mapping.score (> 0.85).

I might include a filtering step in the infercnv.R script to filter out immune and/or endothelial cells we used for the reference if they have a poor mapping.score.

Briefly describe the general approach you took to achieve this goal.

I just added a few density and box plots to check the distribution of the mapping.score for the compartments (fetal nephron, stroma, endothelial and immune) from the label transfer from the fetal kidney reference.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yes, I will include a filtering step in the infercnv.R script to filter out immune and/or endothelial cells we used for the reference if they have a poor mapping.score.

What types of results does your code produce (e.g., table, figure)?

One notebook.

What is your summary of the results?

It might be worth filtering out cells with a mapping.score < 0.85 while running 06_infercnv.R.
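
A minimal sketch of what such a filtering step could look like; all column names and compartment labels here are hypothetical placeholders, not identifiers taken from the actual script:

# Hypothetical sketch: keep only normal-reference cells whose score
# clears the threshold explored in the notebook
# (column names and labels are illustrative, not from 06_infercnv.R)
score_threshold <- 0.85

normal_reference_cells <- cell_type_df |>
  dplyr::filter(
    compartment %in% c("endothelium", "immune"),
    mapping.score >= score_threshold
  )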

Author checklists

Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.

@maud-p maud-p requested a review from jaclyn-taroni as a code owner October 21, 2024 08:33
@maud-p maud-p changed the title Explore the mapping scores 04- Explore the mapping scores for Wilms tumor -06 Oct 21, 2024
@jaclyn-taroni jaclyn-taroni requested review from sjspielman and removed request for jaclyn-taroni October 21, 2024 12:34
Member

@sjspielman sjspielman left a comment

Overall, these look like good additions to explore scores a bit more! The main thing I think is missing is a clearer understanding of how you ended up choosing 0.85. Right now, it appears that this was just a visual assessment? On one hand, I don't want to suggest too much more work for you here on this PR since we're coming up on deadlines, but I do think more evidence supporting this specific choice would be good. For example, why not 0.5? Why not 0.95? Some justification is helpful here.

Perhaps this is a good middle ground as something that can at least explore whether this threshold is reasonable:

  • You can knit a few versions of this notebook, specifying a few different thresholds, and knit them all to create HTMLs with a custom name that includes the threshold in the file name. I recommend at least these thresholds: maybe 0.5, 0.85, and 0.95? Then, we can compare a bit more clearly and choose the ideal threshold. This will allow you to make use of the param as well (see the rendering sketch after this list).
  • As part of this, you'll want to update text in the notebook to indicate that you are also exploring the potential effects of score thresholds, rather than saying "we chose this threshold and will continue." This will probably involve updating the intro and conclusion text, primarily. In the conclusion, you don't need to conclude which threshold to use, since each notebook will be using a different threshold. You can formally document which threshold is used in the scripts used in next steps.
    • Also, you probably want to remove the text at the bottom of the notebook saying how many of each type of cell were found in the 5 samples chosen, since again this will be different for thresholds across notebooks. Instead, the notebook/README.md already explains which samples were chosen, so just keep that text. Since these numbers are also likely to shift somewhat (but they should not change too much!) with the forthcoming code changes we are working on to do annotation without Azimuth functions and because of small changes that may occur with data releases, I recommend just writing down the sample IDs without the specific cell counts. Instead, you can just say that these were chosen because they are majority kidney with a good amount of immune + endothelial.
    • You should also update notebook/README.md to briefly explain that part of this notebook is to explore thresholds.
  • I also recommend adding plots (this should be super quick!): Let's make the marker gene plots twice: using all annotations (which is what you currently do), and then a second version of these plots with only cells passing the threshold. You can create a second data frame for this, and then just plot those results using your do_Feature_mean function. We'd hope to see stronger signal for marker genes after filtering, and a couple of rendered notebooks will make that comparison straightforward:
# code for the second data frame to plot: keep only cells passing the mapping QC
cell_type_df_pass <- cell_type_df |>
  dplyr::filter(pass_mapping_QC)
  • It might be worth also visually exploring with a UMAP where cells are colored by compartment; you'd make 2 versions for each dataset: one with all cells, and one with only the cells that pass the score threshold. But I would not make these plots unless you think the marker gene plots do not provide sufficient evidence to pick a threshold among the ones you explore. In case you do decide to do this, you would want to pull out the UMAP coordinates in the code that makes cell_type_df and plot using ggplot() + geom_point() (see the sketch below).
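
A minimal sketch of the threshold-specific rendering mentioned above, assuming the Rmd declares a threshold entry under params in its YAML header (the actual param name may differ):

# Render the notebook once per candidate threshold, writing one HTML each
thresholds <- c(0.5, 0.85, 0.95)

for (t in thresholds) {
  rmarkdown::render(
    "04_annotation_Across_Samples_exploration.Rmd",
    params = list(threshold = t),
    output_file = sprintf("04_annotation_Across_Samples_exploration_%.2f.html", t)
  )
}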
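
And a minimal sketch of the optional UMAP comparison, assuming cell_type_df gains UMAP_1/UMAP_2 coordinate columns alongside the existing compartment and pass_mapping_QC columns (column names are illustrative):

library(ggplot2)

# UMAP colored by compartment; call once with all cells and once with
# only the cells passing the score threshold
plot_umap <- function(df, plot_title) {
  ggplot(df, aes(x = UMAP_1, y = UMAP_2, color = compartment)) +
    geom_point(size = 0.5, alpha = 0.5) +
    ggtitle(plot_title)
}

plot_umap(cell_type_df, "All cells")
plot_umap(dplyr::filter(cell_type_df, pass_mapping_QC), "Cells passing the threshold")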

Member

A general comment here: Can you add text above plots stating that the line is drawn at the threshold being explored in the notebook?

maud-p and others added 7 commits October 22, 2024 20:23
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
@maud-p
Contributor Author

maud-p commented Oct 22, 2024

Hi @sjspielman ,

thank you so much for staying active on the revisions while being at a workshop!

  • I also plotted cells that do not pass the threshold, as I have the impression it is sometimes easier to evaluate than cells passing the threshold! I guess it's a matter of taste 😄

  • I realized that the stroma compartment often has a poor mapping score. To me, this is an indication that these cells might be cancer cells and not normal stromal cells.

  • I think the threshold can be used to select normal cells for which we have high confidence, but I wouldn't use it to filter out all cells below the threshold.

Thank you!

@maud-p
Contributor Author

maud-p commented Oct 22, 2024

Regarding the choice of threshold, I think 0.5 is too low, as almost all cells have a higher mapping.score, and 0.95 is too high, as so few cells pass the threshold.

What do you think of 0.75 vs. 0.85? I cannot really decide 🤔
Thank you!

Member

@sjspielman sjspielman left a comment

Hi @maud-p, sorry I wasn't able to review more last week! I have a bit of feedback for this PR, but we can get this in shortly!

While looking at the heatmaps, I realized something was strange with the legends, which appear as discrete when they should be continuous. It turns out there's a bug in line 164. This may also be an issue in other notebooks that use this plotting strategy too, but definitely do not worry about that!!! Let's just fix it here:

# current line 164
guides(fill=guide_legend(title=paste0(feature)))

# but it should be using guide_colourbar
guides(fill=guide_colourbar(title=paste0(feature)))

It would also be good to make the titles in these heatmap plots a little smaller, since they currently run over the page. Can you update the theme lines here to include title = element_text(size = rel(0.75))? (FYI, rel(0.75) means "0.75 times, aka relative to, the default size".) This should help the titles fit. You may need to change the 0.75 number a bit, but I think it should be close; a sketch follows below.
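
A minimal sketch of that tweak, where p stands in for the existing heatmap ggplot object:

library(ggplot2)

# Shrink every title element to 0.75x the default so long heatmap
# titles fit on the page; adjust the factor as needed
p <- p + theme(title = element_text(size = rel(0.75)))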

I think either threshold 0.75 or 0.85 will be fine here; it's just important to note which you choose and why! It's also fine to say that both looked good, so you just choose the more (or less) stringent one. Since you've already run the next steps of code with 0.85, that should be fine to keep. Please just add a quick sentence or two to the README to state which one you are choosing. It would be helpful to also include in the README the concluding notes you made in this comment, #835 (comment).

@maud-p
Contributor Author

maud-p commented Oct 28, 2024

> Hi @maud-p, sorry I wasn't able to review more last week! I have a bit of feedback for this PR, but we can get this in shortly!
>
> While looking at the heatmaps, I realized something was strange with the legends, which appear as discrete when they should be continuous. It turns out there's a bug in line 164. This may also be an issue in other notebooks that use this plotting strategy too, but definitely do not worry about that!!! Let's just fix it here:
>
> # current line 164
> guides(fill=guide_legend(title=paste0(feature)))
>
> # but it should be using guide_colourbar
> guides(fill=guide_colourbar(title=paste0(feature)))

Hi @sjspielman, good catch, thank you! I was wondering why the legends looked like that, but didn't find the error! Thank you!

@@ -36,6 +36,19 @@ The next step in analysis is to identify tumor vs. normal cells.
- `04_annotation_Across_Samples_exploration.html` is the output of the [`04_annotation_Across_Samples_exploration.Rmd`](../notebook/04_annotation_Across_Samples_exploration.Rmd) notebook.
In brief, we explored the label transfer results across all samples in the Wilms tumor dataset SCPCP000006 in order to identify a few samples that we can begin next analysis steps with.

One way to evaluate the label transfer is to look at the mapping score for each label being transferred, which more or less corresponds to the certainty that a label transfer is _TRUE_.
We render the notebook with different thresholds for the mapping score and evaluate the impact of filtering out cells with a mapping score below 0.5, 0.75, 0.85 and 0.95.
Member

@sjspielman sjspielman Oct 28, 2024

I wanted to point something out here from the Azimuth docs: https://azimuth.hubmapconsortium.org/

I had been under the impression that the scores we were working with are what they are calling prediction scores, not mapping scores, but now I'm wondering whether I actually had a reason to think this. I only just now realized this difference in how I am thinking about this (so sorry!!), even though obviously you had been writing "mapping score" all along! Do you know for sure which scores we are using here? That may influence interpretation, but not the analysis itself.

Contributor Author

Oh, you are pointing out a good point. From what I read of the Azimuth docs, both prediction and mapping scores exist and are cell-level metrics:

  • Prediction scores: Cell prediction scores range from 0 to 1 and reflect the confidence associated with each annotation. Cells with high-confidence annotations (for example, prediction scores > 0.75) reflect predictions that are supported by multiple consistent anchors. Prediction scores can be visualized on the Feature Plots tab, or downloaded on the Download Results tab. The prediction depends on the specific annotation for each cell. Therefore, if you are mapping cells at multiple levels of resolution (for example level 1/2/3 annotations in the Human PBMC reference), each level will be associated with a different prediction score.

  • Mapping scores: This value from 0 to 1 reflects confidence that this cell is well represented by the reference. The “mapping.score” column is available to plot in the Feature Plots tab, and is provided in the download TSV file. The mapping score is independent of a specific annotation, is calculated using the MappingScore function in Seurat, and reflects how well the unique structure of a cell’s local neighborhood is preserved during reference mapping.

I am using the predicted.score; in fact, I shouldn't refer to it as mapping, you are right.
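
For reference, a minimal sketch of how the two metrics come out of Seurat, assuming anchors comes from FindTransferAnchors() between the fetal kidney reference and a query, and reference$compartment holds the reference labels (object names are illustrative):

library(Seurat)

# Prediction scores: per-annotation confidence from label transfer;
# TransferData() returns predicted.id plus prediction.score.* columns
predictions <- TransferData(anchorset = anchors, refdata = reference$compartment)

# Mapping score: annotation-independent confidence that each query cell
# is well represented by the reference
query$mapping.score <- MappingScore(anchors = anchors)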

Contributor Author

I wasn't aware of the 2 metrics! Thanks!

Member

Ah, fantastic, I think the predicted.score is definitely what we want to be using!! So, let's just change the text to say prediction instead of mapping, but otherwise this is good!

@sjspielman
Member

@maud-p Is this one ready for me to have another look yet? No problem if not, just checking in :)

@maud-p
Contributor Author

maud-p commented Oct 29, 2024

Yes, sorry, both of the PRs should be ready 😄 I'll ask for review in a second!

@maud-p maud-p requested a review from sjspielman October 29, 2024 14:53
Member

@sjspielman sjspielman left a comment

Looks good, let's get this in!!

@maud-p
Contributor Author

maud-p commented Oct 29, 2024

Should I re-open a PR based on the new main branch and add the last updates that we made in PR #828?

@sjspielman
Member

> Should I re-open a PR based on the new main branch and add the last updates that we made in PR #828?

I'm not sure what you mean here? Everything you currently have is fine! In #828 (which I'm reviewing now), I resolved the conflict with the main branch, so that PR can stay as it is.

@maud-p
Contributor Author

maud-p commented Oct 29, 2024

> > Should I re-open a PR based on the new main branch and add the last updates that we made in PR #828?
>
> I'm not sure what you mean here? Everything you currently have is fine! In #828 (which I'm reviewing now), I resolved the conflict with the main branch, so that PR can stay as it is.

Great, thank you! Then I'll just leave it as it is 😄

@sjspielman sjspielman merged commit 6116e00 into AlexsLemonade:main Oct 29, 2024
3 checks passed
@maud-p maud-p deleted the 04_explore_mapping_score branch January 2, 2025 14:18