
04- Explore the mapping scores for Wilms tumor -06 #835

Merged
merged 24 commits into from
Oct 29, 2024

Conversation

maud-p
Contributor

@maud-p maud-p commented Oct 21, 2024

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

This PR is linked to a comment in PR #828 regarding the correct labelling of endothelial cells:

What is the goal of this pull request?

Here, I wanted to explore the mapping score of the label transfer of the predicted compartment from the fetal kidney reference.
I wanted to check how reliable the endothelial and immune annotations that we used as the normal reference for inferCNV are.

It seems that the majority of endothelial and immune cells map to the reference with a high mapping.score (> 0.85).

I might include a filtering step in the infercnv.R script to filter out immune and/or endothelial cells we used for the reference if they have a poor mapping.score.

Briefly describe the general approach you took to achieve this goal.

I just added a few density and box plots to check the distribution of the mapping.score for the compartments (fetal nephron, stroma, endothelial and immune) from the label transfer from the fetal kidney reference.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yes, I will include a filtering step in the infercnv.R script to filter out immune and/or endothelial cells we used for the reference if they have a poor mapping.score.

What types of results does your code produce (e.g., table, figure)?

One notebook.

What is your summary of the results?

It might be worth filtering out cells with a mapping.score < 0.85 while running 06_infercnv.R.
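
A minimal sketch of what such a filtering step could look like; all column names and compartment labels here are hypothetical placeholders, not identifiers taken from the actual script:

# Hypothetical sketch: keep only normal-reference cells whose score
# clears the threshold explored in the notebook
# (column names and labels are illustrative, not from 06_infercnv.R)
score_threshold <- 0.85

normal_reference_cells <- cell_type_df |>
  dplyr::filter(
    compartment %in% c("endothelium", "immune"),
    mapping.score >= score_threshold
  )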

Author checklists

Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.

@maud-p maud-p requested a review from jaclyn-taroni as a code owner October 21, 2024 08:33
@maud-p maud-p changed the title Explore the mapping scores 04- Explore the mapping scores for Wilms tumor -06 Oct 21, 2024
@jaclyn-taroni jaclyn-taroni requested review from sjspielman and removed request for jaclyn-taroni October 21, 2024 12:34
Member

@sjspielman sjspielman left a comment

Overall, these look like good additions to explore scores a bit more! The main thing I think is missing is a clearer understanding of how you ended up choosing 0.85. Right now, it appears that this was just a visual assessment? On one hand, I don't want to suggest too much more work for you here on this PR since we're coming up on deadlines, but I do think more evidence supporting this specific choice would be good. For example, why not 0.5? Why not 0.95? Some justification is helpful here.

Perhaps this is a good middle ground as something that can at least explore whether this threshold is reasonable:

  • You can knit a few versions of this notebook, specifying a few different thresholds, and knit them all to create HTMLs with a custom name that includes the threshold in the file name. I recommend at least these thresholds: maybe 0.5, 0.85, and 0.95? Then, we can compare a bit more clearly and choose the ideal threshold. This will allow you to make use of the param as well (see the rendering sketch after this list).
  • As part of this, you'll want to update text in the notebook to indicate that you are also exploring the potential effects of score thresholds, rather than saying "we chose this threshold and will continue." This will probably involve updating the intro and conclusion text, primarily. In the conclusion, you don't need to conclude which threshold to use, since each notebook will be using a different threshold. You can formally document which threshold is used in the scripts used in next steps.
    • Also, you probably want to remove the text at the bottom of the notebook saying how many of each type of cell were found in the 5 samples chosen, since again this will be different for thresholds across notebooks. Instead, the notebook/README.md already explains which samples were chosen, so just keep that text. Since these numbers are also likely to shift somewhat (but they should not change too much!) with the forthcoming code changes we are working on to do annotation without Azimuth functions and because of small changes that may occur with data releases, I recommend just writing down the sample IDs without the specific cell counts. Instead, you can just say that these were chosen because they are majority kidney with a good amount of immune + endothelial.
    • You should also update notebook/README.md to briefly explain that part of this notebook is to explore thresholds.
  • I also recommend adding plots (this should be super quick!): Let's make the marker gene plots twice: using all annotations (which is what you currently do), and then a second version of these plots with only cells passing the threshold. You can create a second data frame for this, and then just plot those results using your do_Feature_mean function. We'd hope to see stronger signal for marker genes after filtering, and a couple of rendered notebooks will make that comparison straightforward:
# code for the second data frame to plot: keep only cells passing the mapping QC
cell_type_df_pass <- cell_type_df |>
  dplyr::filter(pass_mapping_QC)
  • It might be worth also visually exploring with a UMAP where cells are colored by compartment; you'd make 2 versions for each dataset: one with all cells, and one with only the cells that pass the score threshold. But I would not make these plots unless you think the marker gene plots do not provide sufficient evidence to pick a threshold among the ones you explore. In case you do decide to do this, you would want to pull out the UMAP coordinates in the code that makes cell_type_df and plot using ggplot() + geom_point() (see the sketch below).
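
A minimal sketch of the threshold-specific rendering mentioned above, assuming the Rmd declares a threshold entry under params in its YAML header (the actual param name may differ):

# Render the notebook once per candidate threshold, writing one HTML each
thresholds <- c(0.5, 0.85, 0.95)

for (t in thresholds) {
  rmarkdown::render(
    "04_annotation_Across_Samples_exploration.Rmd",
    params = list(threshold = t),
    output_file = sprintf("04_annotation_Across_Samples_exploration_%.2f.html", t)
  )
}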
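
And a minimal sketch of the optional UMAP comparison, assuming cell_type_df gains UMAP_1/UMAP_2 coordinate columns alongside the existing compartment and pass_mapping_QC columns (column names are illustrative):

library(ggplot2)

# UMAP colored by compartment; call once with all cells and once with
# only the cells passing the score threshold
plot_umap <- function(df, plot_title) {
  ggplot(df, aes(x = UMAP_1, y = UMAP_2, color = compartment)) +
    geom_point(size = 0.5, alpha = 0.5) +
    ggtitle(plot_title)
}

plot_umap(cell_type_df, "All cells")
plot_umap(dplyr::filter(cell_type_df, pass_mapping_QC), "Cells passing the threshold")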

Member

A general comment here: Can you add text above plots stating that the line is drawn at the threshold being explored in the notebook?

maud-p and others added 7 commits October 22, 2024 20:23
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
…s_Samples_exploration.Rmd

Co-authored-by: Stephanie Spielman <[email protected]>
@maud-p
Contributor Author

maud-p commented Oct 22, 2024

Hi @sjspielman ,

thank you so much for staying active on the revisions while being at a workshop!

  • I also plotted cells that do not pass the threshold, as I have the impression it is sometimes easier to evaluate than cells passing the threshold! I guess it's a matter of taste 😄

  • I realized that the stroma compartment often has a poor mapping score. To me, this is an indication that these cells might be cancer cells and not normal stromal cells.

  • I think the threshold can be used to select normal cells for which we have high confidence, but I wouldn't use it to filter out all cells below the threshold.

Thank you!

@maud-p
Contributor Author

maud-p commented Oct 22, 2024

Regarding the choice of threshold, I think 0.5 is too low, as almost all cells have a higher mapping.score, and 0.95 is too high, as so few cells pass the threshold.

What do you think of 0.75 vs. 0.85? I cannot really decide 🤔
Thank you!

Member

@sjspielman sjspielman left a comment

Hi @maud-p, sorry I wasn't able to review more last week! I have a bit of feedback for this PR, but we can get this in shortly!

While looking at the heatmaps, I realized something was strange with the legends, which appear as discrete when they should be continuous. It turns out there's a bug in line 164. This may also be an issue in other notebooks that use this plotting strategy too, but definitely do not worry about that!!! Let's just fix it here:

# current line 164
guides(fill=guide_legend(title=paste0(feature)))

# but it should be using guide_colourbar
guides(fill=guide_colourbar(title=paste0(feature)))

It would also be good to make the titles in these heatmap plots a little smaller, since they currently run over the page. Can you update the theme lines here to include title = element_text(size = rel(0.75))? (FYI, rel(0.75) means "0.75 times, aka relative to, the default size".) This should help the titles fit. You may need to change the 0.75 number a bit, but I think it should be close; a sketch follows below.
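
A minimal sketch of that tweak, where p stands in for the existing heatmap ggplot object:

library(ggplot2)

# Shrink every title element to 0.75x the default so long heatmap
# titles fit on the page; adjust the factor as needed
p <- p + theme(title = element_text(size = rel(0.75)))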

I think either threshold 0.75 or 0.85 will be fine here; it's just important to note which you choose and why! It's also fine to say that both looked good, so you just choose the more (or less) stringent one. Since you've already run the next steps of code with 0.85, that should be fine to keep. Please just add a quick sentence or two to the README to state which one you are choosing. It would be helpful to also include in the README the concluding notes you made in this comment, #835 (comment).

@maud-p
Contributor Author

maud-p commented Oct 28, 2024

> Hi @maud-p, sorry I wasn't able to review more last week! I have a bit of feedback for this PR, but we can get this in shortly!
>
> While looking at the heatmaps, I realized something was strange with the legends, which appear as discrete when they should be continuous. It turns out there's a bug in line 164. This may also be an issue in other notebooks that use this plotting strategy too, but definitely do not worry about that!!! Let's just fix it here:
>
> # current line 164
> guides(fill=guide_legend(title=paste0(feature)))
>
> # but it should be using guide_colourbar
> guides(fill=guide_colourbar(title=paste0(feature)))

Hi @sjspielman, good catch, thank you! I was wondering why the legends looked like that, but didn't find the error! Thank you!

@@ -36,6 +36,19 @@ The next step in analysis is to identify tumor vs. normal cells.
- `04_annotation_Across_Samples_exploration.html` is the output of the [`04_annotation_Across_Samples_exploration.Rmd`](../notebook/04_annotation_Across_Samples_exploration.Rmd) notebook.
In brief, we explored the label transfer results across all samples in the Wilms tumor dataset SCPCP000006 in order to identify a few samples that we can begin next analysis steps with.

One way to evaluate the label transfer is to look at the mapping score for each label being transferred, which more or less corresponds to the certainty that a label transfer is _TRUE_.
We render the notebook with different thresholds for the mapping score and evaluate the impact of filtering out cells with a mapping score below 0.5, 0.75, 0.85 and 0.95.
Member

@sjspielman sjspielman Oct 28, 2024

I wanted to point something out here from the Azimuth docs: https://azimuth.hubmapconsortium.org/

I had been under the impression that the scores we were working with are what they are calling prediction scores, not mapping scores, but now I'm wondering whether I actually had a reason to think this. I only just now realized this difference in how I am thinking about this (so sorry!!), even though obviously you had been writing "mapping score" all along! Do you know for sure which scores we are using here? That may influence interpretation, but not the analysis itself.

Contributor Author

Oh, you are pointing out a good point. From what I read of the Azimuth docs, both prediction and mapping scores exist and are cell-level metrics:

  • Prediction scores: Cell prediction scores range from 0 to 1 and reflect the confidence associated with each annotation. Cells with high-confidence annotations (for example, prediction scores > 0.75) reflect predictions that are supported by multiple consistent anchors. Prediction scores can be visualized on the Feature Plots tab, or downloaded on the Download Results tab. The prediction depends on the specific annotation for each cell. Therefore, if you are mapping cells at multiple levels of resolution (for example level 1/2/3 annotations in the Human PBMC reference), each level will be associated with a different prediction score.

  • Mapping scores: This value from 0 to 1 reflects confidence that this cell is well represented by the reference. The “mapping.score” column is available to plot in the Feature Plots tab, and is provided in the download TSV file. The mapping score is independent of a specific annotation, is calculated using the MappingScore function in Seurat, and reflects how well the unique structure of a cell’s local neighborhood is preserved during reference mapping.

I am using the predicted.score; in fact, I shouldn't refer to it as mapping, you are right.
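
For reference, a minimal sketch of how the two metrics come out of Seurat, assuming anchors comes from FindTransferAnchors() between the fetal kidney reference and a query, and reference$compartment holds the reference labels (object names are illustrative):

library(Seurat)

# Prediction scores: per-annotation confidence from label transfer;
# TransferData() returns predicted.id plus prediction.score.* columns
predictions <- TransferData(anchorset = anchors, refdata = reference$compartment)

# Mapping score: annotation-independent confidence that each query cell
# is well represented by the reference
query$mapping.score <- MappingScore(anchors = anchors)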

Contributor Author

I wasn't aware of the 2 metrics! Thanks!

Member

Ah, fantastic, I think the predicted.score is definitely what we want to be using!! So, let's just change the text to say prediction instead of mapping, but otherwise this is good!

@sjspielman
Member

@maud-p Is this one ready for me to have another look yet? No problem if not, just checking in :)

@maud-p
Contributor Author

maud-p commented Oct 29, 2024

Yes, sorry, both of the PRs should be ready 😄 I'll ask for review in a second!

@maud-p maud-p requested a review from sjspielman October 29, 2024 14:53
Member

@sjspielman sjspielman left a comment

Looks good, let's get this in!!

@maud-p
Contributor Author

maud-p commented Oct 29, 2024

Should I re-open a PR based on the new main branch and add the last updates that we made in PR #828?

@sjspielman
Member

> Should I re-open a PR based on the new main branch and add the last updates that we made in PR #828?

I'm not sure what you mean here? Everything you currently have is fine! In #828 (which I'm reviewing now), I resolved the conflict with the main branch, so that PR can stay as it is.

@maud-p
Contributor Author

maud-p commented Oct 29, 2024

> > Should I re-open a PR based on the new main branch and add the last updates that we made in PR #828?
>
> I'm not sure what you mean here? Everything you currently have is fine! In #828 (which I'm reviewing now), I resolved the conflict with the main branch, so that PR can stay as it is.

Great, thank you! Then I'll just leave it as it is 😄

@sjspielman sjspielman merged commit 6116e00 into AlexsLemonade:main Oct 29, 2024
3 checks passed
@maud-p maud-p deleted the 04_explore_mapping_score branch January 2, 2025 14:18