04- Explore the mapping scores for Wilms tumor -06 #835
Conversation
Overall, these look like good additions to explore scores a bit more! The main thing I think is missing is a clearer understanding of how you ended up choosing 0.85. Right now, it appears that this was just a visual assessment? On one hand, I don't want to suggest too much more work for you here on this PR since we're coming up on deadlines, but I do think some evidence supporting this specific choice, and an explanation of how it was chosen, would be good. For example, why not 0.5? Why not 0.95? Some justification is helpful here.
Perhaps this is a good middle ground as something that can at least explore whether this threshold is reasonable:
- You can knit a few versions of this notebook, specifying a few different thresholds, and knit them all to create HTMLs with a custom name that includes the threshold in the file name. I recommend at least these thresholds: maybe 0.5, 0.85, and 0.95? Then, we can compare a bit more clearly and choose the ideal threshold. This will allow you to make use of the `params` feature as well; see the rendering sketch after this list.
- As part of this, you'll want to update the text in the notebook to indicate that you are also exploring the potential effects of score thresholds, rather than saying "we chose this threshold and will continue." This will probably involve updating the intro and conclusion text, primarily. In the conclusion, you don't need to conclude which threshold to use, since each notebook will be using a different threshold. You can formally document which threshold is used in the scripts used in next steps.
- Also, you probably want to remove the text at the bottom of the notebook saying how many of each type of cell were found in the 5 samples chosen, since again this will differ across notebooks with different thresholds. Instead, the `notebook/README.md` already explains which samples were chosen, so just keep that text. Since these numbers are also likely to shift somewhat (but they should not change too much!) with the forthcoming code changes we are working on to do annotation without Azimuth functions, and because of small changes that may occur with data releases, I recommend just writing down the sample IDs without the specific cell counts. Instead, you can just say that these were chosen because they are majority kidney with a good amount of immune + endothelial cells.
- You should also update `notebook/README.md` to briefly explain that part of this notebook is to explore thresholds.
- I also recommend adding plots (this should be super quick!): Let's make the marker gene plots twice: once using all annotations (which is what you currently do), and then a second version of these plots with only cells passing the threshold. You can create a second data frame for this, and then just plot those results using your `do_Feature_mean` function. We'd hope to see stronger signal for marker genes after filtering, and with a couple of rendered notebooks we can compare across thresholds.

```r
# code for second data frame to plot: keep only cells that pass the score threshold
cell_type_df_pass <- cell_type_df |>
  dplyr::filter(pass_mapping_QC)
```
- It might be worth also visually exploring with UMAPs where cells are colored by compartment; you'd make 2 versions for each dataset: one with all cells, and one with only the cells that pass the score threshold. But I would not make these plots unless you think the marker gene plots do not provide sufficient evidence to pick a threshold among the ones you explore. In case you do decide to do this, you would want to pull out the UMAP coordinates in the code that makes `cell_type_df` and plot using `ggplot() + geom_point()`; see the UMAP sketch after this list.
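Here is a rough sketch of how the `params`-based rendering could look; the loop, the `threshold` parameter name, and the output file naming are assumptions for illustration, since the notebook's actual YAML header isn't shown here:

```r
# Hypothetical sketch: render the notebook once per threshold via the Rmd
# `params` mechanism, writing each HTML with the threshold in its file name.
thresholds <- c(0.5, 0.85, 0.95)
for (threshold in thresholds) {
  rmarkdown::render(
    "notebook/04_annotation_Across_Samples_exploration.Rmd",
    params = list(threshold = threshold),
    output_file = paste0(
      "04_annotation_Across_Samples_exploration_threshold_", threshold, ".html"
    )
  )
}
```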
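And a minimal sketch of the optional UMAP comparison; it assumes `cell_type_df` gains `UMAP_1`/`UMAP_2` columns when it is built and already has `compartment` and `pass_mapping_QC` columns:

```r
# Hypothetical sketch: plot UMAP coordinates colored by compartment, once for
# all cells and once for only the cells that pass the score threshold.
library(ggplot2)

plot_umap <- function(df, plot_title) {
  ggplot(df, aes(x = UMAP_1, y = UMAP_2, color = compartment)) +
    geom_point(size = 0.3, alpha = 0.5) +
    ggtitle(plot_title)
}

plot_umap(cell_type_df, "All cells")
plot_umap(dplyr::filter(cell_type_df, pass_mapping_QC), "Cells passing the score threshold")
```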
A general comment here: Can you add text above plots stating that the line is drawn at the threshold being explored in the notebook?
…s_Samples_exploration.Rmd Co-authored-by: Stephanie Spielman <[email protected]>
Hi @sjspielman, thank you so much for staying active on the revisions while being at a workshop!
Thank you!
Regarding the choice of threshold, I think 0.5 is too low, as almost all cells have a higher score. What do you think of 0.75 vs. 0.85? I cannot really decide 🤔
Hi @maud-p, sorry I wasn't able to review more last week! I have a bit of feedback for this PR, but we can get this in shortly!
While looking at the heatmaps, I realized something was strange with the legends, which appear as discrete when they should be continuous. It turns out there's a bug in line 164. This may also be an issue in other notebooks that use this plotting strategy too, but definitely do not worry about that!!! Let's just fix it here:
```r
# current line 164
guides(fill = guide_legend(title = paste0(feature)))

# but it should be using guide_colourbar
guides(fill = guide_colourbar(title = paste0(feature)))
```
It would also be good to make the titles in these heatmap plots a little smaller, since they currently run over the page. Can you update the theme lines here to include `title = element_text(size = rel(0.75))` (FYI, `rel(0.75)` means "0.75 times, aka relative to, the default size")? This should help the titles fit. You may need to change the `0.75` number a bit, but I think it should be close.
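Putting both suggestions together, a minimal sketch of the updated plotting code, assuming `p` is one of the existing heatmap ggplot objects and `feature` holds the feature name:

```r
library(ggplot2)

# use a continuous colourbar legend and shrink the title relative to default
p +
  guides(fill = guide_colourbar(title = feature)) +
  theme(title = element_text(size = rel(0.75)))
```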
I think either threshold 0.75 or 0.85 will be fine here; it's just important to note which you choose and why! It's also fine to say that both looked good, so you just chose the more (or less) stringent one. Since you've already run next steps of code with 0.85, that should be fine to keep. Please just add a quick sentence or two to the README to state which one you are choosing. It would also be helpful to include in the README the concluding notes you made in this comment: #835 (comment).
Hi @sjspielman, good catch, thank you! I was wondering why the legends looked like that, but didn't find the error! Thank you!
@@ -36,6 +36,19 @@ The next step in analysis is to identify tumor vs. normal cells.
- `04_annotation_Across_Samples_exploration.html` is the output of the [`04_annotation_Across_Samples_exploration.Rmd`](../notebook/04_annotation_Across_Samples_exploration.Rmd) notebook.
In brief, we explored the label transfer results across all samples in the Wilms tumor dataset SCPCP000006 in order to identify a few samples that we can begin next analysis steps with.
One way to evaluate the label transfer is to look at the mapping score for each label being transferred, which more or less corresponds to the certainty that a transferred label is _TRUE_.
We render the notebook with different thresholds for the mapping score and evaluate the impact of filtering out cells with a mapping score below 0.5, 0.75, 0.85 and 0.95.
I wanted to point something out here from the Azimuth docs: https://azimuth.hubmapconsortium.org/
I had been under the impression that the scores we were working with are what they are calling prediction scores, not mapping scores, but now I'm wondering whether I actually had a reason to think this. I only just now realized this difference in how I am thinking about this (so sorry!!), even though obviously you had been writing "mapping score" all along! Do you know for sure which scores we are using here? That may influence interpretation, but not the analysis itself.
Oh, you are pointing out a good point. From what I read in the Azimuth docs, both prediction and mapping scores exist, and they are cell-level metrics:

- Prediction scores: Cell prediction scores range from 0 to 1 and reflect the confidence associated with each annotation. Cells with high-confidence annotations (for example, prediction scores > 0.75) reflect predictions that are supported by multiple consistent anchors. Prediction scores can be visualized on the Feature Plots tab, or downloaded on the Download Results tab. The prediction depends on the specific annotation for each cell. Therefore, if you are mapping cells at multiple levels of resolution (for example level 1/2/3 annotations in the Human PBMC reference), each level will be associated with a different prediction score.
- Mapping scores: This value from 0 to 1 reflects confidence that this cell is well represented by the reference. The “mapping.score” column is available to plot in the Feature Plots tab, and is provided in the download TSV file. The mapping score is independent of a specific annotation, is calculated using the MappingScore function in Seurat, and reflects how well the unique structure of a cell’s local neighborhood is preserved during reference mapping.

I am using the `predicted.score`; in fact, I shouldn't refer to it as `mapping`, you are right.
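For reference, a hypothetical sketch contrasting the two metrics; it assumes `query` is the Seurat object returned by `Azimuth::RunAzimuth()` and that the metadata column names follow Azimuth's defaults for a `compartment` annotation level:

```r
# prediction score: per-cell confidence in the transferred `compartment` label
summary(query$predicted.compartment.score)

# mapping score: Seurat::MappingScore() output, independent of any label
summary(query$mapping.score)
```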
I wasn't aware of the 2 metrics! Thanks!
Ah fantastic, I think the `predicted.score` is definitely what we want to be using!! So, let's just change the text to say prediction instead of mapping, but otherwise this is good!
…s_Samples_exploration.html
…s_Samples_exploration_mappingscore_threshold_0.5.html
…s_Samples_exploration_mappingscore_threshold_0.75.html
…s_Samples_exploration_mappingscore_threshold_0.85.html
…s_Samples_exploration_mappingscore_threshold_0.95.html
@maud-p Is this one ready for me to have another look yet? No problem if not, just checking in :)
Yes, sorry, both of the PRs should be ready 😄 I'll ask for review in a second!
Looks good, let's get this in!!
Should I re-open a PR based on the new main branch and add the last updates that we did in PR #828?
I'm not sure what you mean here? Everything you currently have is fine! In #828 (which I'm reviewing now), I resolved the conflict with the main branch, so that PR can stay as it is.
Great, thank you! Then I'll just leave it as it is 😄
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
This PR is linked to one comment in PR #828 regarding the correct labelling of endothelial cells.
What is the goal of this pull request?
Here, I wanted to explore the mapping score of the label transfer of the predicted compartment from the fetal kidney reference.
I wanted to check how reliable the `endothelial` and `immune` annotations that we used as the normal reference in `infercnv` are.
It seems that the majority of `endothelial` and `immune` cells map to the reference with a high `mapping.score` > 0.85.
I might include a filtering step in the `infercnv.R` script to filter out `immune` and/or `endothelial` cells we used for the reference if they have a bad `mapping.score`; a sketch of this step follows.
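A hypothetical sketch of that filtering step; the `normal_cells` object and its `compartment` and `mapping.score` columns are assumptions for illustration:

```r
# keep immune/endothelial reference cells only if their mapping score is high
# enough; cells from all other compartments are left untouched
normal_cells <- normal_cells |>
  dplyr::filter(
    !(compartment %in% c("immune", "endothelial")) | mapping.score > 0.85
  )
```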
.Briefly describe the general approach you took to achieve this goal.
I just added a few `density` and `boxplots` to check the distribution of the `mapping.score` for the `compartments` (fetal nephron, stroma, endothelial and immune) from the label transfer from the fetal kidney reference; a sketch of these plots is below.
(fetal nephron, strona, endothelial and immune) from the label transfer from the fetal kidney reference.If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes, I will include a filtering step in the `infercnv.R` script to filter out `immune` and/or `endothelial` cells we used for the reference if they have a bad `mapping.score`.
.What types of results does your code produce (e.g., table, figure)?
One notebook.
What is your summary of the results?
It might be worth filtering out cells with a `mapping.score` < 0.85 while running `06_infercnv.R`.
Author checklists
Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.