-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add workflow for evaluating clustering to Ewing's module #908
Add workflow for evaluating clustering to Ewing's module #908
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick-end-of-friday review, but I plan to do a more careful look next Monday! I think it's probably all there now as is, but I don't 100% trust my brain at the moment to be sure of that 🥴
In the script I put in arguments for each louvain, leiden-cpm, and leiden-modularity to specify running each of them, but I could also be convinced to remove those arguments and just run those by default?
I definitely understand why you did this, but I think since this script is very specific to this module and this circumstance, it would be fine to rip them out. But it doesn't hurt to keep them, I think. I did think a little bit about whether there might be a different way to specify parameters and algorithms, and I couldn't come up with a better approach given how all the different parameters interact, and again since this is quite specific to your module I think it's fine.
analyses/cell-type-ewings/scripts/clustering-workflow/01-clustering.R
Outdated
Show resolved
Hide resolved
analyses/cell-type-ewings/scripts/clustering-workflow/01-clustering.R
Outdated
Show resolved
Hide resolved
Co-authored-by: Stephanie Spielman <[email protected]>
I too was struggling with how best to specify the parameters! I decided to keep in the algorithm flags for now and figured it would be easier to remove in the future if we need to, but also will allow for flexibility if we ever want to use this script to just run one algorithm. I made the minor changes you found and then updated to use the helper function for reading in the list of parameters. This should be ready for another look! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I made one suggestion to fix a bug I probably introduced with my earlier suggestion, and I have one other comment that I can't make directly here - Looking at the resulting plots in an HTML, I realize we probably want some explicit breaks in the x-axis across nn, so there's a tick/value for each parameter value. Right now ggplot2 is choosing breaks for you since it's treating nn as numeric. So, we should either add some breaks for the actual nn value, or tell R to treat it like a factor instead of numeric. But I don't think I need to see again!
param |> | ||
stringr::str_split_1(param, ",") |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably my bad...
param |> | |
stringr::str_split_1(param, ",") |> | |
param |> | |
stringr::str_split_1(",") |> |
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
Closes #897
Closes #686
What is the goal of this pull request?
This PR takes the notebook that was modified in #895 and moves it into a workflow that can be run to perform and evaluate clustering across all samples in the Ewing's project (SCPCP000015). The workflow includes two steps:
Briefly describe the general approach you took to achieve this goal.
01-clustering-metrics.Rmd
now reads in this TSV file and creates the plots looking at individual metrics.evaluate-clusters.sh
workflow that has been added to CI and I tested locally with the test data. This workflow runs the clustering script and renders the clustering report for all samples in the project.If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes. Next I am going to work on the second notebook in the workflow (see #896). This notebook will take the clustering results TSV and a set of parameters to look at and compare clusters to cell types and marker gene set scores. I'm still thinking about exactly how this notebook will look and where it will fit into the workflow, but the idea is to be able to look at the output from the metrics report rendered and choose a set of parameters to narrow in on and look at those clusters in a biological context.
Provide directions for reviewers
What are the software and computational requirements needed to be able to run the code in this PR?
Should be able to run this locally without any issues.
Are there particularly areas you'd like reviewers to have a close look at?
Just a note that the actual workflow itself has the exact same logic as the
aucell-singler-annotation.sh
and a lot of the same code. The biggest code changes that will need to be reviewed is the clustering script since that is brand new.Author checklists
Analysis module and review
README.md
has been updated to reflect code changes in this pull request.Reproducibility checklist
Dockerfile
.environment.yml
file.renv.lock
file.