cell type/tumor annotation for ETP T-ALL (SCPCP000003) #826
…outhwesternDSSR/jwl
Thanks for this contribution, @UTSouthwesternDSSR! Since this is similar to the non-ETP ALL module, we can merge it.
Having reviewed the results, the B cell assignments from ScType are reasonably convincing for some libraries.
In terms of next steps, I would recommend picking one set of samples (non-ETP ALL or ETP ALL) to focus on, seeing if you can get the best-quality B cell assignments possible (you might also want to pull in the automatic assignments we already have from CellAssign and SingleR for comparison), and getting inferCNV up and running.
Thank you again, and please let us know if there's anything you want to discuss!
Sure, I will start with the non-ETP samples and do what you suggested in the other pull request (making sure that the B cells called are indeed solid, and then using them to run inferCNV), and then test the approach on the ETP samples. Thank you so much for the suggestion!
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
#822
What is the goal of this pull request?
To perform cell type/tumor annotation for ETP T-ALL samples (n=31) in SCPCP000003
Briefly describe the general approach you took to achieve this goal.
I followed the same approach as proposed in the module for non-ETP T-ALL (SCPCP000003):
The only difference is that more than one cluster is identified as B cell in 4 samples (SCPCL000055, SCPCL000066, SCPCL000696, and SCPCL000709). For these samples, I checked the location of the B cell clusters on the UMAP and compared it with `BFeatures1` (the average expression of B cell marker genes) on the dot plot. I also checked the expression of `adt_CD19` in these 4 samples; it is higher on the separated B cell island. I believe that only the clusters that are completely separated (an island of their own, rather than attached to other clusters) can be confidently used as the normal cells for running CopyKat, although the results are not very promising (shown later).
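The reference-selection step described above can be sketched as follows. This is a minimal illustration, not the module's code: it assumes per-cell vectors of cell IDs, Leiden cluster labels, and ScType annotations, with the cluster IDs to keep (the separated B cell "islands") chosen by hand after inspecting the UMAP, `BFeatures1`, and `adt_CD19`. The label string `"B cell"`, the cluster IDs, and the helper names `select_reference_cells` and `compare_predictions` are hypothetical. In the R workflow, the resulting IDs would presumably be passed to CopyKat through its `norm.cell.names` argument.

```python
from collections import Counter


def select_reference_cells(cell_ids, clusters, cell_types, keep_clusters, label="B cell"):
    """Return the IDs of cells annotated as `label` that lie in a hand-picked cluster.

    `keep_clusters` holds the (hypothetical) IDs of the B cell clusters that form a
    separate island on the UMAP; B cells in any other cluster are excluded.
    """
    return [
        cell_id
        for cell_id, cluster, cell_type in zip(cell_ids, clusters, cell_types)
        if cell_type == label and cluster in keep_clusters
    ]


def compare_predictions(old_pred, new_pred):
    """Count how each cell's CopyKat call changes, as (old, new) label pairs."""
    return Counter(zip(old_pred, new_pred))


# Toy example: two B cell clusters ("3" and "7"); only cluster "3" is an island.
cells = ["c1", "c2", "c3", "c4"]
clusters = ["3", "3", "7", "1"]
cell_types = ["B cell", "B cell", "B cell", "T cell"]
normal_cells = select_reference_cells(cells, clusters, cell_types, keep_clusters={"3"})
print(normal_cells)  # ['c1', 'c2']

# Toy calls with all B cells as reference (old) vs. selected B cells (new).
old = ["diploid", "diploid", "aneuploid", "not.defined"]
new = ["diploid", "diploid", "aneuploid", "aneuploid"]
print(compare_predictions(old, new)[("not.defined", "aneuploid")])  # 1
```

A cross-tabulation like this shows, per sample, how many cells move between `diploid`, `aneuploid`, and `not.defined` when only the selected B cells are used as the reference.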
If known, do you anticipate filing additional pull requests to complete this analysis module?
Results
What is the name of your results bucket on S3?
- `rds` objects: s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/results/rds
- s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/results/
- s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/plots
What types of results does your code produce (e.g., table, figure)?
- `rds` objects
- `_metadata.txt` (cell ID, Leiden clusters, cell type annotation, low-confidence cell type annotation, CopyKat prediction, and the new CopyKat prediction based on the "selected" B cells [for the 4 samples]) and `_sctype_top10_celltypes_perCluster.txt` (the top 10 candidate cell types with their respective ScType scores in each cluster)
- `multipanels_` UMAP plots showing Leiden clustering, cell type, and CopyKat prediction, respectively (for the 4 samples, the new CopyKat prediction based on the "selected" B cells is shown)
- dot plots showing the average expression of each cell type's marker gene set, computed with `AddModuleScore()`
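The difference between `sctype_classification` and `lowConfidence_annot` comes down to a single threshold. As we understand ScType's tutorial code, each cluster receives the cell type with the highest ScType score summed over the cluster's cells, and is relabeled "Unknown" when that score is below a quarter of the cluster's cell count; the low-confidence annotation lowers that fraction to 10%. The sketch below mirrors that rule on toy data. The function names, the input layout (one dict of per-cell-type scores per cell), and the cell type labels are hypothetical, and it assumes every cell has at least one score. The `top` output plays the role of the per-cluster top-10 table.

```python
from collections import defaultdict


def label_clusters(cell_scores, clusters, frac=0.25, top_n=10):
    """Assign one cell type per cluster using an ScType-style threshold.

    cell_scores: one dict per cell mapping cell type -> ScType score.
    clusters: the cluster label of each cell, in the same order.
    Returns (labels, top): labels maps cluster -> cell type or "Unknown";
    top maps cluster -> up to `top_n` (cell type, summed score) pairs, best first.
    """
    sums = defaultdict(lambda: defaultdict(float))
    ncells = defaultdict(int)
    for scores, cluster in zip(cell_scores, clusters):
        ncells[cluster] += 1
        for cell_type, score in scores.items():
            sums[cluster][cell_type] += score

    labels, top = {}, {}
    for cluster, per_type in sums.items():
        ranked = sorted(per_type.items(), key=lambda kv: kv[1], reverse=True)
        top[cluster] = ranked[:top_n]
        best_type, best_score = ranked[0]
        # A cluster whose best summed score is below frac * ncells is called "Unknown".
        labels[cluster] = "Unknown" if best_score < frac * ncells[cluster] else best_type
    return labels, top


def percent_with_label(per_cell_labels, label):
    """Percentage of cells carrying `label` (e.g. "Unknown" or "not.defined")."""
    return 100.0 * sum(lab == label for lab in per_cell_labels) / len(per_cell_labels)


# Toy sample: cluster "0" scores clearly as B cells, cluster "1" only weakly as T cells.
scores = [
    {"B cell": 0.4, "T cell": 0.1},
    {"B cell": 0.2, "T cell": 0.0},
    {"B cell": 0.0, "T cell": 0.15},
    {"B cell": 0.0, "T cell": 0.1},
]
clusters = ["0", "0", "1", "1"]

for frac in (0.25, 0.10):
    labels, _ = label_clusters(scores, clusters, frac=frac)
    per_cell = [labels[c] for c in clusters]
    print(frac, labels, percent_with_label(per_cell, "Unknown"))
# Prints:
# 0.25 {'0': 'B cell', '1': 'Unknown'} 50.0
# 0.1 {'0': 'B cell', '1': 'T cell'} 0.0
```

Because a named cluster is always the top-scoring type regardless of `frac`, lowering the threshold can only turn "Unknown" clusters into named ones, never the reverse.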
What is your summary of the results?
With the default threshold (ScType score > 25% of the number of cells in a cluster; `sctype_classification`), a large fraction of cells in each sample is annotated as "Unknown": from 0% to 61% per sample, with a median of ~25%. With a 10% threshold (`lowConfidence_annot`), the percentage of `Unknown` cells is capped at 28% (instead of 61%).

Every sample has annotated B cells, so I ran CopyKat on all of them. As mentioned above, however, for the 4 samples with more than one B cell cluster (SCPCL000055, SCPCL000066, SCPCL000696, and SCPCL000709), I used only the selected B cells as the normal reference. Here is the comparison between using all B cells (`copykat.pred`) and the selected B cells (`new_copykat.pred`). The results seem to make sense for SCPCL000055, since there are now many more aneuploid cells and the Late Eryth cells are now called diploid. It also works for SCPCL000066, but not really for SCPCL000709, where the number of `aneuploid` cells decreases. SCPCL000696 shows very little change.

Overall, very few cells are `not.defined` in the CopyKat predictions: at most 6% of the cells in any sample.

Provide directions for reviewers
What are the software and computational requirements needed to be able to run the code in this PR?
`renv.lock` and `conda.lock`.
Are there particularly areas you'd like reviewers to have a close look at?
Is there anything that you want to discuss further?
Author checklists
Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.
Analysis module and review
- `README.md` has been updated to reflect code changes in this pull request.

Reproducibility checklist
- `Dockerfile`
- `environment.yml` file
- `renv.lock` file