Question on Benchmarking Cell Type Assignments Post-Integration and Concerns about Bias in Results #393

Open
MariaRosariaNucera opened this issue Dec 28, 2023 · 2 comments

Comments

@MariaRosariaNucera

Hello,

I have a theoretical question regarding the evaluation of integration methods.

Typically, cell type assignment occurs after integration and clustering.

However, I'm uncertain about which cell type labels to use for benchmarking.

To assign cell types, I rely on a specific clustering obtained after a particular integration (or on non-integrated embeddings).

As a result, I'm concerned that the benchmarking results might be biased toward the integration method used to obtain the clusters from which I assigned the cell types.

Could you please provide some guidance on how to approach this?

Thank you :)

@LuckyMD
Collaborator

LuckyMD commented Jan 17, 2024

Hi @MariaRosariaNucera,

Your concern is one we share. When we benchmark integration methods for a particular task, we typically obtain a coarse labelling of each dataset separately, before any integration. We then integrate and evaluate against these per-dataset coarse labels. If there are particular cell types that you want to make sure are preserved, you can also annotate these specifically in the individual datasets. I have been calling this generating seed annotations. So far, coarse labels have worked quite well for selecting a method. We did this for the Human Lung Cell Atlas, for example.
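To make the idea concrete, here is a minimal, self-contained sketch of evaluating candidate integrations against per-dataset coarse labels. Everything in it is an assumption for illustration: the toy coordinates, the label names, and the `knn_label_purity` helper, which is a simple stand-in for a bio-conservation score and not one of the benchmark's actual metrics. The key point is that the labels are assigned within each dataset before integration, so no candidate method had a hand in producing them.

```python
import math

def knn_label_purity(embedding, labels, k=3):
    """Mean fraction of each cell's k nearest neighbours that share its label.

    Hypothetical stand-in for a bio-conservation metric (toy, O(n^2)).
    """
    n = len(embedding)
    total = 0.0
    for i in range(n):
        # Distances from cell i to every other cell, nearest first.
        dists = sorted(
            (math.dist(embedding[i], embedding[j]), j)
            for j in range(n) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        total += sum(labels[j] == labels[i] for j in neighbours) / k
    return total / n

# Coarse labels assigned per dataset, BEFORE any integration (toy values).
labels_dataset1 = ["T cell", "T cell", "B cell", "B cell"]
labels_dataset2 = ["T cell", "T cell", "B cell", "B cell"]
labels = labels_dataset1 + labels_dataset2

# Two hypothetical integrated embeddings of the same 8 cells.
# "good": cells group by cell type across both datasets.
emb_good = [(0, 0), (0.1, 0), (5, 5), (5.1, 5),
            (0, 0.1), (0.1, 0.1), (5, 5.1), (5.1, 5.1)]
# "bad": cells group by dataset, with cell types mixed together.
emb_bad = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
           (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1)]

purity_good = knn_label_purity(emb_good, labels)
purity_bad = knn_label_purity(emb_bad, labels)
print(f"good: {purity_good:.2f}, bad: {purity_bad:.2f}")
```

Because the same pre-integration labels score every candidate embedding, the comparison does not favour whichever method would otherwise have produced the clusters used for annotation.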

Hope this helps!

@MariaRosariaNucera
Author

Thank you for your answer!
Just to check that I have understood correctly: would this mean using a 'cluster-independent' annotation method, for example seed labelling with scANVI, to get an unbiased 'ground truth' for evaluating bio conservation? Is that right?
