I have a theoretical question regarding the evaluation of integration methods.
Typically, cell type assignment occurs after integration and clustering.
However, I'm uncertain about which cell type labels to use for benchmarking.
To assign cell types, I rely on a specific clustering obtained after a certain integration (or from non-integrated embeddings).
As a result, I'm concerned that the benchmarking results might be biased toward the integration method used to obtain the clusters to which I assigned the cell types.
Could you please provide some guidance on how to approach this?
Thank you :)
Your concern is one we share. When we benchmark integration methods for a particular task, we typically try to obtain a coarse labeling of each dataset separately, before integration. We then integrate and evaluate against these per-dataset coarse labels. If there are particular cell types that you want to make sure are preserved, you can also annotate these specifically in the individual datasets. I have been calling this generating seed annotations. So far, coarse labels have worked quite well for selecting a method. We did this for the Human Lung Cell Atlas, for example.
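To make the workflow above concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the toy coordinates, the label names, the method names, and the `knn_label_purity` helper are all hypothetical, and the score is a simple stand-in for a proper bio-conservation metric, not the metric used in any particular benchmark. The point it shows is that the coarse labels are fixed per dataset before integration, so every candidate embedding is scored against the same labels.

```python
import math

def knn_label_purity(embedding, labels, k=3):
    """Mean fraction of each cell's k nearest neighbours sharing its coarse label.

    Hypothetical stand-in for a bio-conservation metric; higher is better.
    """
    n = len(embedding)
    total = 0.0
    for i in range(n):
        # Distances from cell i to every other cell, nearest first.
        dists = sorted(
            (math.dist(embedding[i], embedding[j]), j) for j in range(n) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        total += sum(labels[j] == labels[i] for j in neighbours) / k
    return total / n

# Coarse labels assigned to each dataset separately, BEFORE any integration
# (toy values; cells 0-3 stand for dataset A, cells 4-7 for dataset B).
coarse_labels = ["T cell", "T cell", "B cell", "B cell",
                 "T cell", "T cell", "B cell", "B cell"]

# Two made-up integrated embeddings of the same eight cells.
embeddings = {
    "method_1": [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0),
                 (0.0, 0.1), (0.1, 0.1), (5.0, 5.1), (5.1, 5.1)],
    "method_2": [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0),
                 (0.0, 0.1), (5.0, 5.1), (0.1, 0.1), (5.1, 5.1)],
}

# Score every candidate against the same integration-independent labels.
scores = {name: knn_label_purity(emb, coarse_labels) for name, emb in embeddings.items()}
best_method = max(scores, key=scores.get)
```

Because the labels never depend on any of the embeddings being compared, no method gets an advantage from having produced the clusters that were annotated.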
Thank you for your answer!
Just to check that I have understood correctly: would this mean using a 'cluster-independent' annotation method, for example seed labeling with scANVI, to get an unbiased 'ground truth' for evaluating bio-conservation? Is that right?