How to use data with categories unseen by a trained model #74

Open
majpark21 opened this issue Oct 7, 2022 · 1 comment

Comments

@majpark21

Hello,
I am facing a problem with loading and passing data through a trained model when the data contains categories that were unseen by the model at training time. Specifically, I am training on certain tissues and want to use the model's prediction on other tissues. The data for these unseen tissues are stored in a separate file from the training data.

The code to do this would look like:

import scanpy
import scgen

# Training
column_tissue = 'celltype'
train_adata = scanpy.read('train_file.h5ad')
scgen.SCGEN.setup_anndata(train_adata, batch_key=None, labels_key=column_tissue)
model = scgen.SCGEN(train_adata, **model_kwargs)
model.train(...)

# Testing: the test file contains tissues that are absent from the training file
test_adata = scanpy.read('test_file.h5ad')
model.get_decoded_expression(adata=test_adata, indices=...)

This outputs:

INFO Received view of anndata, making copy.
INFO Input AnnData not setup with scvi-tools. attempting to transfer AnnData setup

And ends on:

ValueError: Category XXXX not found in source registry. Cannot transfer setup without extend_categories = True.

Where XXXX is a tissue that was absent from the training file.
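
To make the failure mode concrete, here is a simplified, stand-alone illustration of mapping test labels onto the categories recorded at training time. It is not the scvi-tools implementation: the function `transfer_categories`, its arguments, and the tissue names are hypothetical and only mimic the behaviour described by the error message.

```python
def transfer_categories(source_categories, target_values, extend_categories=False):
    """Map target values to integer codes using the categories seen at training.

    Hypothetical helper that mimics the behaviour described by the error
    message; it is not part of scvi-tools.
    """
    mapping = list(source_categories)
    for value in target_values:
        if value not in mapping:
            if not extend_categories:
                raise ValueError(
                    f"Category {value} not found in source registry. "
                    "Cannot transfer setup without extend_categories = True."
                )
            # Unseen categories are appended after the known ones.
            mapping.append(value)
    codes = [mapping.index(value) for value in target_values]
    return mapping, codes


train_tissues = ["liver", "lung"]   # made-up training categories
test_tissues = ["lung", "kidney"]   # "kidney" was never seen in training

mapping, codes = transfer_categories(train_tissues, test_tissues, extend_categories=True)
print(mapping)  # ['liver', 'lung', 'kidney']
print(codes)    # [1, 2]
```

With `extend_categories=False` (the default in this sketch), the same call raises the `ValueError` because "kidney" is not among the training categories.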

What would be the correct way to do this? I cannot find any way to pass the extend_categories kwarg.

What I tried

After digging into the source code I imagine this would involve something like:
model.register_manager(model.adata_manager.transfer_fields(adata_target=test_adata, extend_categories=True))
But I cannot find how to make the model use this new manager.

For now, a workaround is to set the categories in the test data to a category that was present in the training data. For example, setting the tissue column in the test data to the first tissue in the registry of the model:

test_adata.obs = test_adata.obs.rename(columns={column_tissue: 'test_celltype'})
test_adata.obs[column_tissue] = model.adata_manager.registry['field_registries']['labels']['state_registry']['categorical_mapping'][0]

However this is quite an unsatisfactory solution and there is certainly a cleaner way of doing this.
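
Before overwriting labels, it may help to check which test categories are actually missing from the model's registry. The sketch below assumes the registry has the nested layout used in the workaround above; the helper name `unseen_categories` and the example registry contents are made up for illustration.

```python
def unseen_categories(registry, test_labels):
    """Return the labels in test_labels that are absent from the model registry.

    `registry` is assumed to have the nested layout used in the workaround above.
    """
    known = registry["field_registries"]["labels"]["state_registry"]["categorical_mapping"]
    known = set(known)
    # Preserve first-seen order while dropping duplicates.
    return [label for label in dict.fromkeys(test_labels) if label not in known]


# Made-up registry with the same nesting as model.adata_manager.registry.
example_registry = {
    "field_registries": {
        "labels": {"state_registry": {"categorical_mapping": ["liver", "lung"]}}
    }
}
print(unseen_categories(example_registry, ["lung", "kidney", "kidney", "heart"]))
# ['kidney', 'heart']
```

With the real objects, one would presumably pass `model.adata_manager.registry` and the values of `test_adata.obs[column_tissue]`.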

Thank you!

@super-dainiu

You might use _register_manager_for_instance() instead of register_manager().
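
Combining this suggestion with the `transfer_fields` call from the original post could look like the untested sketch below. The wrapper name `register_with_extended_categories` is hypothetical. It assumes that `transfer_fields` accepts `extend_categories` as in the attempt above, and that the private method `_register_manager_for_instance` takes the resulting manager as its only argument.

```python
def register_with_extended_categories(model, adata):
    """Transfer the model's setup to `adata`, allowing unseen categories.

    Assumes `model.adata_manager.transfer_fields` accepts `extend_categories`
    (as in the attempt above) and that the private method
    `_register_manager_for_instance` takes the resulting manager.
    """
    new_manager = model.adata_manager.transfer_fields(
        adata_target=adata, extend_categories=True
    )
    # Private API: may change between scvi-tools releases.
    model._register_manager_for_instance(new_manager)
    return new_manager
```

After calling `register_with_extended_categories(model, test_adata)`, the `model.get_decoded_expression(adata=test_adata, indices=...)` call could be retried. Note that the network was trained with a fixed set of label categories, so any newly added category has no learned meaning; whether the output is useful depends on how the model uses the labels.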
