[COREF] en_coreference_web_trf(3.4.0a2) breaks by storing some tensors on CPU and some on GPU #13023

Closed
sztal opened this issue Sep 27, 2023 · 5 comments
Labels
feat / coref Feature: Coreference resolution

sztal commented Sep 27, 2023

The problem is as described in the title: code that runs perfectly fine when spaCy runs on CPU breaks when GPU acceleration is turned on.
This happens at least for the model en_coreference_web_trf-3.4.0a2.

Note
If there is any later release that works out of the box without training and solves this problem, please let me know. My understanding from the docs of the coref component is that the one I use is the most recent trained component (and it indeed seems to work quite well).

How to reproduce the behaviour

import spacy
nlp = spacy.load("en_coreference_web_trf")
doc = nlp("We went to a party. It was great.")
doc.spans   # output: {'coref_clusters_1': [a party, It]}

So clearly the component does its job when running on CPU. But run this with spacy.prefer_gpu() and everything breaks:

import spacy
spacy.prefer_gpu()
nlp = spacy.load("en_coreference_web_trf")
doc = nlp("We went to a party. It was great.")
# RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

It seems that some tensors are stored on the GPU and some are still on the CPU. This inconsistency may appear in several different parts of the code, but for the reprex above it happens around line 269 of pytorch_coref_model.py, where an operation combining the tensors word_ids (stored on CPU) and top_indices (stored on GPU) is attempted.
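For illustration, the general remedy for this class of error is to put both tensors on one device before indexing. The following is a minimal sketch only, not the actual patch in spacy-experimental: the helper name `align_device` is hypothetical, and the choice of which tensor to move is an assumption. It relies only on the `.device` attribute and `.to(device)` method that `torch.Tensor` provides.

```python
def align_device(indices, reference):
    """Return `indices` on the same device as `reference`.

    Hypothetical helper: works with any tensor-like object that exposes
    `.device` and `.to(device)`, which is the interface of torch.Tensor.
    """
    if indices.device != reference.device:
        # Copy the index tensor over to the device of the indexed tensor.
        return indices.to(reference.device)
    # Already on the same device: nothing to do.
    return indices


# Intended use with PyTorch (assumes torch is installed and CUDA is available):
#   word_ids = torch.arange(10)                          # lives on CPU
#   top_indices = torch.tensor([1, 3], device="cuda")    # lives on GPU
#   selected = word_ids[align_device(top_indices, word_ids)]
```

Moving the small index tensor rather than the indexed one is just one option; in a real model it is usually preferable to create every tensor on the model's device in the first place.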

Your Environment

  • spaCy version: 3.4.4
  • Platform: Linux-6.4.6-76060406-generic-x86_64-with-glibc2.35
  • Python version: 3.11.5
  • Pipelines: en_coreference_web_trf (3.4.0a2), en_core_web_lg (3.4.1), en_core_web_trf (3.4.1)
@sztal sztal changed the title from "[COREF] en_coreference_web_trf(3.4.0a2) breaks by storing input on GPU and parameters on CPU" to "[COREF] en_coreference_web_trf(3.4.0a2) breaks by storing some tensors on CPU and some on GPU" Sep 27, 2023
Contributor

shadeMe commented Sep 28, 2023

Do you have the latest version of spacy-experimental installed? It appears that this particular bug was fixed in v0.6.2.
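One way to check the installed version from Python is sketched below, using the standard-library `importlib.metadata`. The helper names `release_tuple` and `at_least` are hypothetical, and the parsing is deliberately naive: it keeps only the leading numeric part of each version component, so a pre-release such as 3.4.0a2 is treated as 3.4.0.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """Naively parse the leading numeric 'X.Y.Z' part of a version string."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(package: str, minimum: str) -> bool:
    """Return True if `package` is installed with a release >= `minimum`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return release_tuple(installed) >= release_tuple(minimum)


if not at_least("spacy-experimental", "0.6.2"):
    print("spacy-experimental is missing or older than 0.6.2")
```

For anything beyond a quick check, a proper version parser (for example the one in the `packaging` library) is a safer choice than this hand-rolled comparison.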

@shadeMe shadeMe added the feat / coref Feature: Coreference resolution label Sep 28, 2023
Author

sztal commented Sep 28, 2023

Thanks! But will v0.6.2 work with spacy>=3.4,<3.5 and en_coreference_web_trf(3.4.0a2) (which seems to be the latest model that works out-of-the-box)?

Anyways, I will try updating to v0.6.2 soon and will let you know. Thanks once again for the fast reply!

Author

sztal commented Sep 28, 2023

Okay, it works! I updated successfully to spacy-experimental(0.6.3), while keeping spacy>=3.4,<3.5 and using en_coreference_web_trf(3.4.0a2), and everything works as expected.

Thanks a lot!

@shadeMe shadeMe closed this as completed Sep 29, 2023
@l4b4r4b4b4

Interesting. I'm encountering the same error and will try spacy-experimental as well :)


This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2023