[COREF] `en_coreference_web_trf(3.4.0a2)` breaks by storing some tensors on CPU and some on GPU #13023

sztal · 2023-09-27T21:20:13Z

The problem is as described in the title: code that runs fine when spaCy uses the CPU breaks when GPU acceleration is turned on.
This happens at least for the model en_coreference_web_trf (3.4.0a2).

Note
If a later release solves this problem and works out of the box without training, please let me know. My understanding from the docs of the coref component is that the one I use is the most recent trained component (and it does seem to work well).

How to reproduce the behaviour

import spacy
nlp = spacy.load("en_coreference_web_trf")
doc = nlp("We went to a party. It was great.")
doc.spans   # output: {'coref_clusters_1': [a party, It]}

So the component clearly does its job when running on CPU. With spacy.prefer_gpu() enabled, however, the same code fails:

import spacy
spacy.prefer_gpu()
nlp = spacy.load("en_coreference_web_trf")
doc = nlp("We went to a party. It was great.")
# RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

It seems that some tensors are stored on the GPU while others remain on the CPU. This inconsistency may occur in several places in the code, but for the example above it happens around line 269 of pytorch_coref_model.py, where an operation combines the tensors word_ids (stored on CPU) and top_indices (stored on GPU).
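
For illustration, here is a minimal PyTorch sketch of this kind of device mismatch and one common way to avoid it. It is not the actual spacy-experimental code: the tensor names only mirror the ones mentioned above, their contents are made up, and the helper `index_on_same_device` is hypothetical.

```python
import torch


def index_on_same_device(values: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
    """Index `values` with `indices` after moving the indices to `values`' device.

    Hypothetical helper for illustration only; not part of spaCy or
    spacy-experimental.
    """
    # .to() is a no-op when both tensors already live on the same device.
    indices = indices.to(values.device)
    return values[indices]


# Use the GPU if one is available, otherwise fall back to the CPU so the
# sketch still runs (the mismatch only occurs when a GPU is present).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Made-up stand-ins for the tensors named in the report:
# word_ids stays on the CPU, top_indices is created on `device`.
word_ids = torch.arange(10)
top_indices = torch.tensor([1, 3, 5], device=device)

# On a GPU machine, word_ids[top_indices] would raise the RuntimeError
# quoted above. Aligning the devices first avoids it.
selected = index_on_same_device(word_ids, top_indices)
print(selected.tolist())
```

Moving the indices to the indexed tensor's device is only one option. Moving both tensors to the GPU would also resolve the mismatch, and may be preferable when later operations run there.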

Your Environment

spaCy version: 3.4.4
Platform: Linux-6.4.6-76060406-generic-x86_64-with-glibc2.35
Python version: 3.11.5
Pipelines: en_coreference_web_trf (3.4.0a2), en_core_web_lg (3.4.1), en_core_web_trf (3.4.1)


shadeMe · 2023-09-28T12:56:07Z

Do you have the latest version of spacy-experimental installed? It appears that this particular bug was fixed in v0.6.2.

sztal · 2023-09-28T12:58:45Z

Thanks! But will v0.6.2 work with spacy>=3.4,<3.5 and en_coreference_web_trf(3.4.0a2) (which seems to be the latest model that works out-of-the-box)?

Anyway, I will try updating to v0.6.2 soon and will let you know. Thanks once again for the fast reply!

sztal · 2023-09-28T18:11:44Z

Okay, it works! I successfully updated to spacy-experimental(0.6.3) while keeping spacy>=3.4,<3.5 and using en_coreference_web_trf(3.4.0a2), and everything works as expected.

Thanks a lot!
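
For anyone checking their own setup, a small standard-library sketch like the one below can report whether the installed spacy-experimental is at least v0.6.2, the version mentioned above as containing the fix. The helper names (`parse_release`, `is_at_least`, `check_package`) are hypothetical, and the version comparison is deliberately simplified: it looks only at the leading numeric part and ignores pre-release suffixes such as `a2`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string: str) -> tuple:
    """Return the leading numeric release as a tuple, e.g. '3.4.0a2' -> (3, 4, 0).

    Simplification (assumption): pre-release and other suffixes are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"cannot parse version: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    """True if the numeric release of `installed` is >= that of `minimum`."""
    return parse_release(installed) >= parse_release(minimum)


def check_package(name: str, minimum: str) -> str:
    """Describe whether the installed distribution `name` meets `minimum`."""
    try:
        installed = version(name)
    except PackageNotFoundError:
        return f"{name} is not installed"
    status = "ok" if is_at_least(installed, minimum) else f"older than {minimum}"
    return f"{name} {installed}: {status}"


# v0.6.2 is the release named in this thread as containing the fix.
print(check_package("spacy-experimental", "0.6.2"))
```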

l4b4r4b4b4 · 2023-10-10T18:07:09Z

interesting. encountering the same error. will try spacy experimental as well :)

github-actions · 2023-11-10T00:02:18Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

sztal changed the title ~~[COREF] en_coreference_web_trf(3.4.0a2) breaks by storing input on GPU and parameters on CPU~~ [COREF] en_coreference_web_trf(3.4.0a2) breaks by storing some tensors on CPU and some on GPU Sep 27, 2023

shadeMe added the feat / coref Feature: Coreference resolution label Sep 28, 2023

shadeMe closed this as completed Sep 29, 2023

github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2023
