KeyError when added as a pipe #14

debraj135 · 2021-03-30T00:03:35Z

While the first and second options in the README for using this work, the third option

import spacy
# this is your nlp object that can be any spaCy model
nlp = spacy.load('en_core_web_sm')

# add the pipeline stage (will be mapped to the most adequate model from the table above, en_use_md)
nlp.add_pipe('universal_sentence_encoder')

throws a KeyError of the form:

File "/.../spacy_universal_sentence_encoder/language.py", line 77, in use_model_factory
    config = util.configs[model_name]
KeyError: 'en_core_web_sm'

Is this behavior expected?

Thank you for any help.


MartinoMensio · 2021-03-30T14:00:26Z

Hi @debraj135,
Thank you for opening the issue. It is not the expected behaviour. There was a little bug.
It should now work as expected. Please install the updated v0.4.1.

Best,
Martino

debraj135 · 2021-03-30T18:07:37Z

Thank you, this works now. However, I believe I have encountered another issue, outlined in this snippet:

>>> import spacy
>>> nlp = spacy.load('en_core_web_lg')
>>> nlp.add_pipe('universal_sentence_encoder')
<spacy_universal_sentence_encoder.language.UniversalSentenceEncoder object at 0x1f65fc080>
>>> doc = nlp('Hi there, how are you?')
>>> doc.vector.shape
(512,)
>>> doc[:5].vector.shape
(512,)
>>> doc[:1].vector.shape
(300,)
>>> doc[0].similarity(doc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "spacy/tokens/token.pyx", line 212, in spacy.tokens.token.Token.similarity
  File "<__array_function__ internals>", line 6, in dot
ValueError: shapes (300,) and (512,) not aligned: 300 (dim 0) != 512 (dim 0)

Shouldn't the vectors for the tokens have the same dimensionality as span and doc?
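The traceback shows why the call fails: spaCy's similarity is a cosine similarity (a dot product divided by the two vector norms), so both vectors must have the same length, and a 300-d token vector cannot be compared with a 512-d doc vector. The sketch below is a plain-Python illustration of that check, not spaCy's code; `cosine_similarity` and the placeholder vectors are hypothetical, and returning 0.0 for zero vectors is an assumed convention.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity that fails fast, with a readable message, on mismatched dimensions."""
    if len(a) != len(b):
        # Same situation as the traceback: a 300-d token vector vs a 512-d doc vector
        raise ValueError(f"vector dimensions differ: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for zero vectors in this sketch
    return dot / (norm_a * norm_b)


token_vec = [0.1] * 300  # placeholder for the en_core_web_lg token vector
doc_vec = [0.1] * 512    # placeholder for the Universal Sentence Encoder doc vector

try:
    cosine_similarity(token_vec, doc_vec)
except ValueError as err:
    print(err)  # vector dimensions differ: 300 != 512
```

Comparing `doc_vec` with itself works (the result is 1.0 up to rounding), because only equal-length vectors reach the dot product.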

MartinoMensio · 2021-03-30T18:34:10Z

Thanks for spotting this issue. It's a consequence of the change made for #13.

The underlying model (en_core_web_lg) has these tokens in its vocabulary, so the token vector is taken from there: the array of shape (300,) comes from en_core_web_lg, not from the Universal Sentence Encoder. This is not the expected behaviour.
I will provide a fix, but if you switch back to v0.4.1 it should work as expected.

Thank you for your patience.
Best,
Martino
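The explanation above is about lookup order: when the encoder is in the pipeline, a token's vector should come from the encoder (512-d, like the span and doc), not from the vocabulary of en_core_web_lg (300-d). The toy model below illustrates that precedence under stated assumptions; it is not spaCy's or the library's implementation, and `ToyVectorLookup`, `token_hook` and `fake_encoder` are hypothetical names. spaCy itself offers a comparable override through `Doc.user_token_hooks`, but the thread does not say how the library wires its vectors in.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]


class ToyVectorLookup:
    """Toy model (not spaCy's implementation) of where a token's vector comes from."""

    def __init__(self, vocab_vectors: Dict[str, Vector]) -> None:
        self.vocab_vectors = vocab_vectors  # e.g. 300-d vectors shipped with en_core_web_lg
        self.token_hook: Optional[Callable[[str], Vector]] = None  # e.g. a sentence-encoder hook

    def token_vector(self, text: str) -> Vector:
        # Intended order: a registered hook wins over the vocabulary table.
        if self.token_hook is not None:
            return self.token_hook(text)
        return self.vocab_vectors[text]


def fake_encoder(text: str) -> Vector:
    # Stand-in for a 512-d sentence encoder; the values are meaningless.
    return [float(len(text))] * 512


lookup = ToyVectorLookup({"Hi": [0.5] * 300})
print(len(lookup.token_vector("Hi")))  # 300: no hook, falls back to the vocabulary vector
lookup.token_hook = fake_encoder
print(len(lookup.token_vector("Hi")))  # 512: the hook takes precedence
```

With the hook registered, token, span and doc vectors would all share the encoder's dimensionality, which is what makes `doc[0].similarity(doc)` well defined.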

debraj135 · 2021-03-30T18:46:22Z

Thank you, appreciate your prompt response!

MartinoMensio added a commit that referenced this issue Mar 30, 2021

small indent error

039bc54

this made the factory behave wrongly if debug was not active. Refers to #14
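The commit message only says that a small indentation error made the factory misbehave when debug was off, and the traceback shows `util.configs[model_name]` being looked up with the spaCy model name (`'en_core_web_sm'`) instead of a Universal Sentence Encoder model such as `en_use_md`. The sketch below is a hypothetical reconstruction of that class of bug, not the library's actual code: `CONFIGS`, `DEFAULT_MODEL` and both `resolve_config_*` functions are invented for illustration.

```python
from typing import Dict

# Hypothetical stand-in for util.configs, keyed by Universal Sentence Encoder model name.
CONFIGS: Dict[str, dict] = {
    "en_use_md": {"name": "en_use_md"},
    "en_use_lg": {"name": "en_use_lg"},
}
DEFAULT_MODEL = "en_use_md"


def resolve_config_buggy(model_name: str, debug: bool = False) -> dict:
    if model_name not in CONFIGS:
        if debug:
            print(f"{model_name!r} is not a USE model, using {DEFAULT_MODEL!r}")
            model_name = DEFAULT_MODEL  # mis-indented: only runs when debug is on
    return CONFIGS[model_name]  # KeyError: 'en_core_web_sm' when debug is off


def resolve_config_fixed(model_name: str, debug: bool = False) -> dict:
    if model_name not in CONFIGS:
        if debug:
            print(f"{model_name!r} is not a USE model, using {DEFAULT_MODEL!r}")
        model_name = DEFAULT_MODEL  # dedented: runs regardless of debug
    return CONFIGS[model_name]


print(resolve_config_fixed("en_core_web_sm"))  # {'name': 'en_use_md'}
try:
    resolve_config_buggy("en_core_web_sm")
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'en_core_web_sm'
```

The only difference between the two functions is the indentation of the fallback assignment, which is why such a bug can go unnoticed while testing with debug output switched on.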

MartinoMensio mentioned this issue Mar 30, 2021

cant change vocab vector #13

Open

MartinoMensio closed this as completed Apr 28, 2021
