KeyError when added as a pipe #14

Closed
debraj135 opened this issue Mar 30, 2021 · 4 comments

Comments

@debraj135

While the first and second options in the README work, the third option

import spacy
# this is your nlp object that can be any spaCy model
nlp = spacy.load('en_core_web_sm')

# add the pipeline stage (will be mapped to the most adequate model from the table above, en_use_md)
nlp.add_pipe('universal_sentence_encoder')

throws a KeyError of the form

File "/.../spacy_universal_sentence_encoder/language.py", line 77, in use_model_factory
    config = util.configs[model_name]
KeyError: 'en_core_web_sm'

Is this behavior expected?

Thank you for any help.
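
For readers following along, here is a minimal sketch of how this kind of KeyError can arise when a table is indexed by the name of the loaded spaCy model instead of by an encoder name. It is not the package's actual code or the actual fix: `CONFIGS`, `DEFAULT_BY_LANG`, `strict_lookup` and `lookup_with_fallback` are hypothetical names, and the only detail taken from the thread is that English should map to en_use_md.

```python
# Hypothetical stand-in for the package's config table; the real keys and
# values are not shown in this thread.
CONFIGS = {
    "en_use_md": {"lang": "en"},
}

# Assumed default per language, following the README comment that English
# is mapped to en_use_md.
DEFAULT_BY_LANG = {"en": "en_use_md"}


def strict_lookup(model_name):
    """Index the table directly; any unknown name raises KeyError."""
    return CONFIGS[model_name]


def lookup_with_fallback(model_name, lang):
    """Return (name, config), falling back to the default for the language."""
    if model_name in CONFIGS:
        return model_name, CONFIGS[model_name]
    fallback = DEFAULT_BY_LANG.get(lang)
    if fallback is None:
        raise ValueError(f"no encoder configured for language {lang!r}")
    return fallback, CONFIGS[fallback]
```

With this sketch, `strict_lookup('en_core_web_sm')` raises a KeyError like the one in the traceback, while `lookup_with_fallback('en_core_web_sm', 'en')` resolves to the en_use_md entry.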

MartinoMensio added a commit that referenced this issue Mar 30, 2021
this made the factory behave wrongly if debug was not active. Refers to #14
@MartinoMensio
Owner

Hi @debraj135,
Thank you for opening the issue. This is not the expected behaviour; there was a small bug.
It should now work as expected. Please install the updated v0.4.1.

Best,
Martino

@debraj135
Author

debraj135 commented Mar 30, 2021

Thank you, this works now. I believe I have encountered another issue, outlined in this snippet:

>>> import spacy
>>> nlp = spacy.load('en_core_web_lg')
>>> nlp.add_pipe('universal_sentence_encoder')
<spacy_universal_sentence_encoder.language.UniversalSentenceEncoder object at 0x1f65fc080>
>>> doc = nlp('Hi there, how are you?')
>>> doc.vector.shape
(512,)
>>> doc[:5].vector.shape
(512,)
>>> doc[:1].vector.shape
(300,)
>>> doc[0].similarity(doc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "spacy/tokens/token.pyx", line 212, in spacy.tokens.token.Token.similarity
  File "<__array_function__ internals>", line 6, in dot
ValueError: shapes (300,) and (512,) not aligned: 300 (dim 0) != 512 (dim 0)

Shouldn't the vectors for the tokens have the same dimensionality as span and doc?
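
To make the failure mode concrete, here is a small standalone sketch of why the similarity call fails: a dot product is only defined for vectors of equal length, so a 300-dimensional token vector cannot be compared with a 512-dimensional doc vector. It uses plain Python rather than spaCy or NumPy, and `cosine_similarity` is a hypothetical helper, not the function spaCy calls internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        # Same root cause as the ValueError in the traceback above.
        raise ValueError(f"vector sizes differ: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat an all-zero vector as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder vectors with the sizes reported in the session above.
token_vector = [0.1] * 300
doc_vector = [0.1] * 512

try:
    cosine_similarity(token_vector, doc_vector)
except ValueError as err:
    print(err)
```

Comparing `doc[0].vector.shape` with `doc.vector.shape` before calling `similarity` is a quick way to check whether an installed version is affected.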

@MartinoMensio
Owner

Thanks for spotting this issue. It's a consequence of the change made for #13.

The underlying model has the tokens in its vocabulary, so the array of shape (300,) comes from en_core_web_lg. This is not the expected behaviour.
I will provide a fix, but if you switch back to v0.4.1 it should work as expected.

Thank you for your patience.
Best,
Martino

@debraj135
Author

Thank you, appreciate your prompt response!
