Check document_store and embedding_model dimensions before calculating embeddings #5188
Comments
@AlexGWOmron Thank you for this suggestion. I agree that it makes a lot of sense to check document_store and embedding dimensions before running the embedding calculations. Would you maybe like to contribute this feature and open a PR? We can give early feedback if you make it a draft PR. Guidelines are here 🙂
@julian-risch I would like to work on this issue. What would be the preferred implementation that I should follow?
@awinml That's great to hear! You can find our general contributor guidelines here: https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md
@julian-risch Thank you for the detailed explanation. I understand the overall implementation suggestion and will open a draft PR shortly. I'll make sure to follow the contributor guidelines and include unit tests as well.
Hi @julian-risch! I'd like to work on this if it's still open.
Hi @AnushreeBannadabhavi, I am working on other issues at the moment, feel free to take it up.
Done in #7357 for FAISS. Haystack 1.x is entering a Long Term Support phase: we will take care to keep it working and fix bugs. Therefore, I will not change any other document repositories and will close this issue.
Is your feature request related to a problem? Please describe.
When running document_store.update_embeddings(retriever=embedding_retriever), the embeddings are first calculated and then saved to the document store.
However, if the embedding dimensions of the model and the document store differ, the error is only raised after the embeddings have been calculated, so that computation is wasted. E.g.
RuntimeError: Embedding dimensions of the model (1024) don't match the embedding dimensions of the document store (768). Initiate FAISSDocumentStore again with arg embedding_dim=1024.
This is more of a problem when you pay for each embedding call (e.g. OpenAI).
Describe the solution you'd like
A check against both document_store and embedding dimensions before running the embedding calculations.
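A minimal sketch of what such a check could look like, written as plain Python rather than as Haystack code. The names check_embedding_dim, embed_fn and store_embedding_dim are hypothetical and only for illustration. The idea is to embed a single short probe text, compare the length of that vector with the dimension the document store expects, and fail before the full corpus is embedded. It assumes the embedding function takes a list of texts and returns one vector per text.

```python
from typing import Callable, List, Sequence


def check_embedding_dim(
    embed_fn: Callable[[List[str]], Sequence[Sequence[float]]],
    store_embedding_dim: int,
    probe_text: str = "dimension probe",
) -> int:
    """Embed one probe text and compare its length with the store's dimension.

    embed_fn is a hypothetical stand-in for whatever computes embeddings;
    store_embedding_dim is the dimension the document store was created with.
    Returns the model dimension if it matches, raises ValueError otherwise.
    """
    # Only one text is embedded, so the cost of the check stays small
    # even when the embedding call is billed per request.
    probe_vectors = embed_fn([probe_text])
    model_dim = len(probe_vectors[0])

    if model_dim != store_embedding_dim:
        raise ValueError(
            f"Embedding dimensions of the model ({model_dim}) don't match "
            f"the embedding dimensions of the document store "
            f"({store_embedding_dim})."
        )
    return model_dim


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Illustrative embedder that always returns 1024-dimensional vectors."""
    return [[0.0] * 1024 for _ in texts]


if __name__ == "__main__":
    try:
        check_embedding_dim(fake_embed, store_embedding_dim=768)
    except ValueError as err:
        print(f"Caught before embedding the corpus: {err}")
```

With a check like this called at the start of the update, a mismatch costs one embedding call instead of one call per document. Some models expose their output dimension directly, in which case the probe call could be skipped entirely.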