This component's aim is to create embeddings starting from raw data. In practice, it will take Documents with no embeddings and return an equally long list of Documents with embeddings.

We may have different embedders depending on the embedding strategy/provider, for example `HuggingFaceEmbedder`, `OpenAIEmbedder`, and so on. In this case, more than one PR will be linked to this issue.

Minimal API draft:

or alternatively (see Open Questions):

Open questions

Embedders need to be used both in indexing pipelines, to add embeddings to Documents, and in query pipelines, in front of an `EmbeddingRetriever`.

The first API draft above is strictly oriented towards indexing, as it takes a list of Documents as input. In that form, it would not be compatible with a query pipeline, which needs to process plain strings and pass plain embeddings to `MemoryEmbeddingRetriever`.

There are several strategies we can go for:

- Make Embedders work with raw data, not Documents (API draft 2)
  - PRO: they can ingest anything the given model supports, which makes them extremely flexible
  - CON: at indexing time, we need a separate component to match the embeddings back to their Documents
- Create another primitive, like `Data`, and make `Document` inherit from it. Then Embedders can deal with `Data` objects.
  - `DataToDocument`
- Make `MemoryEmbeddingRetriever` accept a `Document` as input
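A minimal sketch of the first strategy (Embedders working on raw data), to make its CON concrete. Everything here is hypothetical: `Document`, `ToyTextEmbedder`, and `EmbeddingsToDocuments` are illustrative names, not Haystack's API, and the "embedding" is a toy letter count standing in for a real HuggingFace or OpenAI model. The same embedder serves the query path directly, while the indexing path needs the extra matching component.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Document:
    # Hypothetical minimal Document: text plus an optional embedding.
    content: str
    embedding: Optional[List[float]] = None


class ToyTextEmbedder:
    """Hypothetical raw-data embedder: strings in, vectors out.

    A real embedder would call a model; this toy version counts ASCII letters
    so the example runs without dependencies.
    """

    def __init__(self, dim: int = 26):
        self.dim = dim

    def run(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for ch in text.lower():
                if "a" <= ch <= "z":
                    # Modulo folds letters into fewer buckets when dim < 26.
                    vec[(ord(ch) - ord("a")) % self.dim] += 1.0
            vectors.append(vec)
        return vectors


class EmbeddingsToDocuments:
    """Hypothetical matcher: the separate indexing-time component strategy 1 needs."""

    def run(self, documents: List[Document], embeddings: List[List[float]]) -> List[Document]:
        if len(documents) != len(embeddings):
            raise ValueError("expected exactly one embedding per Document")
        # Return new Documents; the inputs are left untouched.
        return [replace(doc, embedding=emb) for doc, emb in zip(documents, embeddings)]


embedder = ToyTextEmbedder()

# Indexing path: embed the texts, then match the vectors back to their Documents.
docs = [Document("hello world"), Document("embeddings")]
indexed = EmbeddingsToDocuments().run(docs, embedder.run([d.content for d in docs]))

# Query path: a plain string becomes a plain vector, ready for an embedding retriever.
query_vector = embedder.run(["hello"])[0]
```

The query path needs no matching step at all; only the indexing path pays for the extra component, which is the trade-off listed for this strategy.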
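A minimal sketch of the second strategy (a `Data` primitive that `Document` inherits from), assuming hypothetical `Data`/`Document` dataclasses and a toy `ToyDataEmbedder`; none of these is Haystack's actual API, and the two-number "embedding" (character count, word count) only stands in for a model. Because `Document` is a `Data`, one embedder can serve a query pipeline (plain `Data`) and an indexing pipeline (`Document`s), and `dataclasses.replace` keeps each item's concrete type.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Data:
    # Hypothetical base primitive: anything an embedder can embed.
    content: str
    embedding: Optional[List[float]] = None


@dataclass
class Document(Data):
    # Hypothetical Document: inherits content/embedding and adds metadata.
    meta: Optional[dict] = None


class ToyDataEmbedder:
    """Hypothetical embedder that accepts any Data, including Documents."""

    def run(self, items: List[Data]) -> List[Data]:
        # Returns an equally long list; replace() preserves the subclass type.
        return [
            replace(
                item,
                embedding=[float(len(item.content)), float(len(item.content.split()))],
            )
            for item in items
        ]


embedder = ToyDataEmbedder()

# Query pipeline: plain Data in, Data with an embedding out.
q_out = embedder.run([Data("what is an embedder?")])[0]

# Indexing pipeline: Documents in, Documents (with meta intact) out.
docs = [Document("Embedders create embeddings.", meta={"id": 1})]
d_out = embedder.run(docs)[0]
```

The design choice here is that the Embedder only depends on the fields `Data` guarantees, so a retriever expecting plain embeddings can read `q_out.embedding` without knowing about Documents.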