Retriever Naming #319

Closed
4 tasks done
bilgeyucel opened this issue Jan 31, 2024 · 5 comments
Assignees
Labels
feature request Ideas to improve an integration

Comments

@bilgeyucel
Contributor

bilgeyucel commented Jan 31, 2024

Retriever names should refer to the retrieval method instead of the input type. This allows us to have retrievers that are specific to a document store (e.g. Chroma has ChromaQueryRetriever, which refers to Chroma's query API, not Haystack's query input).

Things to pay attention to
Even if the document store supports only one retrieval method, let's try to state the method in the retriever's name, e.g.:

  • QdrantRetriever (current name, does not state the method)
  • QdrantEmbeddingRetriever (proposed name)

ℹ️ It's quite hard to standardize the retriever naming across all databases. Let's be quite flexible here and not be afraid to introduce new retriever types.

Some future retriever examples according to this convention

  • Qdrant (Qdrant Announcement about Splade)
    • QdrantSpladeRetriever -> "Splade" is not totally clear for Haystack users but aligns with Qdrant quite well
    • QdrantSparseRetriever -> clearer for Haystack users but sounds like a quite general term, might create confusion if there are multiple sparse vector methods that cannot be supported with the same Retriever
  • WeaviateHybridRetriever -> This component might take text as input (not embeddings) and return retrieved documents by using Weaviate's built-in hybrid search functionality
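
To make the input-type point concrete, here is a minimal sketch of a retriever that accepts text rather than embeddings and delegates ranking to the store. Everything in it is hypothetical and for illustration only: `Document`, `HybridRetrieverSketch`, and `fake_hybrid_search` are made-up names, and the toy keyword-overlap scoring stands in for a real store-side hybrid search. It does not use Haystack's or Weaviate's actual APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    # Hypothetical stand-in for a retrieved document
    content: str
    score: float = 0.0


# Assumed shape of a store-side hybrid search: (query text, top_k) -> documents
HybridSearch = Callable[[str, int], List[Document]]


class HybridRetrieverSketch:
    """Illustrative only: takes raw text and lets the store do the ranking."""

    def __init__(self, hybrid_search: HybridSearch, top_k: int = 3):
        self.hybrid_search = hybrid_search
        self.top_k = top_k

    def run(self, query: str) -> dict:
        # The input is a text query, not an embedding vector
        return {"documents": self.hybrid_search(query, self.top_k)}


def fake_hybrid_search(query: str, top_k: int) -> List[Document]:
    # Toy replacement for a real hybrid search: score by shared words
    corpus = ["sparse vectors in Qdrant", "hybrid search in Weaviate", "BM25 basics"]
    words = set(query.lower().split())
    scored = [
        Document(text, float(len(words & set(text.lower().split()))))
        for text in corpus
    ]
    return sorted(scored, key=lambda d: d.score, reverse=True)[:top_k]


retriever = HybridRetrieverSketch(fake_hybrid_search, top_k=2)
result = retriever.run("hybrid search")
print([d.content for d in result["documents"]])
```

An embedding retriever following the same pattern would instead take a list of floats as input, which is why the name should describe the method and not the input type.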

Tasks

  1. feature request integration:qdrant
  2. 7 of 7
    anakin87
  3. 7 of 7
    anakin87
  4. 7 of 7
    breaking change
    anakin87
@bilgeyucel added the feature request label Jan 31, 2024
@sahusiddharth
Contributor

Hi @bilgeyucel, I'm working on this

  • Issue "Rename the QdrantRetriever + /retrievers folder" #134: I have changed the retriever name to the proposed QdrantEmbeddingRetriever, and the file retriever.py is already in the retriever folder. Is there anything else that needs to be done?
  • AstraRetriever to AstraEmbeddingRetriever: done
  • About Pinecone: has the PineconeDenseRetriever -> PineconeEmbeddingRetriever rename been decided, or should I just do it for now?

@anakin87
Member

anakin87 commented Feb 1, 2024

Some material for thinking about Sparse Embedding Retrieval (to choose the right names):

@bilgeyucel
Contributor Author

Hi @sahusiddharth, we're still working on this issue, so it's on hold at the moment. Feel free to pick another issue from the "Contributions Wanted" list :)

@anakin87 self-assigned this Feb 2, 2024
@anakin87
Member

We decided on this naming convention:
Document Store + Technique + "Retriever"

Dense Embedding Retriever → EmbeddingRetriever
Sparse Embedding Retriever → SparseEmbeddingRetriever

Examples:

  • ElasticsearchBM25Retriever
  • ElasticsearchEmbeddingRetriever
  • QdrantEmbeddingRetriever
  • QdrantSparseEmbeddingRetriever
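
The convention above can be sketched as a small string-composition helper. This is only an illustration of the naming rule: `retriever_name`, `DENSE`, and `SPARSE` are hypothetical names and are not part of Haystack or any integration.

```python
def retriever_name(document_store: str, technique: str) -> str:
    """Compose a class name as Document Store + Technique + "Retriever"."""
    return f"{document_store}{technique}Retriever"


# Technique labels taken from the convention; the constants are illustrative
DENSE = "Embedding"          # dense embedding retrieval
SPARSE = "SparseEmbedding"   # sparse embedding retrieval

names = [
    retriever_name("Elasticsearch", "BM25"),
    retriever_name("Elasticsearch", DENSE),
    retriever_name("Qdrant", DENSE),
    retriever_name("Qdrant", SPARSE),
]
print(names)
```

Under this rule, the renames discussed earlier in the thread (for example AstraRetriever to AstraEmbeddingRetriever) amount to inserting the technique label between the store name and "Retriever".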

@anakin87
Member

Done!

The docs will be updated to follow the new naming: #6976
