Added top_k argument in the run function of ElasticsearchBM25Retriever #130
Thanks for the PR! I left a couple of comments, but it looks good. Would you mind signing the CLA before we merge this? Thanks in advance!
@@ -48,12 +48,12 @@ def from_dict(cls, data: Dict[str, Any]) -> "ElasticsearchBM25Retriever":
         return default_from_dict(cls, data)

     @component.output_types(documents=List[Document])
-    def run(self, query: str):
+    def run(self, query: str, top_k: int=None):
top_k should be typed as Optional
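A minimal sketch of the typing suggestion, assuming a standalone function rather than the integration's real component: `run` below is a hypothetical stand-in, and the fallback value of 3 is made up for illustration. It only shows that a parameter defaulting to `None` is annotated as `Optional[int]`.

```python
from typing import List, Optional, get_type_hints


def run(query: str, top_k: Optional[int] = None) -> List[str]:
    """Hypothetical stand-in for a retriever's run method."""
    # None means "no per-call value was given"; 3 is an arbitrary fallback.
    limit = 3 if top_k is None else top_k
    return [f"{query}-{i}" for i in range(limit)]


# The annotation now states explicitly that None is an accepted value.
hints = get_type_hints(run)
```

With `top_k: int = None`, the annotation and the default disagree, which static type checkers usually flag.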
         docs = self._document_store._bm25_retrieval(
             query=query,
             filters=self._filters,
             fuzziness=self._fuzziness,
-            top_k=self._top_k,
+            top_k=self._top_k if top_k == None else top_k,
You can simplify this statement with top_k = top_k or self._top_k
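For reference, the two fallback styles discussed here are not strictly equivalent. The sketch below uses hypothetical helper names and a made-up default of 10; it is not code from this PR. The `or` form falls back on any falsy value, including 0, while the `is None` form falls back only when no value was passed.

```python
from typing import Optional

DEFAULT_TOP_K = 10  # hypothetical value configured at initialization


def resolve_with_or(top_k: Optional[int] = None) -> int:
    # Shorter, but treats 0 the same as None.
    return top_k or DEFAULT_TOP_K


def resolve_with_is_none(top_k: Optional[int] = None) -> int:
    # Falls back only when the caller passed nothing (None).
    return DEFAULT_TOP_K if top_k is None else top_k
```

Whether the difference matters depends on whether a top_k of 0 is considered a meaningful request; if it is not, the shorter `or` form is fine.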
@@ -64,17 +64,18 @@ def from_dict(cls, data: Dict[str, Any]) -> "ElasticsearchEmbeddingRetriever":
         return default_from_dict(cls, data)

     @component.output_types(documents=List[Document])
-    def run(self, query_embedding: List[float]):
+    def run(self, query_embedding: List[float], top_k:int = None):
Same as above, top_k is optional
         """
         Retrieve documents using a vector similarity metric.

         :param query_embedding: Embedding of the query.
+        :param top_k: Maximum number of Documents to return
Thanks for fixing the docs! It's out of the scope of this PR, but would you mind adding a similar docstring to the run method of the other retriever component?
I will make the changes you suggested soon.
Question: when you mentioned the other retriever component, did you mean the one in the main Haystack repository?
@sahusiddharth No, I mean the BM25 retriever in this integration; see https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/elasticsearch/src/elasticsearch_haystack/bm25_retriever.py#L51
…-ElasticSearshBM25Retriever
@masci All done
I made a few small changes.
Thanks for this PR!
Currently, top_k can only be set at initialization. This change lets users override top_k at pipeline runtime as well.
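The intended behavior can be sketched with stand-in classes. Everything below is hypothetical: `FakeDocumentStore`, `FakeRetriever`, and the naive substring search are illustrations only and do not use the Haystack or Elasticsearch APIs. The point is that the value given at initialization acts as a default, and a value passed to `run` takes precedence for that call only.

```python
from typing import Dict, List, Optional


class FakeDocumentStore:
    """Hypothetical in-memory stand-in for a document store."""

    def __init__(self, docs: List[str]) -> None:
        self.docs = docs

    def search(self, query: str, top_k: int) -> List[str]:
        # Naive substring match; real BM25 scoring is out of scope here.
        hits = [doc for doc in self.docs if query in doc]
        return hits[:top_k]


class FakeRetriever:
    """Hypothetical retriever with an init-time default and a per-call override."""

    def __init__(self, store: FakeDocumentStore, top_k: int = 10) -> None:
        self._store = store
        self._top_k = top_k

    def run(self, query: str, top_k: Optional[int] = None) -> Dict[str, List[str]]:
        # Use the per-call value when given, otherwise the init-time default.
        effective_top_k = self._top_k if top_k is None else top_k
        return {"documents": self._store.search(query, effective_top_k)}


store = FakeDocumentStore(["red fox", "red panda", "red kite", "blue whale"])
retriever = FakeRetriever(store, top_k=2)

default_result = retriever.run("red")            # uses the init-time top_k
override_result = retriever.run("red", top_k=1)  # overrides it for this call
```

The override does not mutate the retriever, so later calls without `top_k` still use the value set at initialization.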