Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added top_k argument in the run function of ElasticSearcBM25Retriever #130

Conversation

sahusiddharth
Copy link
Contributor

The top_k can only be defined at initialization, It would allow users to change the top_k at the pipeline runtime too.

@sahusiddharth sahusiddharth requested a review from a team as a code owner December 20, 2023 10:18
@sahusiddharth sahusiddharth requested review from masci and removed request for a team December 20, 2023 10:18
@CLAassistant
Copy link

CLAassistant commented Dec 20, 2023

CLA assistant check
All committers have signed the CLA.

masci
masci previously requested changes Dec 21, 2023
Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I left a couple of comments but it looks good. Would you mind singing the CLA before we merge this? Thanks in advance!

@@ -48,12 +48,12 @@ def from_dict(cls, data: Dict[str, Any]) -> "ElasticsearchBM25Retriever":
return default_from_dict(cls, data)

@component.output_types(documents=List[Document])
def run(self, query: str):
def run(self, query: str, top_k: int=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

top_k should be typed as Optional

docs = self._document_store._bm25_retrieval(
query=query,
filters=self._filters,
fuzziness=self._fuzziness,
top_k=self._top_k,
top_k=self._top_k if top_k == None else top_k,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can simplify this statement with top_k = top_k or self.top_k

@@ -64,17 +64,18 @@ def from_dict(cls, data: Dict[str, Any]) -> "ElasticsearchEmbeddingRetriever":
return default_from_dict(cls, data)

@component.output_types(documents=List[Document])
def run(self, query_embedding: List[float]):
def run(self, query_embedding: List[float], top_k:int = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, top_k is optional

"""
Retrieve documents using a vector similarity metric.

:param query_embedding: Embedding of the query.
:param top_k: Maximum number of Documents to return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the docs! It's out of the scope of this PR, but would you mind adding a similar docstring to the run method of the other retriever component?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masci

The changes you suggested will be done soon.

Question: When you mentioned other retriever components you mean in the main haystack right?

Copy link
Contributor

@masci masci Dec 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masci masci self-assigned this Dec 21, 2023
@masci masci changed the title added top_k arguement in the run function of ElasticSearcBM25Retriever added top_k argument in the run function of ElasticSearcBM25Retriever Dec 22, 2023
@sahusiddharth sahusiddharth requested a review from masci December 22, 2023 17:30
@sahusiddharth
Copy link
Contributor Author

@masci All done

@anakin87 anakin87 self-requested a review December 27, 2023 15:43
Copy link
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some little changes.
Thanks for this PR!

@anakin87 anakin87 dismissed masci’s stale review December 27, 2023 15:46

changes addressed

@anakin87 anakin87 merged commit 34ada7a into deepset-ai:main Dec 27, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants