The filters are not working with metadata that contain a space. #616
Comments
Hello @PAHXO, thank you for reporting this issue. After a first look, my understanding is that the issue is caused by the code at Lines 62 to 63 in 9b922d6.
I'll continue looking into it.
from haystack import Document
from haystack.components.retrievers import FilterRetriever
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore
docs = [
    Document(
        content="Python is a popular programming language",
        meta={"about": "Python", "language": "english"},
    ),
    Document(
        content="python ist eine beliebte Programmiersprache",
        meta={"about": "Python", "language": "german"},
    ),
]
document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")
document_store.write_documents(docs, policy=DuplicatePolicy.OVERWRITE)
retriever = FilterRetriever(document_store)
result = retriever.run(filters={"field": "about", "operator": "==", "value": "Python"})
print(result["documents"]) # no document retrieved
result = retriever.run(filters={"field": "about", "operator": "==", "value": "python"})
print(result["documents"])  # both documents retrieved
For reference, here is the relevant documentation page from Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html#term-query-notes
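To illustrate why the exact-match filter behaves this way, here is a minimal sketch in plain Python. It assumes the field is analyzed at index time roughly the way a default text analyzer would (lowercased and split into tokens), while the filter value is compared as-is. The `analyze` and `term_match` helpers are hypothetical stand-ins for illustration, not Elasticsearch or Haystack code.

```python
def analyze(text):
    """Simplified stand-in for a text analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def term_match(indexed_value, query_value):
    """Mimic an exact-term lookup: the query value is NOT analyzed,
    it is compared verbatim against the tokens produced at index time."""
    return query_value in analyze(indexed_value)


print(term_match("Python", "Python"))      # False: the indexed token is "python"
print(term_match("Python", "python"))      # True
print(term_match("New York", "New York"))  # False: the tokens are "new" and "york"
```

Under these assumptions, a value containing a space can never match, because no single indexed token contains the space.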
One way to fix the behavior would be to adjust the mapping that you use for your Elasticsearch index so that the metadata in the about field is not analyzed.
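As a rough sketch of what such a mapping could look like, the dictionary below declares the about field as a keyword field, which Elasticsearch stores as a single unanalyzed value. The exact field path depends on how the document store serializes metadata, and the way to pass the mapping to the store (for example through a custom mapping argument) is an assumption to verify against the integration's API.

```python
import json

# Hypothetical index mapping: `about` is declared as `keyword`, so the value
# is indexed as one unanalyzed token and exact-match lookups compare against
# the original string, including case and spaces.
custom_mapping = {
    "properties": {
        "content": {"type": "text"},   # full-text field, still analyzed
        "about": {"type": "keyword"},  # exact-match field, not analyzed
    }
}

# Assumption: the mapping would be supplied when the index is created,
# e.g. via a custom mapping option of the document store.
print(json.dumps(custom_mapping, indent=2))
```

An existing index generally has to be recreated or reindexed for a changed field type to take effect.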
@PAHXO The bug is fixed now and there is a new release of elasticsearch-haystack on PyPI: https://pypi.org/project/elasticsearch-haystack/
I'll be sure to try it as soon as I can! Thanks for taking the time to look into the issue, @julian-risch 🫡
Greetings.
This affects the Elasticsearch retrievers: BM25, embedding, and the filter retriever. Their filters don't select string metadata that has a space in it.