-
Notifications
You must be signed in to change notification settings - Fork 15.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
community: fix semantic answer bug in AzureSearch vector store #18938
Merged
baskaryan
merged 1 commit into
langchain-ai:master
from
Skar0:fix-azuresearch-semantic-answer
Mar 26, 2024
Merged
community: fix semantic answer bug in AzureSearch vector store #18938
baskaryan
merged 1 commit into
langchain-ai:master
from
Skar0:fix-azuresearch-semantic-answer
Mar 26, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ield named 'key' from metadata in semantic_hybrid_search_with_score
dosubot
bot
added
size:XS
This PR changes 0-9 lines, ignoring generated files.
Ɑ: vector store
Related to vector store module
🤖:bug
Related to a bug, vulnerability, unexpected error with an existing feature
labels
Mar 11, 2024
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
5 tasks
not sure when I will have time to test it, but just to be sure, which version was this PR released? |
Do you mean which |
@efriis should I perform any other actions to validate the changes in this PR? I'm not sure how to find people to review such a detail (although it's a real bug). |
gkorland
pushed a commit
to FalkorDB/langchain
that referenced
this pull request
Mar 30, 2024
…langchain-ai#18938) - **Description:** The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - **Issue:** langchain-ai#18731 - **Prior fix to this bug:** This bug was fixed in this PR langchain-ai#15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR langchain-ai#15659 (see langchain-ai@ed1ffca). @lz-chen: could you confirm this wasn't intentional? - **New fix to this bug:** I believe there was an oversight in the logic of the fix from [langchain-ai#1564](langchain-ai#15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. https://github.com/langchain-ai/langchain/blob/5c2f7e6b2b474248af63a5f0f726b1414c5467c8/libs/community/langchain_community/vectorstores/azuresearch.py#L574-L581 The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!
chrispy-snps
pushed a commit
to chrispy-snps/langchain
that referenced
this pull request
Mar 30, 2024
…langchain-ai#18938) - **Description:** The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - **Issue:** langchain-ai#18731 - **Prior fix to this bug:** This bug was fixed in this PR langchain-ai#15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR langchain-ai#15659 (see langchain-ai@ed1ffca). @lz-chen: could you confirm this wasn't intentional? - **New fix to this bug:** I believe there was an oversight in the logic of the fix from [langchain-ai#1564](langchain-ai#15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. https://github.com/langchain-ai/langchain/blob/5c2f7e6b2b474248af63a5f0f726b1414c5467c8/libs/community/langchain_community/vectorstores/azuresearch.py#L574-L581 The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!
hinthornw
pushed a commit
that referenced
this pull request
Apr 26, 2024
…#18938) - **Description:** The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - **Issue:** #18731 - **Prior fix to this bug:** This bug was fixed in this PR #15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR #15659 (see ed1ffca). @lz-chen: could you confirm this wasn't intentional? - **New fix to this bug:** I believe there was an oversight in the logic of the fix from [#1564](#15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. https://github.com/langchain-ai/langchain/blob/5c2f7e6b2b474248af63a5f0f726b1414c5467c8/libs/community/langchain_community/vectorstores/azuresearch.py#L574-L581 The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
🤖:bug
Related to a bug, vulnerability, unexpected error with an existing feature
size:XS
This PR changes 0-9 lines, ignoring generated files.
Ɑ: vector store
Related to vector store module
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description: The
semantic_hybrid_search_with_score_and_rerank
method ofAzureSearch
contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets fromAzureSearch
test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use theFIELDS_METADATA
variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search insemantic_hybrid_search_with_score_and_rerank
.Issue: Azure AI Search, metadata field is required and hardcoded in langchain community #18731
Prior fix to this bug: This bug was fixed in this PR community: Replaced hardcoded "metadata" with FIELDS_METADATA variable in semantic_hybrid_search_with_score_and_rerank #15642 by adding a check for the existence of the metadata field named
FIELDS_METADATA
and retrieving a value for the key called "key" in that metadata if it exists. If the field namedFIELDS_METADATA
was not present, an empty string was returned. This fix was removed in this PR community: update AzureSearch class to work with azure-search-documents=11.4.0 #15659 (see ed1ffca).@lz-chen: could you confirm this wasn't intentional?
New fix to this bug: I believe there was an oversight in the logic of the fix from #1564 which I explain below.
The
semantic_hybrid_search_with_score_and_rerank
method creates a dictionarysemantic_answers_dict
with semantic answers returned by the search as follows.langchain/libs/community/langchain_community/vectorstores/azuresearch.py
Lines 574 to 581 in 5c2f7e6
The keys in this dictionary are the unique document ids in the index, if I understand the documentation of semantic answers in Azure AI Search correctly. When the method transforms a search result into a
Document
object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between aDocument
object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for thesemantic_answers_dict
dictionary. This id is defined in the search result's field namedFIELDS_ID
. I added a check to avoid any error in case no field namedFIELDS_ID
exists in a search result (which shouldn't happen in theory).A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field.
@levalencia could you confirm my analysis and test the fix?
@raunakshrivastava7 do you agree with the fix?
Thanks for the help!