Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
community[patch]: fix semantic answer bug in AzureSearch vector store (…
…#18938) - **Description:** The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - **Issue:** #18731 - **Prior fix to this bug:** This bug was fixed in this PR #15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR #15659 (see ed1ffca). @lz-chen: could you confirm this wasn't intentional? - **New fix to this bug:** I believe there was an oversight in the logic of the fix from [#1564](#15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. https://github.com/langchain-ai/langchain/blob/5c2f7e6b2b474248af63a5f0f726b1414c5467c8/libs/community/langchain_community/vectorstores/azuresearch.py#L574-L581 The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!
- Loading branch information