community: fix semantic answer bug in AzureSearch vector store #18938

Skar0 · 2024-03-11T21:40:17Z

Description: The semantic_hybrid_search_with_score_and_rerank method of AzureSearch contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, which is why other snippets from AzureSearch test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata"; the code should use the FIELDS_METADATA variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in semantic_hybrid_search_with_score_and_rerank.
Issue: Azure AI Search, metadata field is required and hardcoded in langchain community #18731
Prior fix to this bug: This bug was fixed in the PR "community: Replaced hardcoded "metadata" with FIELDS_METADATA variable in semantic_hybrid_search_with_score_and_rerank" #15642 by adding a check for the existence of the metadata field named FIELDS_METADATA and retrieving the value for the key called "key" in that metadata if it exists. If the field named FIELDS_METADATA was not present, an empty string was returned. This fix was removed in the PR "community: update AzureSearch class to work with azure-search-documents=11.4.0" #15659 (see ed1ffca).
@lz-chen: could you confirm this wasn't intentional?
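
To make the failure mode concrete, here is a minimal sketch of the two behaviours described above. It is not the actual code from either PR: the helper names are hypothetical, search results are modelled as plain dicts, the default value of FIELDS_METADATA is assumed to be "metadata", and the metadata field is assumed to hold a JSON string.

```python
import json
from typing import Any, Dict

# Assumption: stands in for the module-level constant naming the metadata field.
FIELDS_METADATA = "metadata"


def answer_key_hardcoded(result: Dict[str, Any]) -> str:
    """Hypothetical helper: reads the key through the hardcoded field name.

    Raises KeyError when the index has no field called "metadata".
    """
    return json.loads(result["metadata"]).get("key", "")


def answer_key_prior_fix(result: Dict[str, Any]) -> str:
    """Hypothetical helper: behaviour of the prior fix as described above.

    Checks for the FIELDS_METADATA field first and falls back to an
    empty string when the field is absent.
    """
    if FIELDS_METADATA in result:
        return json.loads(result[FIELDS_METADATA]).get("key", "")
    return ""


# A search result coming from an index that has no metadata field.
result_without_metadata = {"id": "doc-1", "content": "some text"}

try:
    answer_key_hardcoded(result_without_metadata)
    hardcoded_failed = False
except KeyError:
    hardcoded_failed = True

prior_fix_value = answer_key_prior_fix(result_without_metadata)
```

With an index that lacks the metadata field, the hardcoded lookup fails while the guarded lookup degrades to an empty string.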

New fix to this bug: I believe there was an oversight in the logic of the fix from #15642, which I explain below.
The semantic_hybrid_search_with_score_and_rerank method creates a dictionary semantic_answers_dict with semantic answers returned by the search as follows.

langchain/libs/community/langchain_community/vectorstores/azuresearch.py

Lines 574 to 581 in 5c2f7e6

    
    # Get Semantic Answers
    semantic_answers = results.get_answers() or []
    semantic_answers_dict: Dict = {}
    for semantic_answer in semantic_answers:
        semantic_answers_dict[semantic_answer.key] = {
            "text": semantic_answer.text,
            "highlights": semantic_answer.highlights,
        }

The keys in this dictionary are the unique document ids in the index, if I understand the documentation of semantic answers in Azure AI Search correctly. When the method transforms a search result into a Document object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a Document object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the semantic_answers_dict dictionary. This id is defined in the search result's field named FIELDS_ID. I added a check to avoid any error in case no field named FIELDS_ID exists in a search result (which shouldn't happen in theory).
A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field.
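
The matching described above can be sketched roughly as follows. This is an illustration rather than the actual diff: the helper name is hypothetical, search results are modelled as plain dicts, and FIELDS_ID is assumed to default to "id".

```python
from typing import Any, Dict

# Assumption: stands in for the module-level constant naming the id field.
FIELDS_ID = "id"


def lookup_semantic_answer(
    result: Dict[str, Any],
    semantic_answers_dict: Dict[str, Dict[str, Any]],
) -> Any:
    """Hypothetical helper: find the semantic answer for one search result.

    The lookup goes through the document id only, so it does not depend
    on the index having a metadata field. An empty string is returned
    when the result has no id field or no answer exists for that id.
    """
    doc_id = result.get(FIELDS_ID)
    if doc_id is None:
        return ""
    return semantic_answers_dict.get(doc_id, "")


# Same shape as the dictionary built from results.get_answers() above.
semantic_answers_dict = {
    "doc-1": {"text": "Paris", "highlights": "<em>Paris</em>"},
}

with_answer = lookup_semantic_answer(
    {"id": "doc-1", "content": "..."}, semantic_answers_dict
)
without_answer = lookup_semantic_answer(
    {"id": "doc-2", "content": "..."}, semantic_answers_dict
)
missing_id = lookup_semantic_answer({"content": "..."}, semantic_answers_dict)
```

The value returned here is what would be stored under the "answer" key of the Document metadata; none of the three calls touches a metadata field.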

@levalencia could you confirm my analysis and test the fix?
@raunakshrivastava7 do you agree with the fix?

Thanks for the help!

levalencia · 2024-03-14T11:53:56Z

Not sure when I will have time to test it, but just to be sure: in which version was this PR released?

Skar0 · 2024-03-18T22:01:07Z

Do you mean which azure-search-documents version?

Skar0 · 2024-03-18T22:05:58Z

@efriis should I perform any other actions to validate the changes in this PR? I'm not sure how to find people to review such a detail (although it's a real bug).

Use the result unique id as defined in FIELDS_ID instead of using a field named 'key' from metadata in semantic_hybrid_search_with_score (2d9fc39)

dosubot bot added the size:XS, Ɑ: vector store, and 🤖:bug labels Mar 11, 2024

Skar0 mentioned this pull request Mar 11, 2024

Azure AI Search, metadata field is required and hardcoded in langchain community #18731

Closed

5 tasks

thelazydogsback reviewed Mar 13, 2024


baskaryan merged commit a6cbb75 into langchain-ai:master Mar 26, 2024
59 checks passed

baskaryan mentioned this pull request Mar 26, 2024

FIX "metadata" hardcoded value in semantic_hybrid_search_with_score_and_rerank #19502

Closed
