-
Notifications
You must be signed in to change notification settings - Fork 15.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Azure AI Search, metadata field is required and hardcoded in langchain community #18731
Comments
Hello @levalencia 😃 I have taken a look at the code and did some tests with my own index, and it indeed seems like the error you are encountering is due to the following line.
I have created a PR #18938 with a bit more context on what the bug is, where it comes from, and how I (hopefully) fixed it. It would be nice if you can test and confirm! |
I'm running into a similar issue as well.
However in my case all I'm trying to do is add my documents to the index with add_texts or add_documents, and this is when I receive:
Should I open a new related issue for this? |
The PR you reference is changing from 'metadata' to |
Do you create the index using the AzureSearch object ? If so, I think a "metadata" field is created by default in the index definition. You can however decide to define an index yourself |
Thanks for the reply. |
Also encountering this issue - and have the same set of requirements as @thelazydogsback |
…#18938) - **Description:** The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - **Issue:** #18731 - **Prior fix to this bug:** This bug was fixed in this PR #15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR #15659 (see ed1ffca). @lz-chen: could you confirm this wasn't intentional? - **New fix to this bug:** I believe there was an oversight in the logic of the fix from [#1564](#15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. https://github.com/langchain-ai/langchain/blob/5c2f7e6b2b474248af63a5f0f726b1414c5467c8/libs/community/langchain_community/vectorstores/azuresearch.py#L574-L581 The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!
…langchain-ai#18938) - **Description:** The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - **Issue:** langchain-ai#18731 - **Prior fix to this bug:** This bug was fixed in this PR langchain-ai#15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR langchain-ai#15659 (see langchain-ai@ed1ffca). @lz-chen: could you confirm this wasn't intentional? - **New fix to this bug:** I believe there was an oversight in the logic of the fix from [langchain-ai#1564](langchain-ai#15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. https://github.com/langchain-ai/langchain/blob/5c2f7e6b2b474248af63a5f0f726b1414c5467c8/libs/community/langchain_community/vectorstores/azuresearch.py#L574-L581 The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!
…langchain-ai#18938) - **Description:** The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - **Issue:** langchain-ai#18731 - **Prior fix to this bug:** This bug was fixed in this PR langchain-ai#15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR langchain-ai#15659 (see langchain-ai@ed1ffca). @lz-chen: could you confirm this wasn't intentional? - **New fix to this bug:** I believe there was an oversight in the logic of the fix from [langchain-ai#1564](langchain-ai#15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. https://github.com/langchain-ai/langchain/blob/5c2f7e6b2b474248af63a5f0f726b1414c5467c8/libs/community/langchain_community/vectorstores/azuresearch.py#L574-L581 The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!
…#18938) - **Description:** The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - **Issue:** #18731 - **Prior fix to this bug:** This bug was fixed in this PR #15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR #15659 (see ed1ffca). @lz-chen: could you confirm this wasn't intentional? - **New fix to this bug:** I believe there was an oversight in the logic of the fix from [#1564](#15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. https://github.com/langchain-ai/langchain/blob/5c2f7e6b2b474248af63a5f0f726b1414c5467c8/libs/community/langchain_community/vectorstores/azuresearch.py#L574-L581 The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!
Checked other resources
Example Code
Custom Retriever Code
setup langchain chain,llm
My fields
Error Message and Stack Trace (if applicable)
The error is thown in this line:
When I dig deep in the langchain code, I found this code:
As you can see in the last line, its trying to find a metadata field on the search results, which we dont have as our index is customized with our own fields.
I am blaming this line:
langchain/libs/community/langchain_community/vectorstores/azuresearch.py
Line 607 in ced5e7b
@Skar0 , not sure if this is really a bug, or I missed something in the documentation.
Description
I am trying to use langchain with Azure OpenAI and Azure Search as Vector Store, and a custom retriever. I dont have a metadata field
This was working with a previous project with azure-search-documents==11.4.b09
but in a new project I am trying azure-search-documents ==11.4.0
System Info
langchain==0.1.7
langchain-community==0.0.20
langchain-core==0.1.23
langchain-openai==0.0.6
langchainhub==0.1.14
The text was updated successfully, but these errors were encountered: