Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AzureSearch Oauth with azure_ad_access_token not working: TokenCredential cannot be instantiated #26216

Closed
5 tasks done
clairebehue opened this issue Sep 9, 2024 · 4 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature investigate Flagged for investigation.

Comments

@clairebehue
Copy link
Contributor

clairebehue commented Sep 9, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

The following code fails to instantiate an AzureSearch instance:

from azure.core.credentials import AzureKeyCredential
from azure.identity import ClientSecretCredential

from langchain_community.vectorstores.azuresearch import AzureSearch

# setup your connection params:
DEFAULT_OAUTH_SCOPE = "https://search.azure.com/.default"
TENANTID = "..."   # Your company tenant id in Azure
APPID = "..."  # Your entreprise app id from Microsoft entra id
APPSECRET = "... "  # the secret value of a secret credential attached with your app
SEARCH_SERVICE_ENDPOINT= ".." #  Azure AI Search URL of the service to connect to (default URL is https://RESOURCE_NAME.search.windows.net)
embeddings = ... # replace with an instance of langchain.embeddings.base.Embeddings

# retrieve an access token from azure client:
azure_credential = ClientSecretCredential(tenant_id=TENANTID, client_id=APPID, client_secret=APPSECRET)
access_token = azure_credential.get_token(DEFAULT_OAUTH_SCOPE)

# Try to use the access_token obtained to instantiate an AzureSearch object ==> fails
db = AzureSearch(
    azure_search_endpoint= SEARCH_SERVICE_ENDPOINT,
    index_name="indexname",
    embedding_function=embeddings,
    azure_ad_access_token=access_token,
    azure_search_key=None
)

Error Message and Stack Trace (if applicable)

    db = AzureSearch(
File ".../python3.9/site-packages/langchain_community/vectorstores/azuresearch.py", line 335, in __init__
   self.client = _get_search_client(
File ".../python3.9/site-packages/langchain_community/vectorstores/azuresearch.py", line 134, in _get_search_client
   credential = TokenCredential(
File ".../python3.9/site-packages/typing_extensions.py", line 551, in _no_init
   raise TypeError('Protocols cannot be instantiated')
TypeError: Protocols cannot be instantiated

Description

instantiating AzureSearch with azure_ad_access_token != None is trying to create an instance of TokenCredential which is an interface and not instantiable. (see https://python.langchain.com/v0.2/api_reference/_modules/langchain_community/vectorstores/azuresearch.html#AzureSearch.__init__ )

Additional Notes:

  • Oauth for azure ai search comes from this PR: (which also introduce the issue)
  • the workaround to using Oauth with azure ai search with langchain is to set both azure_search_key & azure_ad_access_token to None but setting the following env var: (authentification will then use the from azure.identity import DefaultAzureCredential class. see Azure doc about this
    os.environ["AZURE_TENANT_ID"] = TENANTID
    os.environ["AZURE_CLIENT_ID"] = APPID
    os.environ["AZURE_CLIENT_SECRET"] = APPSECRET

System Info

langchain-community==0.2.16
platform: mac
python 3.9

@langcarl langcarl bot added the investigate Flagged for investigation. label Sep 9, 2024
@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Sep 9, 2024
@gavinbarron
Copy link

This also fails in the same manner when attempting to use Managed Service Identity, which is preferable from a security perspective as it removes the weakness of persistent secrets which could be compromised and used by a malicious actor.

For reference the issue also exists on:
langchain-community==0.3.0
platform: linux
python 3.12.5

efriis added a commit that referenced this issue Dec 16, 2024
**Description:** 
AzureSearch vector store: create a wrapper class on
`azure.core.credentials.TokenCredential` (which is not-instantiable) to
fix Oauth usage with `azure_ad_access_token` argument

**Issue:** [the issue it
fixes](#26216)

 **Dependencies:** None

- [x] **Lint and test**

---------

Co-authored-by: Erick Friis <[email protected]>
@ianchi
Copy link
Contributor

ianchi commented Dec 19, 2024

Hi @efriis the fix introduced will most probably fail in the async scenario, as the asyncSearchClient expects an AsyncTokenCredential (one with an async version of get_token).
Thus, you cannot use the same credentials for both sync and async clients.

Additionally, the approach of providing an ad_token has a limitation on long running processes where the token will expire and the only way is to recreate from outside the whole AzureSearch object.
The exceptions if you leave key and token to None and it falls back to DefaultAzureCredentials and internally the SearchClient manages the renewal.

But there are some cases where DefaultAzureCredentials doesn't work and you need to use a special configuration to get the credentials.
Wouldn't it be possible to pass in directly the credentials to be used, as an alternative paramanter (azure_credentials)?

The problem with sync/async will also be present in the fallback case and in this new one

@efriis
Copy link
Member

efriis commented Dec 20, 2024

would recommend using the sync implementation for now; if you find a failure feel free to contribute a unit test showing the issue, as well as a fix!

in general AzureSearch should be migrated to https://github.com/langchain-ai/langchain-azure

@efriis efriis closed this as completed Dec 20, 2024
@ianchi
Copy link
Contributor

ianchi commented Dec 22, 2024

Hi @efriis, I sent PR #28873 to generalize the possibility to connect using any kind of credentials, that also address the token renewal and async.

I also made the key parameter optional as it was not possible to pass None without generating linting error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature investigate Flagged for investigation.
Projects
None yet
Development

No branches or pull requests

4 participants