Allow VectorSearchClient to be passed to DatabricksVectorSearch optionally #33

Open
jmoreno11 opened this issue Oct 31, 2024 · 2 comments


jmoreno11 commented Oct 31, 2024

After moving from langchain_community to langchain_databricks, the VectorSearchClient is initialized inside the DatabricksVectorSearch class, which is convenient in many cases. However, when the class is instantiated many times, as happens when several indexes are needed (for multi-index retrieval), every instance creates its own client. That initialization is redundant, and the time adds up (8.67s vs 6.44s for 5 indexes).
It would be great if we could pass an already-initialized vector search client as an optional parameter to save some precious seconds there:

        # --- Suggestion ---
        # If vs_client is provided, reuse it; otherwise, initialize a new client.
        if vs_client is not None:
            self.index = vs_client.get_index(endpoint, index_name)
        else:
            # --- Existing behavior ---
            try:
                from databricks.vector_search.client import VectorSearchClient
            except ImportError as e:
                raise ImportError(
                    "Could not import databricks-vectorsearch python package. "
                    "Please install it with `pip install databricks-vectorsearch`."
                ) from e
            self.index = VectorSearchClient().get_index(endpoint, index_name)
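
A minimal, self-contained sketch of the optional-client pattern suggested above. `FakeVectorSearchClient` and `VectorStoreSketch` are hypothetical stand-ins, not part of langchain_databricks or databricks-vectorsearch, so the example runs without a workspace; only the `get_index(endpoint, index_name)` call mirrors the snippet. The endpoint and index names are made up for illustration.

```python
from typing import Any, Optional


class FakeVectorSearchClient:
    """Hypothetical stand-in for VectorSearchClient; counts constructions."""

    instances_created = 0

    def __init__(self) -> None:
        FakeVectorSearchClient.instances_created += 1

    def get_index(self, endpoint: str, index_name: str) -> dict:
        # The real client returns an index object; a dict is enough here.
        return {"endpoint": endpoint, "index_name": index_name}


class VectorStoreSketch:
    """Hypothetical store that accepts an optional, pre-built client."""

    def __init__(
        self, endpoint: str, index_name: str, vs_client: Optional[Any] = None
    ) -> None:
        # Reuse the caller's client when given; otherwise build a new one.
        client = vs_client if vs_client is not None else FakeVectorSearchClient()
        self.index = client.get_index(endpoint, index_name)


# One shared client serves all five indexes (names are illustrative).
shared_client = FakeVectorSearchClient()
index_names = [f"catalog.schema.index_{i}" for i in range(5)]
stores = [
    VectorStoreSketch("my_endpoint", name, vs_client=shared_client)
    for name in index_names
]
```

With the shared client, the constructor runs once for all five stores; omitting `vs_client` falls back to one new client per store, which matches the current behavior described in this issue.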

jmoreno11 (Author) commented

@B-Step62

giancaire commented Nov 8, 2024

Hi! When we upgraded from langchain_community to langchain_databricks, we noticed that initializing VectorSearchClient with a Service Principal no longer works. It now defaults to a Personal Access Token (PAT), and we get the following notice:

[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True to VectorSearchClient().

It would be very helpful if we could use Service Principal-based authentication directly with VectorSearchClient instead of a PAT. Ideally, we'd like to initialize it like this:

VectorSearchClient(
    workspace_url=DATABRICKS_HOST,
    service_principal_client_id=DATABRICKS_CLIENT_ID,
    service_principal_client_secret=DATABRICKS_CLIENT_SECRET,
)
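
A hedged sketch of how the call above might be wired up from environment variables. The helper names `service_principal_kwargs` and `build_client` are hypothetical, and reading the credentials from `DATABRICKS_HOST`, `DATABRICKS_CLIENT_ID`, and `DATABRICKS_CLIENT_SECRET` is an assumption; the keyword arguments are the ones used in the snippet above.

```python
import os
from typing import Dict, Mapping


def service_principal_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Map environment variables to the keyword arguments shown above."""
    # Environment variable names are an assumption for this sketch.
    mapping = {
        "workspace_url": "DATABRICKS_HOST",
        "service_principal_client_id": "DATABRICKS_CLIENT_ID",
        "service_principal_client_secret": "DATABRICKS_CLIENT_SECRET",
    }
    missing = [var for var in mapping.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {kwarg: env[var] for kwarg, var in mapping.items()}


def build_client(env: Mapping[str, str] = os.environ):
    """Create a VectorSearchClient authenticated as a service principal."""
    # Imported lazily so this module loads without databricks-vectorsearch.
    from databricks.vector_search.client import VectorSearchClient

    return VectorSearchClient(**service_principal_kwargs(env))
```

Failing fast on missing variables keeps the client from silently falling back to notebook-token authentication. If DatabricksVectorSearch gained the optional client parameter requested in this issue, a client built this way could be created once and passed in.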

Thank you so much for considering this!
