Allow VectorSearchClient to be passed to DatabricksVectorSearch optionally #33

jmoreno11 · 2024-10-31T09:50:13Z

When moving from langchain_community to langchain_databricks, the VectorSearchClient is now initialized inside the DatabricksVectorSearch class, which is convenient in many cases. However, when the class is instantiated many times, as in multi-index retrieval where several indexes are needed, the repeated client initialization takes time that adds up (8.67s vs 6.44s for 5 indexes) and is also redundant.
It would be great if we could pass an already-initialized vector search client as an optional parameter to save those seconds:

```python
# Inside DatabricksVectorSearch.__init__, with a new optional `vs_client` argument
# - - - Suggestion - - - -
# If vs_client is provided, reuse it; otherwise, initialize a new client
if vs_client is not None:
    self.index = vs_client.get_index(endpoint, index_name)
else:
    # - - - end of suggestion; the rest is the existing code - - -
    try:
        from databricks.vector_search.client import VectorSearchClient
    except ImportError as e:
        raise ImportError(
            "Could not import databricks-vectorsearch python package. "
            "Please install it with `pip install databricks-vectorsearch`."
        ) from e
    self.index = VectorSearchClient().get_index(endpoint, index_name)
```
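To illustrate what the suggestion saves in the multi-index case, here is a minimal, self-contained sketch of the same dependency-injection pattern. It does not use the real library: `FakeVectorSearchClient` is a hypothetical stand-in for `VectorSearchClient` that only counts how many times it is constructed, and `IndexHolder` is a hypothetical simplification of `DatabricksVectorSearch.__init__`, not the langchain_databricks class.

```python
from typing import Any, Optional


class FakeVectorSearchClient:
    """Hypothetical stand-in for databricks.vector_search.client.VectorSearchClient."""

    instances = 0  # counts how many clients were constructed

    def __init__(self) -> None:
        FakeVectorSearchClient.instances += 1

    def get_index(self, endpoint_name: str, index_name: str) -> str:
        # The real client returns an index object; a string is enough here.
        return f"{endpoint_name}/{index_name}"


class IndexHolder:
    """Hypothetical simplification of DatabricksVectorSearch.__init__."""

    def __init__(self, endpoint: str, index_name: str,
                 vs_client: Optional[Any] = None) -> None:
        # Reuse the caller's client when given; otherwise build a private one.
        client = vs_client if vs_client is not None else FakeVectorSearchClient()
        self.index = client.get_index(endpoint, index_name)


index_names = [f"catalog.schema.index_{i}" for i in range(5)]

# Current behaviour: one client per index.
per_index = [IndexHolder("my_endpoint", name) for name in index_names]
created_without_sharing = FakeVectorSearchClient.instances

# Proposed behaviour: one client shared by every index.
FakeVectorSearchClient.instances = 0
shared = FakeVectorSearchClient()
reused = [IndexHolder("my_endpoint", name, vs_client=shared) for name in index_names]
created_with_sharing = FakeVectorSearchClient.instances

print(created_without_sharing, created_with_sharing)  # 5 1
```

Defaulting `vs_client` to `None` keeps the current behaviour for existing callers, while callers that open many indexes construct the client only once.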

jmoreno11 · 2024-11-05T11:23:54Z

@B-Step62

giancaire · 2024-11-08T00:45:44Z

Hi! When we upgraded from langchain_community to langchain_databricks, we noticed that initializing VectorSearchClient with a Service Principal no longer works. It now defaults to a Personal Access Token (PAT), and we get the following notice:

[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True to VectorSearchClient().

It would be very helpful if we could use Service Principal-based authentication directly with VectorSearchClient instead of a PAT. Ideally, we'd like to initialize it like this:

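```python
from databricks.vector_search.client import VectorSearchClient

# DATABRICKS_HOST, DATABRICKS_CLIENT_ID and DATABRICKS_CLIENT_SECRET hold the
# workspace URL and the service principal's OAuth credentials.
vs_client = VectorSearchClient(
    workspace_url=DATABRICKS_HOST,
    service_principal_client_id=DATABRICKS_CLIENT_ID,
    service_principal_client_secret=DATABRICKS_CLIENT_SECRET,
)
```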
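If DatabricksVectorSearch accepted a pre-built client, as proposed in the original post, this service-principal case would be covered as well: the caller creates one authenticated client and passes it in. Below is a sketch of that caller-side setup, assuming the credentials live in the DATABRICKS_HOST, DATABRICKS_CLIENT_ID and DATABRICKS_CLIENT_SECRET environment variables. `service_principal_kwargs` and `shared_vs_client` are hypothetical helper names, and the keyword arguments are the ones used in the snippet in this comment.

```python
import os
from functools import lru_cache
from typing import Mapping


def service_principal_kwargs(env: Mapping[str, str]) -> dict:
    """Map Databricks environment variables to VectorSearchClient keyword arguments."""
    return {
        "workspace_url": env["DATABRICKS_HOST"],
        "service_principal_client_id": env["DATABRICKS_CLIENT_ID"],
        "service_principal_client_secret": env["DATABRICKS_CLIENT_SECRET"],
    }


@lru_cache(maxsize=1)
def shared_vs_client():
    """Build one service-principal client per process and reuse it afterwards.

    Requires the databricks-vectorsearch package; the import is deferred so the
    helper above can be used (and tested) without it.
    """
    from databricks.vector_search.client import VectorSearchClient

    return VectorSearchClient(**service_principal_kwargs(os.environ))
```

`lru_cache(maxsize=1)` on a zero-argument function acts as a lazy singleton: the first call builds the client, and later calls return the same object, so every index can share it.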

Thank you so much for considering this!
