feat(community): Add support for SAP HANA Vector hnsw index creation and advanced filtering #7238

Merged on Dec 3, 2024 (9 commits)
35 changes: 35 additions & 0 deletions docs/core_docs/docs/integrations/vectorstores/hanavector.mdx
@@ -35,12 +35,47 @@ import ExampleLoader from "@examples/indexes/vector_stores/hana_vector/fromDocs.

<CodeBlock language="typescript">{ExampleLoader}</CodeBlock>

## Creating an HNSW Vector Index

A vector index can significantly speed up top-k nearest neighbor queries for vectors. Users can create a Hierarchical Navigable Small World (HNSW) vector index using the `createHnswIndex` function.

For more information about creating an index at the database level, such as parameter requirements, refer to the [official documentation](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-vector-engine-guide/create-vector-index-statement-data-definition).

import ExampleIndex from "@examples/indexes/vector_stores/hana_vector/createHnswIndex.ts";

<CodeBlock language="typescript">{ExampleIndex}</CodeBlock>
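
To make concrete what such an index accelerates, here is a minimal sketch of an exact (brute-force) top-k search over in-memory vectors. It is only an illustration: `topK`, `cosineSimilarity`, and `euclideanDistance` are hypothetical helper names and not part of `HanaDB`. An HNSW index replaces this linear scan with an approximate graph search inside the database.

```typescript
type Vec = number[];

// Cosine similarity: higher means more similar.
function cosineSimilarity(a: Vec, b: Vec): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Euclidean (L2) distance: lower means more similar.
function euclideanDistance(a: Vec, b: Vec): number {
  let sum = 0;
  for (let i = 0; i < a.length; i += 1) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Exact top-k: score every stored vector, sort, and keep the k best indices.
function topK(
  query: Vec,
  vectors: Vec[],
  k: number,
  strategy: "cosine" | "euclidean"
): number[] {
  const scored = vectors.map((v, index) => ({
    index,
    // Negate the distance so that "larger is better" holds for both strategies.
    score:
      strategy === "cosine"
        ? cosineSimilarity(query, v)
        : -euclideanDistance(query, v),
  }));
  scored.sort((x, y) => y.score - x.score);
  return scored.slice(0, k).map((s) => s.index);
}

// Toy unit-length vectors standing in for stored embeddings.
const vectors: Vec[] = [
  [1, 0],
  [0, 1],
  [0.6, 0.8],
  [-1, 0],
];
const query: Vec = [0.8, 0.6];

const cosineTop2 = topK(query, vectors, 2, "cosine");
const euclideanTop2 = topK(query, vectors, 2, "euclidean");
console.log(cosineTop2, euclideanTop2); // both are [2, 0]
```

Because all vectors in this toy data set have unit length, cosine similarity and Euclidean distance produce the same ranking. The linear scan costs one distance computation per stored vector, which is the work an index is meant to avoid on large tables.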

## Basic Vectorstore Operations

import ExampleBasic from "@examples/indexes/vector_stores/hana_vector/basics.ts";

<CodeBlock language="typescript">{ExampleBasic}</CodeBlock>

## Advanced filtering

import { Table, Tr, Th, Td } from "@mdx-js/react";

In addition to basic value-based filtering, more advanced filters are supported. The table below lists the available filter operators.

| Operator | Semantics |
| ---------- | -------------------------------------------------------------------------- |
| `$eq` | Equality (==) |
| `$ne` | Inequality (!=) |
| `$lt` | Less than (<) |
| `$lte` | Less than or equal (<=) |
| `$gt` | Greater than (>) |
| `$gte` | Greater than or equal (>=) |
| `$in` | Contained in a set of given values (in) |
| `$nin` | Not contained in a set of given values (not in) |
| `$between` | Between the range of two boundary values |
| `$like` | Text matching based on the "LIKE" semantics in SQL (using "%" as wildcard) |
| `$and` | Logical "and", supporting 2 or more operands |
| `$or` | Logical "or", supporting 2 or more operands |

import ExampleAdvancedFilter from "@examples/indexes/vector_stores/hana_vector/advancedFiltering.ts";

<CodeBlock language="typescript">{ExampleAdvancedFilter}</CodeBlock>
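
The operator semantics in the table above can also be read as a small in-memory evaluator. The sketch below is purely illustrative and is not how `HanaDB` applies filters (the store evaluates them in the database); `matches`, `matchesCondition`, and `likeToRegExp` are hypothetical names, only the "%" wildcard is handled, and SQL `NULL` handling is not modeled.

```typescript
type Scalar = string | number | boolean;
type Metadata = Record<string, Scalar>;
type Filter = Record<string, unknown>;

// Convert a LIKE pattern to a RegExp; only the "%" wildcard is handled here.
function likeToRegExp(pattern: string): RegExp {
  const body = pattern
    .split("%")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${body}$`);
}

// Check one metadata value against a condition such as 1 or { $gte: 5 }.
function matchesCondition(value: Scalar | undefined, condition: unknown): boolean {
  if (typeof condition !== "object" || condition === null) {
    return value === condition; // a bare value is shorthand for $eq
  }
  const ops = condition as Record<string, unknown>;
  return Object.keys(ops).every((op) => {
    const operand = ops[op];
    switch (op) {
      case "$eq":
        return value === operand;
      case "$ne":
        return value !== operand;
      case "$lt":
        return value !== undefined && value < (operand as Scalar);
      case "$lte":
        return value !== undefined && value <= (operand as Scalar);
      case "$gt":
        return value !== undefined && value > (operand as Scalar);
      case "$gte":
        return value !== undefined && value >= (operand as Scalar);
      case "$in":
        return (operand as Scalar[]).some((x) => x === value);
      case "$nin":
        return !(operand as Scalar[]).some((x) => x === value);
      case "$between": {
        const [low, high] = operand as [Scalar, Scalar];
        return value !== undefined && value >= low && value <= high;
      }
      case "$like":
        return typeof value === "string" && likeToRegExp(operand as string).test(value);
      default:
        throw new Error(`Unsupported operator: ${op}`);
    }
  });
}

// Evaluate a (possibly nested) filter; top-level keys are combined with "and".
function matches(metadata: Metadata, filter: Filter): boolean {
  return Object.keys(filter).every((key) => {
    const condition = filter[key];
    if (key === "$and") {
      return (condition as Filter[]).every((f) => matches(metadata, f));
    }
    if (key === "$or") {
      return (condition as Filter[]).some((f) => matches(metadata, f));
    }
    return matchesCondition(metadata[key], condition);
  });
}

// Same metadata as the documents in the advanced filtering example.
const sampleMetadata: Metadata[] = [
  { name: "adam", is_active: true, id: 1, height: 10.0 },
  { name: "bob", is_active: false, id: 2, height: 5.7 },
  { name: "jane", is_active: true, id: 3, height: 2.4 },
];

const namesMatching = (filter: Filter): Scalar[] =>
  sampleMetadata.filter((m) => matches(m, filter)).map((m) => m.name);

console.log(namesMatching({ id: { $between: [1, 2] } })); // [ 'adam', 'bob' ]
console.log(namesMatching({ name: { $like: "%a%" } })); // [ 'adam', 'jane' ]
console.log(
  namesMatching({
    $and: [{ $or: [{ id: 1 }, { id: 2 }] }, { height: { $gte: 5.0 } }],
  })
); // [ 'adam', 'bob' ]
```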

## Using a VectorStore as a retriever in chains for retrieval augmented generation (RAG)

import ExampleChain from "@examples/indexes/vector_stores/hana_vector/chains.ts";
210 changes: 210 additions & 0 deletions examples/src/indexes/vector_stores/hana_vector/advancedFiltering.ts
@@ -0,0 +1,210 @@
import { OpenAIEmbeddings } from "@langchain/openai";
import hanaClient from "hdb";
import { Document } from "@langchain/core/documents";
import {
  HanaDB,
  HanaDBArgs,
} from "@langchain/community/vectorstores/hanavector";

const connectionParams = {
  host: process.env.HANA_HOST,
  port: process.env.HANA_PORT,
  user: process.env.HANA_UID,
  password: process.env.HANA_PWD,
};
const client = hanaClient.createClient(connectionParams);

// Connect to SAP HANA
await new Promise<void>((resolve, reject) => {
  client.connect((err: Error) => {
    if (err) {
      reject(err);
    } else {
      console.log("Connected to SAP HANA successfully.");
      resolve();
    }
  });
});

const docs: Document[] = [
  {
    pageContent: "First",
    metadata: { name: "adam", is_active: true, id: 1, height: 10.0 },
  },
  {
    pageContent: "Second",
    metadata: { name: "bob", is_active: false, id: 2, height: 5.7 },
  },
  {
    pageContent: "Third",
    metadata: { name: "jane", is_active: true, id: 3, height: 2.4 },
  },
];

// Initialize embeddings
const embeddings = new OpenAIEmbeddings();

const args: HanaDBArgs = {
  connection: client,
  tableName: "testAdvancedFilters",
};

// Create a LangChain VectorStore interface for the HANA database and specify the table (collection) to use in args.
const vectorStore = new HanaDB(embeddings, args);
// The store must be initialized once after the instance is created.
await vectorStore.initialize();
// Delete already existing documents from the table
await vectorStore.delete({ filter: {} });
await vectorStore.addDocuments(docs);

// Helper function to print filter results
function printFilterResult(result: Document[]) {
  if (result.length === 0) {
    console.log("<empty result>");
  } else {
    result.forEach((doc) => console.log(doc.metadata));
  }
}

let advancedFilter;

// Not equal
advancedFilter = { id: { $ne: 1 } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"id":{"$ne":1}}
{ name: 'bob', is_active: false, id: 2, height: 5.7 }
{ name: 'jane', is_active: true, id: 3, height: 2.4 }
*/

// Between range
advancedFilter = { id: { $between: [1, 2] } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"id":{"$between":[1,2]}}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 } */

// In list
advancedFilter = { name: { $in: ["adam", "bob"] } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"name":{"$in":["adam","bob"]}}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 } */

// Not in list
advancedFilter = { name: { $nin: ["adam", "bob"] } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"name":{"$nin":["adam","bob"]}}
{ name: 'jane', is_active: true, id: 3, height: 2.4 } */

// Greater than
advancedFilter = { id: { $gt: 1 } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"id":{"$gt":1}}
{ name: 'bob', is_active: false, id: 2, height: 5.7 }
{ name: 'jane', is_active: true, id: 3, height: 2.4 } */

// Greater than or equal to
advancedFilter = { id: { $gte: 1 } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"id":{"$gte":1}}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 }
{ name: 'jane', is_active: true, id: 3, height: 2.4 } */

// Less than
advancedFilter = { id: { $lt: 1 } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"id":{"$lt":1}}
<empty result> */

// Less than or equal to
advancedFilter = { id: { $lte: 1 } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"id":{"$lte":1}}
{ name: 'adam', is_active: true, id: 1, height: 10 } */

// Text filtering with $like
advancedFilter = { name: { $like: "a%" } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"name":{"$like":"a%"}}
{ name: 'adam', is_active: true, id: 1, height: 10 } */

advancedFilter = { name: { $like: "%a%" } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"name":{"$like":"%a%"}}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'jane', is_active: true, id: 3, height: 2.4 } */

// Combined filtering with $or
advancedFilter = { $or: [{ id: 1 }, { name: "bob" }] };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"$or":[{"id":1},{"name":"bob"}]}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 } */

// Combined filtering with $and
advancedFilter = { $and: [{ id: 1 }, { id: 2 }] };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"$and":[{"id":1},{"id":2}]}
<empty result> */

advancedFilter = { $or: [{ id: 1 }, { id: 2 }, { id: 3 }] };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"$or":[{"id":1},{"id":2},{"id":3}]}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 }
{ name: 'jane', is_active: true, id: 3, height: 2.4 } */

// You can also define a nested filter with $and and $or.
advancedFilter = {
  $and: [{ $or: [{ id: 1 }, { id: 2 }] }, { height: { $gte: 5.0 } }],
};
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
  await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"$and":[{"$or":[{"id":1},{"id":2}]},{"height":{"$gte":5}}]}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 } */

// Disconnect from SAP HANA after the operations
client.disconnect();
98 changes: 98 additions & 0 deletions examples/src/indexes/vector_stores/hana_vector/createHnswIndex.ts
@@ -0,0 +1,98 @@
import hanaClient from "hdb";
import {
  HanaDB,
  HanaDBArgs,
} from "@langchain/community/vectorstores/hanavector";
import { OpenAIEmbeddings } from "@langchain/openai";

// The table "test_fromDocs" was already created by the previous example.
// Here we reuse that table to create indexes and perform similarity searches.

const connectionParams = {
  host: process.env.HANA_HOST,
  port: process.env.HANA_PORT,
  user: process.env.HANA_UID,
  password: process.env.HANA_PWD,
};
const client = hanaClient.createClient(connectionParams);

// Connect to SAP HANA
await new Promise<void>((resolve, reject) => {
  client.connect((err: Error) => {
    if (err) {
      reject(err);
    } else {
      console.log("Connected to SAP HANA successfully.");
      resolve();
    }
  });
});

// Initialize embeddings
const embeddings = new OpenAIEmbeddings();

// First instance using the existing table "test_fromDocs" (default: Cosine similarity)
const argsCosine: HanaDBArgs = {
  connection: client,
  tableName: "test_fromDocs",
};

// Second instance using the existing table "test_fromDocs" but with L2 Euclidean distance
const argsL2: HanaDBArgs = {
  connection: client,
  tableName: "test_fromDocs",
  distanceStrategy: "euclidean", // Use Euclidean distance for this instance
};

// Initialize both HanaDB instances
const vectorStoreCosine = new HanaDB(embeddings, argsCosine);
const vectorStoreL2 = new HanaDB(embeddings, argsL2);

// Create HNSW index with Cosine similarity (default)
await vectorStoreCosine.createHnswIndex({
  indexName: "hnsw_cosine_index",
  efSearch: 400,
  m: 50,
  efConstruction: 150,
});

// Create HNSW index with Euclidean (L2) distance
await vectorStoreL2.createHnswIndex({
  indexName: "hnsw_l2_index",
  efSearch: 400,
  m: 50,
  efConstruction: 150,
});

// Query text for similarity search
const query = "What did the president say about Ketanji Brown Jackson";

// Perform similarity search using the default Cosine index
const docsCosine = await vectorStoreCosine.similaritySearch(query, 2);
console.log("Cosine Similarity Results:");
docsCosine.forEach((doc) => {
  console.log("-".repeat(80));
  console.log(doc.pageContent);
});
/*
Cosine Similarity Results:
----------------------------------------------------------------------
One of the most serious constitutional ...

And I did that 4 days ago, when I ...
----------------------------------------------------------------------
As I said last year, especially ...

While it often appears that we never agree, that isn’t true...
*/
// Perform similarity search using Euclidean distance (L2 index)
const docsL2 = await vectorStoreL2.similaritySearch(query, 2);
console.log("Euclidean (L2) Distance Results:");
docsL2.forEach((doc) => {
  console.log("-".repeat(80));
  console.log(doc.pageContent);
});
// Cosine similarity and L2 distance rank unit-length vectors identically, so with
// normalized embeddings the L2 results should match the cosine results above.

// Disconnect from SAP HANA after the operations
client.disconnect();