feat: improve handling vector data and tooltip
invisal committed Jul 31, 2024
1 parent 4c59bcd commit cf29ae0
Showing 11 changed files with 132 additions and 13 deletions.
50 changes: 50 additions & 0 deletions src/app/codemirror-override.css
@@ -39,3 +39,53 @@
max-height: 600px;
user-select: text;
}

.cm-completionIcon-property::after {
content: "🆔" !important;
}

.cm-completionIcon-enum::after {
content: "❝" !important;
}

.cm-completionIcon-keyword::after {
content: "🔑" !important;
}

.cm-completionIcon-table::after {
display: block !important;
background-size: contain !important;
content: "" !important;
background-image: url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEAAAABACAYAAACqaXHeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAABYgAAAWIBXyfQUwAAABl0RVh0U29mdHdhcmUAd3d3Lmlua3NjYXBlLm9yZ5vuPBoAAAMBSURBVHic7ZvNTxNBGId/MzulLUWkQqQkRJKGGpUYo4k3MSYeDPEj8YDx6+LJRC/6H3g2MXIy8UyMB08a8cLBGGM0HiQeEEwJES20BaE00FZYZseDYIROW0JlX2DnufXdnd1nfu1+vJsuYPA2TFfsHlQ1LaOpawuSH1xSvMVtqf+JYE7SbzlDyWjkybMOtrh2+aoA7r5MR5N56+lg1nd83ubacLYrdT5HdTTYHyN19uWerpZvK/W/k7zVN3PlQ9LqzSxyi8TQJcJ+R56M2Dd6zjb1AssB3Hk12dU/HujL22xHfeulCAqlzkTyFx9eiDxn914r0T+WmZvI+wLUYm7SWmsXTreF63kiO/XAa5MHgETeF5zITt7nyZzVTS1DxURBXOKpgthLLUJFKieaedZmO/qsX45ZmwkulSdO/FqkYuDUEtSYAKgFqDEBUAtQYwKgFqBG1AlF7UAKGxge8XQCnj8ETADUAtR4PgChKzbvaUCoNlh2YC5fQHpmtqgeba38FD09nUGu8GtVLRQMoLkxXHHsaCJZVKvGVxtAMOBHfai27AYdx9HWK40DgJnsXFHNJ8S6xuqoxtfzh4AJgFqAGs8HoD0JKqXgOOXvkJXSL680rtTY9eyz3PY26mt6AWoBakwA1ALUeD4A0wvoVja9gIcwAVALUOP5AEwvYHoBj2MCoBagxvMBaK8Ckcbwuu6tU9OZojpFL1CNrzaAgL8GuypsUEqprVP0AtX4ev4QMAEAKHqNZKN3ZFuZEnNa4ACKzirZ+dymC7lNiTklBVN4rxja1q48PvkTTQ27ISz9f6lL/UooegHHUZBS/8BjSUpMzWZLBKDesc9D8fMOYy82tOftDmfnGAAMDI+8BXCCWMdt3hw90H6KA4AQuAqGNLWRWzAgxS11HVi+DB5ub/8ByToBDJGauQHDoJToPBKLJf58/Id4PO7PObitFLsJYD+J4ObxlTH1OMTxKBaLLawUS74u8unLaJtlYZ9kyu+O3+ZgKbYgJb4fOxQdo3YxbEV+A0V6Y/BBlijFAAAAAElFTkSuQmCC") !important;
width: 20px !important;
height: 20px !important;
background-position: center;
background-repeat: no-repeat;
}

.cm-completionIcon-function::after,
.cm-completionIcon-method::after,
.cm-completionIcon-variable::after,
.cm-completionIcon-namespace::after,
.cm-completionIcon-interface::after {
content: "⚡" !important;
}

.cm-tooltip-autocomplete > ul > li {
display: flex;
}

.cm-tooltip-autocomplete > ul > li > .cm-completionIcon {
width: 1em !important;
display: flex;
align-self: center;
justify-content: center;
}

.cm-tooltip-autocomplete .cm-completionLabel {
flex-grow: 1;
}

.cm-tooltip-autocomplete .cm-completionDetail {
padding-left: 15px;
}
28 changes: 16 additions & 12 deletions src/components/gui/table-cell/generic-cell.tsx
@@ -118,34 +118,35 @@ function BlobCellValue({
   vector?: boolean;
 }) {
   if (vector) {
-    const floatArray = [...new Float32Array(new Uint8Array(value).buffer)].join(
-      ", "
-    );
+    const floatArray = new Float32Array(new Uint8Array(value).buffer);
+    const floatArrayText = floatArray.join(", ");

     return (
       <div className="flex">
         <div className="mr-2 justify-center items-center flex-col">
           <span className="bg-blue-500 text-white inline rounded p-1 pl-2 pr-2">
-            vec
+            vec({floatArray.length})
           </span>
         </div>
-        <div className="text-orange-600">[{floatArray}]</div>
+        <div className="text-orange-600">[{floatArrayText}]</div>
       </div>
     );
   } else {
-    const sliceByte = value.slice(0, 64);
+    const bytes = new Uint8Array(value);
     const base64Text = btoa(
-      new Uint8Array(sliceByte).reduce(
-        (data, byte) => data + String.fromCharCode(byte),
-        ""
-      )
+      bytes
+        .slice(0, 64)
+        .reduce((data, byte) => data + String.fromCharCode(byte), "")
     );

     return (
       <div className="flex">
         <div className="mr-2 justify-center items-center flex-col">
           <span className="bg-blue-500 text-white inline rounded p-1 pl-2 pr-2">
-            blob
+            {bytes.length.toLocaleString(undefined, {
+              maximumFractionDigits: 0,
+            })}{" "}
+            bytes
           </span>
         </div>
         <div className="text-orange-600">{base64Text}</div>
@@ -241,7 +242,10 @@ export default function GenericCell({
       return (
         <BlobCellValue
           value={value}
-          vector={header.headerData?.type.includes("F32_BLOB")}
+          vector={
+            header.originalDataType?.includes("F32_BLOB") ||
+            header.originalDataType?.includes("FLOAT32 ")
+          }
         />
       );
     }
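
Review note: the vector branch above reinterprets the blob's bytes as 32-bit floats and shows the element count in the badge. A minimal standalone sketch of that step follows. The helper name `formatVectorCell` and the length check are assumptions added for illustration; the commit inlines the logic in `BlobCellValue` and has no such check.

```typescript
// Hypothetical helper (not part of the commit): turns a raw blob into the
// badge label and the bracketed text that the vector cell renders.
function formatVectorCell(value: ArrayBuffer): { label: string; text: string } {
  // A float32 vector uses 4 bytes per element, so reject ragged input
  // up front instead of letting the Float32Array constructor throw.
  if (value.byteLength % 4 !== 0) {
    throw new RangeError("vector blob length must be a multiple of 4 bytes");
  }
  // Reinterpret the raw bytes as 32-bit floats in platform byte order.
  const floats = new Float32Array(value);
  return {
    label: `vec(${floats.length})`,
    text: `[${floats.join(", ")}]`,
  };
}

// Usage: pack three floats into a 12-byte buffer, then format it.
const sampleBuffer = new ArrayBuffer(12);
new Float32Array(sampleBuffer).set([1, 2, 3]);
const sampleCell = formatVectorCell(sampleBuffer);
console.log(sampleCell.label, sampleCell.text); // vec(3) [1, 2, 3]
```

Keeping the `Float32Array` around, as the new code does, is what makes the element count available for the badge; the old code spread it into a string immediately.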
1 change: 1 addition & 0 deletions src/components/gui/table-optimized/OptimizeTableState.tsx
@@ -97,6 +97,7 @@ export default class OptimizeTableState {
return {
initialSize,
name: headerName ?? "",
originalDataType: header.originalType,
displayName: header.displayName,
resizable: true,
headerData,
1 change: 1 addition & 0 deletions src/components/gui/table-optimized/index.tsx
@@ -26,6 +26,7 @@ export interface OptimizeTableHeaderProps {
initialSize: number;
resizable?: boolean;
dataType?: TableColumnDataType;
originalDataType?: string | null;
headerData?: DatabaseTableColumn;
foreignKey?: DatabaseForeignKeyClause;
icon?: ReactElement;
20 changes: 20 additions & 0 deletions src/drivers/sqlite/function-tooltip.json
@@ -71,6 +71,10 @@
"syntax": "length(X)",
"description": "<p>The length(X) function returns the character count of string X, excluding any NUL characters for strings (which SQLite typically lacks), or the byte count for blobs. If X is NULL, length(X) is also NULL. For numeric X, it returns the length of its string representation.</p>\n<pre><code>select length('hello');\n-&gt; 5\n\nselect length(x'ff00ee');\n-&gt; 3\n\nselect length(NULL);\n-&gt; NULL\n</code></pre>"
},
"libsql_vector_idx": {
"syntax": "libsql_vector_idx(X)",
"description": "<p>Use the <strong>libsql_vector_idx</strong> expression in the CREATE INDEX statement to create an ANN index.</p>\n<pre><code>CREATE INDEX movies_idx ON movies (libsql_vector_idx(embedding));\n</code></pre>"
},
"like": {
"syntax": "like(X,Y), like(X,Y,Z)",
"description": "<p>The like() function checks if the string Y matches the pattern X in the \"Y LIKE X [ESCAPE Z]\" expression.</p>\n<pre><code>select like('hel%', 'hello')\n-&gt; 1\n\nselect like('wor%', 'hello')\n-&gt; 0\n</code></pre>"
@@ -142,5 +146,21 @@
"total": {
"syntax": "total(X)",
"description": "<p>The function returns the sum of all non-NULL values in the group, with a result of 0.0 if there are no non-NULL inputs.\nThe result of total() is always a floating point value.</p>"
},
"vector": {
"syntax": "vector(X)",
"description": "<p>Function to convert a vector from string format to binary.</p>\n<pre><code>INSERT INTO movies (title, year, embedding) VALUES('Napoleon', 2023, vector('[1,2,3]'));\n</code></pre>"
},
"vector_distance_cos": {
"syntax": "vector_distance_cos(X, Y)",
"description": "<p>Function to calculate cosine distance between two vectors.\nIt computes the distance as 1 minus the cosine similarity,\nmeaning a smaller distance indicates closer vectors.</p>\n<pre><code>SELECT * FROM movies\nORDER BY vector_distance_cos(embedding, '[3,1,2]')\n</code></pre>"
},
"vector_extract": {
"syntax": "vector_extract(X)",
"description": "<p>Function to extract string from binary vector</p>\n<pre><code>SELECT title,\n vector_extract(embedding),\n vector_distance_cos(embedding, vector('[5,6,7]'))\nFROM movies;\n</code></pre>"
},
"vector_top_k": {
"syntax": "vector_top_k(idx_name, q_vector, k)",
"description": "<p>Use <strong>vector_top_k</strong> with the <strong>idx_name</strong> index to efficiently find the top k most similar vectors to <strong>q_vector</strong>.</p>\n<pre><code>SELECT title, year\nFROM vector_top_k('movies_idx', vector('[4,5,6]'), 3)\nJOIN movies ON movies.rowid = id\n</code></pre>"
}
}
7 changes: 7 additions & 0 deletions src/drivers/sqlite/functions/libsql_vector_idx.md
@@ -0,0 +1,7 @@
libsql_vector_idx(X)

Use the **libsql_vector_idx** expression in the CREATE INDEX statement to create an ANN index.

```
CREATE INDEX movies_idx ON movies (libsql_vector_idx(embedding));
```
7 changes: 7 additions & 0 deletions src/drivers/sqlite/functions/vector.md
@@ -0,0 +1,7 @@
vector(X)

Function to convert a vector from string format to binary.

```
INSERT INTO movies (title, year, embedding) VALUES('Napoleon', 2023, vector('[1,2,3]'));
```
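
Review note: to make "string format to binary" concrete, here is a rough sketch of such a conversion and its inverse. It assumes the binary form is a packed sequence of little-endian 32-bit floats, which matches how the table cell in this commit decodes the blob on common hardware but is not confirmed by the diff. The names `encodeVector` and `decodeVector` are hypothetical and are not libSQL APIs.

```typescript
// Hypothetical illustration of vector('[1,2,3]'): text -> packed float32 bytes.
// Assumes a little-endian float32 layout; the real storage format may differ.
function encodeVector(text: string): Uint8Array {
  const parsed: unknown = JSON.parse(text);
  if (!Array.isArray(parsed) || !parsed.every((x) => typeof x === "number")) {
    throw new TypeError("expected a JSON-style array of numbers");
  }
  const values = parsed as number[];
  const buffer = new ArrayBuffer(values.length * 4);
  const view = new DataView(buffer);
  // Write each element as a 4-byte float, explicitly little-endian.
  values.forEach((v, i) => view.setFloat32(i * 4, v, true));
  return new Uint8Array(buffer);
}

// Hypothetical inverse, in the spirit of vector_extract: bytes -> text.
function decodeVector(bytes: Uint8Array): string {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const values: number[] = [];
  for (let offset = 0; offset < bytes.byteLength; offset += 4) {
    values.push(view.getFloat32(offset, true));
  }
  return `[${values.join(",")}]`;
}

const encoded = encodeVector("[1,2,3]");
console.log(encoded.length);        // 12
console.log(decodeVector(encoded)); // [1,2,3]
```

Using `DataView` with an explicit endianness flag keeps the sketch deterministic across platforms, unlike reinterpreting the buffer through a `Float32Array`.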
10 changes: 10 additions & 0 deletions src/drivers/sqlite/functions/vector_distance_cos.md
@@ -0,0 +1,10 @@
vector_distance_cos(X, Y)

Function to calculate cosine distance between two vectors.
It computes the distance as 1 minus the cosine similarity,
meaning a smaller distance indicates closer vectors.

```
SELECT * FROM movies
ORDER BY vector_distance_cos(embedding, '[3,1,2]')
```
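
Review note: the "1 minus the cosine similarity" definition above can be written out directly. The function below is an illustrative sketch, not libSQL's implementation; the name `cosineDistance` is an assumption.

```typescript
// Illustrative only: cosine distance = 1 - (a . b) / (|a| * |b|).
// Results range from 0 (same direction) to 2 (opposite direction).
// A zero-length vector yields NaN because the denominator is 0.
function cosineDistance(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new RangeError("vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineDistance([1, 0], [1, 0]));  // 0
console.log(cosineDistance([1, 0], [0, 1]));  // 1
console.log(cosineDistance([1, 0], [-1, 0])); // 2
```

Because only the angle between the vectors matters, scaling either vector leaves the distance unchanged, which is why ordering by it ranks rows by direction rather than magnitude.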
10 changes: 10 additions & 0 deletions src/drivers/sqlite/functions/vector_extract.md
@@ -0,0 +1,10 @@
vector_extract(X)

Function to extract string from binary vector

```
SELECT title,
vector_extract(embedding),
vector_distance_cos(embedding, vector('[5,6,7]'))
FROM movies;
```
9 changes: 9 additions & 0 deletions src/drivers/sqlite/functions/vector_top_k.md
@@ -0,0 +1,9 @@
vector_top_k(idx_name, q_vector, k)

Use **vector_top_k** with the **idx_name** index to efficiently find the top k most similar vectors to **q_vector**

```
SELECT title, year
FROM vector_top_k('movies_idx', vector('[4,5,6]'), 3)
JOIN movies ON movies.rowid = id
```
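
Review note: for intuition about what a top-k query returns, the sketch below does an exact brute-force scan. This is deliberately not the ANN index that `vector_top_k` uses; it compares the query against every row, and it assumes cosine distance as the metric, which the diff does not state. `Row`, `cosineDistance`, and `bruteForceTopK` are hypothetical names.

```typescript
// Hypothetical row shape for illustration.
interface Row {
  id: number;
  embedding: number[];
}

// Cosine distance: 1 minus cosine similarity (assumed metric).
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k by full scan: score every row, sort ascending, keep k ids.
function bruteForceTopK(rows: Row[], query: number[], k: number): number[] {
  return rows
    .map((row) => ({ id: row.id, distance: cosineDistance(row.embedding, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
    .map((scored) => scored.id);
}

const movies: Row[] = [
  { id: 1, embedding: [4, 5, 6] },
  { id: 2, embedding: [-4, -5, -6] },
  { id: 3, embedding: [4, 5, 7] },
  { id: 4, embedding: [1, 0, 0] },
];
console.log(bruteForceTopK(movies, [4, 5, 6], 2)); // [ 1, 3 ]
```

A full scan costs time proportional to the table size on every query, which is the cost an index-backed approximate search is meant to avoid.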
2 changes: 1 addition & 1 deletion src/drivers/sqlite/sqlite-dialect.ts
@@ -2,7 +2,7 @@ import { SQLDialect } from "@codemirror/lang-sql";
import sqliteFunctionList from "./function-tooltip.json";

export const SQLTypes =
- "array binary bit boolean char character clob date decimal double float int integer interval large national nchar nclob numeric object precision real smallint time timestamp varchar varying ";
+ "array binary bit boolean char character clob date decimal double float int integer interval large national nchar nclob numeric object precision real smallint time timestamp varchar varying f32_blob float32";
export const SQLKeywords =
"absolute action add after all allocate alter and any are as asc assertion at authorization before begin between both breadth by call cascade cascaded case cast catalog check close collate collation column commit condition connect connection constraint constraints constructor continue corresponding count create cross cube current current_date current_default_transform_group current_transform_group_for_type current_path current_role current_time current_timestamp current_user cursor cycle data day deallocate declare default deferrable deferred delete depth deref desc describe descriptor deterministic diagnostics disconnect distinct do domain drop dynamic each else elseif end end-exec equals escape except exception exec execute exists exit external fetch first for foreign found from free full function general get global go goto grant group grouping handle having hold hour identity if immediate in indicator initially inner inout input insert intersect into is isolation join key language last lateral leading leave left level like limit local localtime localtimestamp locator loop map match method minute modifies module month names natural nesting new next no none not of old on only open option or order ordinality out outer output overlaps pad parameter partial path prepare preserve primary prior privileges procedure public read reads recursive redo ref references referencing relative release repeat resignal restrict result return returns revoke right role rollback rollup routine row rows savepoint schema scroll search second section select session session_user set sets signal similar size some space specific specifictype sql sqlexception sqlstate sqlwarning start state static system_user table temporary then timezone_hour timezone_minute to trailing transaction translation treat trigger under undo union unique unnest until update usage user using value values view when whenever where while with without work write year zone ";
