build_time DOUBLE PRECISION NULL
);
```
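
## Lantern PQ

### Description

Compress the vectors stored in a table column with product quantization (PQ), using k-means clustering to build the codebook. Clustering and compression run externally in `lantern-cli`, and the results are written back to Postgres.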
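
For intuition, here is the standard product quantization formulation with the numbers from the examples below. It is a sketch under assumptions not stated in this README: a $d$-dimensional vector is split into $m$ equal subvectors ($m$ presumably corresponds to `--splits`, and $d$ is assumed divisible by $m$), and each subvector is replaced by the index of its nearest centroid among $k$ centroids ($k$ presumably corresponds to `--clusters`). The byte counts assume 128-dimensional `float32` vectors (the SIFT dimensionality, which `sift10k` presumably uses) and one 8-bit code per subvector; the actual storage layout of `pqvec` may differ.

```math
\begin{aligned}
\mathbf{x} &= \big[\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(m-1)}\big], \qquad \mathbf{x}^{(j)} \in \mathbb{R}^{d/m} \\
c_j(\mathbf{x}) &= \arg\min_{i \in \{0, \dots, k-1\}} \big\lVert \mathbf{x}^{(j)} - \boldsymbol{\mu}_{j,i} \big\rVert_2^2, \qquad j = 0, \dots, m-1 \\
\text{code size} &= m \cdot \log_2 k = 32 \cdot \log_2 256 = 256 \text{ bits} = 32 \text{ bytes} \\
\text{raw size} &= d \cdot 4 \text{ bytes} = 128 \cdot 4 = 512 \text{ bytes} \quad (16\times \text{ larger}) \\
\text{codebook} &= m \cdot k \cdot \tfrac{d}{m} = k \cdot d = 256 \cdot 128 = 32768 \text{ floats}
\end{aligned}
```

Here $\boldsymbol{\mu}_{j,i}$ is the $i$-th k-means centroid learned for subvector $j$. The codebook is stored once per table, while every row only keeps its $m$ codes.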

### Usage

Run `lantern-cli pq-table --help` to list the available CLI options.

The job can run either on a local machine or as GCP Batch jobs, which parallelize the clustering workload over hundreds of VMs to speed it up.

To run locally:

```bash
lantern-cli pq-table --uri 'postgres://[email protected]:5432/postgres' --table sift10k --column v --clusters 256 --splits 32
```

The job runs on the current machine and uses all available CPU cores.

For large datasets (over 1M vectors), it is more convenient to run the job as GCP Batch jobs.
Make sure your GCP credentials are set up before running this command:

```bash
lantern-cli pq-table --uri 'postgres://[email protected]:5432/postgres' --table sift10k --column v --clusters 256 --splits 32 --run-on-gcp
```

To orchestrate the tasks yourself on your own on-premise servers, run the following 3 steps:

1. Run the setup job. This creates the necessary tables and adds a `pqvec` column to the target table.

```bash
lantern-cli pq-table --uri 'postgres://[email protected]:5432/postgres' --table sift10k --column v --clusters 256 --splits 32 --skip-codebook-creation --skip-vector-compression
```

2. Run the clustering job. This creates the codebook for the table and exports it to a Postgres table.

```bash
lantern-cli pq-table --uri 'postgres://[email protected]:5432/postgres' --table sift10k --column v --clusters 256 --splits 32 --skip-table-setup --skip-vector-compression --parallel-task-count 10 --subvector-id 0
```

Run this command 32 times, once per subvector (one per split, since `--splits 32`), with `--subvector-id` ranging from 0 to 31. `--parallel-task-count 10` indicates that at most 10 tasks will run in parallel; it is used to avoid exceeding the maximum connection limit in Postgres.

3. Run the compression job. This compresses the vectors using the generated codebook and writes the results to the `pqvec` column.

```bash
lantern-cli pq-table --uri 'postgres://[email protected]:5432/postgres' --table sift10k --column v --clusters 256 --splits 32 --skip-table-setup --skip-codebook-creation --parallel-task-count 10 --total-task-count 10 --compression-task-id 0
```

Run this command 10 times, once per compression task, with `--compression-task-id` ranging from 0 to 9 (matching `--total-task-count 10`). As in step 2, `--parallel-task-count 10` indicates that at most 10 tasks will run in parallel, to stay under the maximum connection limit in Postgres.
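
If a single host drives the on-premise run, steps 1 to 3 can be chained in one script. This is a hypothetical sketch, not part of `lantern-cli`: it reuses only the flags shown above, caps concurrency with `xargs -P` at the same value passed to `--parallel-task-count`, and assumes a placeholder connection string in `PQ_URI` plus an optional `LANTERN_CLI` override (for example `echo` for a dry run).

```bash
#!/usr/bin/env bash
# Hypothetical orchestration sketch for the three on-premise steps.
# Assumptions: xargs supports -P (GNU and BSD xargs do); PQ_URI holds your
# connection string; LANTERN_CLI may name another command (e.g. echo) for a dry run.
set -euo pipefail

PQ_URI="${PQ_URI:-postgres://postgres@localhost:5432/postgres}"  # placeholder URI
SPLITS=32        # must equal --splits
PARALLEL=10      # must equal --parallel-task-count
TOTAL_TASKS=10   # must equal --total-task-count
COMMON=(pq-table --uri "$PQ_URI" --table sift10k --column v --clusters 256 --splits "$SPLITS")

pq_setup() {
  # Step 1: create the tables and the pqvec column only.
  "${LANTERN_CLI:-lantern-cli}" "${COMMON[@]}" \
    --skip-codebook-creation --skip-vector-compression
}

pq_cluster() {
  # Step 2: one task per subvector id 0..SPLITS-1, at most PARALLEL at once.
  seq 0 $((SPLITS - 1)) | xargs -P "$PARALLEL" -I{} \
    "${LANTERN_CLI:-lantern-cli}" "${COMMON[@]}" \
    --skip-table-setup --skip-vector-compression \
    --parallel-task-count "$PARALLEL" --subvector-id {}
}

pq_compress() {
  # Step 3: one task per compression task id 0..TOTAL_TASKS-1.
  seq 0 $((TOTAL_TASKS - 1)) | xargs -P "$PARALLEL" -I{} \
    "${LANTERN_CLI:-lantern-cli}" "${COMMON[@]}" \
    --skip-table-setup --skip-codebook-creation \
    --parallel-task-count "$PARALLEL" --total-task-count "$TOTAL_TASKS" \
    --compression-task-id {}
}

pq_all() {
  # Each step finishes before the next starts: xargs waits for all of its
  # tasks and exits non-zero if any task failed, which stops the script.
  pq_setup
  pq_cluster
  pq_compress
}
```

For a dry run that only prints the 43 commands, source the file and call `LANTERN_CLI=echo pq_all`; call `pq_all` without the override to run the real jobs. Using `xargs -P` instead of a hand-rolled job pool keeps failures visible, since xargs reports a non-zero status (123 with GNU xargs) when any task fails.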

The table must have a primary key for this job to work. If the primary key column is not `id`, provide it with the `--pk` argument.
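
To check which column to pass to `--pk`, you can ask Postgres for the table's primary key. The helper below is a hypothetical sketch, not part of `lantern-cli`: it assumes `psql` is installed, queries the standard `pg_index` and `pg_attribute` catalogs, and assumes the job expects a single-column key, so composite keys are rejected.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the primary-key column of a table so it can be
# passed to lantern-cli via --pk. Assumes psql is on PATH and that a
# single-column primary key is required (composite keys are rejected).
set -euo pipefail

primary_key_of() {
  local uri="$1" table="$2" pk
  # pg_index.indisprimary marks the primary-key index; indkey lists its columns.
  # :'tbl' is psql variable interpolation, quoted as an SQL literal.
  pk="$(psql -X -q -A -t -v ON_ERROR_STOP=1 -v tbl="$table" -d "$uri" <<'SQL'
SELECT a.attname
FROM pg_index i
JOIN pg_attribute a ON a.attrelid = i.indrelid AND a.attnum = ANY (i.indkey)
WHERE i.indrelid = :'tbl'::regclass AND i.indisprimary;
SQL
)" || return
  if [[ -z "$pk" ]]; then
    echo "error: table '$table' has no primary key" >&2
    return 1
  elif [[ "$pk" == *$'\n'* ]]; then
    echo "error: table '$table' has a composite primary key" >&2
    return 1
  fi
  printf '%s\n' "$pk"
}
```

For example, `pk="$(primary_key_of "$uri" sift10k)"` (with `$uri` holding your connection string), then add `--pk "$pk"` to each `lantern-cli pq-table` command. Passing `--pk id` explicitly is assumed to behave like the default.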