Smarter bulk & unordered scans #43

ashvardanian · 2022-08-28T22:09:35Z

Currently, ukv_scan only works for fully consistent, sorted exports of keys from collections.
The bulk flag already lets users prioritize throughput over consistency, but one could argue that ML-like pipelines don't need any ordering or dependency between operations at all. Instead, they may use scans to sample entries uniformly at random, which in turn requires a full scan of keys. If the user leaves start_key unset, we can perform the bulk sampling behind the scenes ourselves.
Making one function dual-use will make the interface uglier, but it will also keep it short. Worth considering.
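One way the "bulk sampling behind the scenes" idea could work is reservoir sampling (Algorithm R), which draws a uniform sample of k keys in a single pass over a full scan without knowing the total key count upfront. This is a minimal sketch under stated assumptions: `scan_batches` is a hypothetical stand-in for a bulk scan that yields keys in batches, and neither function is part of UKV's actual API.

```python
import random
from typing import Iterable, Iterator, List, Optional


def scan_batches(keys: List[int], batch_size: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for a bulk scan: yields keys in fixed-size batches."""
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]


def reservoir_sample(batches: Iterable[List[int]], k: int,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Uniformly sample up to k keys in one pass over all batches (Algorithm R)."""
    rng = rng or random.Random()
    reservoir: List[int] = []
    seen = 0
    for batch in batches:
        for key in batch:
            seen += 1
            if len(reservoir) < k:
                # Fill the reservoir with the first k keys.
                reservoir.append(key)
            else:
                # Keep the new key with probability k / seen,
                # evicting a uniformly chosen existing slot.
                j = rng.randrange(seen)
                if j < k:
                    reservoir[j] = key
    return reservoir


if __name__ == "__main__":
    keys = list(range(1_000))
    sample = reservoir_sample(scan_batches(keys, batch_size=64), k=10,
                              rng=random.Random(42))
    print(sorted(sample))
```

Because every key ends up in the sample with probability k / n regardless of batch boundaries, the batches could in principle be produced in any order, which fits the "no dependency between operations" argument above.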


ashvardanian · 2022-08-28T22:13:58Z

These changes should precede #17, so that the scan interface is finalized first.


ashvardanian · 2022-10-18T12:47:01Z

If the bulk flag is provided, we can treat the passed keys not as start keys but as the last keys of the previous batch.
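A minimal sketch of that cursor semantic, assuming a sorted in-memory list of integer keys in place of a real collection: each call returns keys strictly greater than the passed key, so the caller can feed the last key of one batch straight into the next call. `scan_after` and `scan_all` are hypothetical names for illustration, not UKV functions.

```python
from bisect import bisect_right
from typing import List, Optional


def scan_after(sorted_keys: List[int], last_key: Optional[int],
               limit: int) -> List[int]:
    """Hypothetical bulk-scan step: up to `limit` keys strictly greater than last_key.

    last_key is interpreted as the final key of the previous batch, so it is
    excluded from the result; None means "start from the beginning".
    """
    start = 0 if last_key is None else bisect_right(sorted_keys, last_key)
    return sorted_keys[start:start + limit]


def scan_all(sorted_keys: List[int], limit: int) -> List[int]:
    """Drain the collection by feeding each batch's last key back as the cursor."""
    out: List[int] = []
    cursor: Optional[int] = None
    while True:
        batch = scan_after(sorted_keys, cursor, limit)
        if not batch:
            break
        out.extend(batch)
        cursor = batch[-1]
    return out
```

With inclusive start-key semantics, the caller would instead have to pass `last + 1` for integer keys, which adds an off-by-one hazard and overflows at the maximum key; the exclusive "last key" cursor avoids both.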


ashvardanian self-assigned this Aug 28, 2022

ashvardanian added a commit that referenced this issue Sep 5, 2022

Add: joined and embedded ranges.

6b59392

Fix: Python build with new scans #43 Fix: retrieving the gist of document fields.

VioletaStepanyan mentioned this issue Sep 5, 2022

Code refactoring, add test. #49

Merged

ashvardanian added this to the 0.4.0 milestone Oct 3, 2022

ashvardanian added this to UKV Community Edition Oct 3, 2022

ashvardanian changed the title ~~Smarter scans~~ Smarter bulk & unordered scans Oct 18, 2022

DarvinHarutyunyan pushed a commit that referenced this issue Dec 9, 2022

Add: joined and embedded ranges.

acb56dd

Fix: Python build with new scans #43 Fix: retrieving the gist of document fields.

DarvinHarutyunyan pushed a commit that referenced this issue Dec 9, 2022

Refactoring: Implement ukv_scan(...) for backend_leveldb. #43

0264f7d

DarvinHarutyunyan pushed a commit that referenced this issue Dec 9, 2022

Refactoring: Implement ukv_scan(...) for backend_rocksdb. #43

44b5492

ashvardanian modified the milestones: 0.4.0, 0.5.0 Dec 10, 2022

ashvardanian modified the milestones: v0.5: Snapshots, NetworkX, Vector Search, v0.6: Sampling, Replication, Schema Validation, JS Jan 21, 2023

ashvardanian removed their assignment Jan 22, 2023
