Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Iterator.seek(int next) as a basic building block for faster composition and transitive closure #37

Open
jurgenvinju opened this issue Jan 12, 2024 · 1 comment

Comments

@jurgenvinju
Copy link
Member

The seek method is described as follows, here https://arxiv.org/abs/1210.0481 :

seek(int seekKey): Position the iterator at a least
upper bound for seekKey,
i.e. the least key ≥ seekKey, or
move to end if no such key exists.
The sought key must be ≥ the
key at the current position.

Using this building block algorithms for trie compose ("join") and closure can make use of the orderedness of the hash keys:

  • already the partial order on elements allows for skipping certain commutative orderings, this may shave off half of an iteration of the smallest trie
  • the full order on hash codes (up to collisions) allows for skipping ahead to the relevant part of the other iterator. The sparser the match between two relations, the faster the algorithm will go.

This seek method can be used in many applications and it abstracts from the internal details of the data-structure. All it depends on is the hashCode/equals contract. Most implementation in capsule will use the structure of the trie to make sure seek is done as quickly as possible. It would be best if we implement seek for all tries in capsule, IMHO.

@jurgenvinju
Copy link
Member Author

Caveat: If our iterators go in the other direction (I don't know) wrt to hash code order, then of course the definition of seek should be mirrored as well.

If our iterators do not respect hash code order, then we have to think carefully about the semantics of seek. Is it still correct to assume we can skip large parts of the trie, even though the order is non-standard?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant