Figure out incremental re-indexing #19

Open
simonw opened this issue Sep 8, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@simonw
Collaborator

simonw commented Sep 8, 2020

As tables get bigger, re-indexing everything on a schedule (essentially recreating the entire index from scratch) will start to become a performance bottleneck.

simonw added the enhancement label Sep 8, 2020
@simonw
Collaborator Author

simonw commented Sep 8, 2020

I thought about allowing tables to define an incremental indexing SQL query - maybe something that can return just records touched in the past hour, or records since a recorded "last indexed record" value.

The problem with this is deletes - if you delete a record, how does the indexer know to remove it? See #18 - that's already caused problems.
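
A minimal sketch of the incremental query idea and why deletes break it. The `notes` table, its `updated` column, the `incremental_index` function and the dict standing in for the search index are all hypothetical names made up for illustration, not part of this project:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT, updated INTEGER);
    INSERT INTO notes VALUES (1, 'first', 100), (2, 'second', 200);
""")

# Stand-in for the search index: primary key -> indexed text (an assumption)
index = {}


def incremental_index(db, index, last_indexed):
    """Index rows changed since last_indexed and return the new high-water mark."""
    rows = db.execute(
        "SELECT id, body, updated FROM notes WHERE updated > ? ORDER BY updated",
        (last_indexed,),
    ).fetchall()
    for pk, body, updated in rows:
        index[pk] = body
        last_indexed = max(last_indexed, updated)
    return last_indexed


# First run picks up both rows
mark = incremental_index(db, index, 0)

# One row is edited, the other is deleted
db.execute("UPDATE notes SET body = 'second, edited', updated = 300 WHERE id = 2")
db.execute("DELETE FROM notes WHERE id = 1")

# Second run sees the edit, but the deleted row never matches the query,
# so its stale entry stays in the index
mark = incremental_index(db, index, mark)
```

After the second run the edited row is up to date, but `index` still contains an entry for primary key 1 even though that row no longer exists in the table.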

@simonw
Collaborator Author

simonw commented Sep 8, 2020

A really clever way to do this would be with triggers. The indexer script would add triggers to each of the database tables that it is indexing - each in their own database.

Those triggers would then maintain a _index_queue_ table. This table would record the primary key of rows that are added, modified or deleted. The indexer could then work by reading through the _index_queue_ table, re-indexing (or deleting) just the primary keys listed there, and then emptying the queue once it has finished.

This would add a small amount of overhead to insert/update/delete queries run against the table. My hunch is that the overhead would be minuscule, but I could still allow people to opt out for tables that are so high traffic that this would matter.
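
A rough sketch of how the trigger approach could look in SQLite, using Python's standard `sqlite3` module. Only the `_index_queue_` table name comes from the description above; the `notes` table, the trigger names, the `pk` column, the `process_queue` function and the dict used as a stand-in index are assumptions for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT);

    -- One row per primary key that needs attention; duplicates are ignored
    CREATE TABLE _index_queue_ (pk INTEGER PRIMARY KEY);

    CREATE TRIGGER notes_queue_insert AFTER INSERT ON notes BEGIN
        INSERT OR IGNORE INTO _index_queue_ (pk) VALUES (new.id);
    END;
    CREATE TRIGGER notes_queue_update AFTER UPDATE ON notes BEGIN
        -- Queue both keys in case the update changed the primary key
        INSERT OR IGNORE INTO _index_queue_ (pk) VALUES (old.id);
        INSERT OR IGNORE INTO _index_queue_ (pk) VALUES (new.id);
    END;
    CREATE TRIGGER notes_queue_delete AFTER DELETE ON notes BEGIN
        INSERT OR IGNORE INTO _index_queue_ (pk) VALUES (old.id);
    END;
""")

# Stand-in for the search index: primary key -> indexed text (an assumption)
index = {}


def process_queue(db, index):
    """Re-index or remove every queued primary key, then clear those queue rows."""
    pks = [row[0] for row in db.execute("SELECT pk FROM _index_queue_")]
    for pk in pks:
        row = db.execute("SELECT body FROM notes WHERE id = ?", (pk,)).fetchone()
        if row is None:
            index.pop(pk, None)  # row is gone, so drop it from the index
        else:
            index[pk] = row[0]   # row was added or modified, so re-index it
    # Remove only the keys that were processed, not anything queued since
    db.executemany("DELETE FROM _index_queue_ WHERE pk = ?", [(pk,) for pk in pks])
    return len(pks)


db.execute("INSERT INTO notes VALUES (1, 'first')")
db.execute("INSERT INTO notes VALUES (2, 'second')")
process_queue(db, index)

db.execute("UPDATE notes SET body = 'second, edited' WHERE id = 2")
db.execute("DELETE FROM notes WHERE id = 1")
processed = process_queue(db, index)
```

Because the queue stores only primary keys, the indexer decides between re-indexing and deleting by checking whether the row still exists. That keeps the triggers to a single cheap insert each, which fits the hunch that the overhead should be small.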
