Figure out incremental re-indexing #19

simonw · 2020-09-08T05:23:31Z

As tables get bigger, re-indexing everything on a schedule (essentially recreating the entire index from scratch) will start to become a performance bottleneck.

simonw · 2020-09-08T05:24:50Z

I thought about allowing tables to define an incremental indexing SQL query - maybe something that returns just the records touched in the past hour, or the records added since a recorded "last indexed record" value.

The problem with this is deletes - if you delete a record, how does the indexer know to remove it? See #18 - that's already caused problems.
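To make the delete problem concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `items` table, its `updated` column, the `incremental_index` function and the dictionary standing in for the search index are all hypothetical names invented for this illustration, not part of the existing indexer.

```python
import sqlite3

# Hypothetical source table with an "updated" counter/timestamp column.
db = sqlite3.connect(":memory:")
db.execute("create table items (id integer primary key, body text, updated integer)")
db.executemany(
    "insert into items values (?, ?, ?)",
    [(1, "one", 100), (2, "two", 100)],
)

index = {}  # stand-in for the search index: primary key -> indexed text


def incremental_index(db, index, last_indexed):
    """Re-index only rows whose updated value is newer than last_indexed."""
    rows = db.execute(
        "select id, body, updated from items where updated > ? order by updated",
        (last_indexed,),
    ).fetchall()
    for pk, body, updated in rows:
        index[pk] = body
        last_indexed = max(last_indexed, updated)
    return last_indexed


# First run picks up both rows.
last_indexed = incremental_index(db, index, 0)

# Row 2 is edited and row 1 is deleted before the next run.
db.execute("update items set body = 'two (edited)', updated = 200 where id = 2")
db.execute("delete from items where id = 1")
last_indexed = incremental_index(db, index, last_indexed)

# The edit was picked up, but the deleted row is still in the index.
live_ids = {row[0] for row in db.execute("select id from items")}
stale = sorted(set(index) - live_ids)
print(index[2], stale)
```

The second run sees the edited row because its `updated` value moved forward, but a deleted row no longer exists for the query to return, so nothing tells the indexer to drop it.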

simonw · 2020-09-08T05:27:07Z

A really clever way to do this would be with triggers. The indexer script would add triggers to each of the database tables that it is indexing - each in their own database.

Those triggers would then maintain a _index_queue_ table. This table would record the primary key of rows that are added, modified or deleted. The indexer could then work by reading through the _index_queue_ table, re-indexing (or deleting) just the primary keys listed there, and then emptying the queue once it has finished.
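A rough sketch of what that could look like, again with Python's `sqlite3` module. The `items` table, the trigger names, the columns of `_index_queue_`, the `process_queue` function and the dictionary used as the index are assumptions made for illustration only; the real indexer would need to generate triggers per table and handle other primary key shapes.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
create table items (id integer primary key, body text);

-- Queue of primary keys that need attention from the indexer.
create table _index_queue_ (
    table_name text not null,
    pk integer not null,
    operation text not null  -- 'upsert' or 'delete'
);

create trigger items_queue_insert after insert on items begin
    insert into _index_queue_ values ('items', new.id, 'upsert');
end;
create trigger items_queue_update after update on items begin
    insert into _index_queue_ values ('items', new.id, 'upsert');
end;
create trigger items_queue_delete after delete on items begin
    insert into _index_queue_ values ('items', old.id, 'delete');
end;
""")


def process_queue(db, index):
    """Apply queued changes to the index, then remove the processed entries."""
    rows = db.execute(
        "select rowid, pk, operation from _index_queue_ order by rowid"
    ).fetchall()
    for _, pk, operation in rows:
        if operation == "delete":
            index.pop(pk, None)
        else:
            row = db.execute("select body from items where id = ?", (pk,)).fetchone()
            if row is not None:  # the row may have been deleted later in the queue
                index[pk] = row[0]
    if rows:
        # Only delete what was read, so entries added meanwhile are kept.
        db.execute("delete from _index_queue_ where rowid <= ?", (rows[-1][0],))
    return len(rows)


index = {}
db.execute("insert into items values (1, 'one')")
db.execute("insert into items values (2, 'two')")
process_queue(db, index)

db.execute("update items set body = 'two (edited)' where id = 2")
db.execute("delete from items where id = 1")
processed = process_queue(db, index)
remaining = db.execute("select count(*) from _index_queue_").fetchone()[0]
print(index, processed, remaining)
```

Unlike a "touched since" query, the delete trigger records the primary key of the removed row, so the indexer has something to act on even though the row itself is gone.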

This would add a small amount of overhead to insert/update/delete queries run against the table. My hunch is that the overhead would be minuscule, but I could still allow people to opt out for tables that are so high-traffic that this would matter.

simonw added the enhancement New feature or request label Sep 8, 2020
