Reindex after or during log compaction #199
@arj03 I'm working on this already, but I bumped into a design issue. My original plan is to rebuild all indexes while log compaction is still ongoing, based on what Dominic suggested here:
For instance, my plan is to create a folder. However, does this go against the JIT design of JITDB? Alternatively, should we just delete all JITDB indexes once log compaction is done, and let the JIT nature take care of rebuilding them?
Will get back to you on this tomorrow :)
For JITDB I don't think we should allow queries while compaction is running; instead, keep track of the offset from which you started compacting, and on done call. This could be a starting point for the calls to jitdb & level indexes; it's a bit different because here we are running on individual messages.
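The idea above can be sketched in a few lines. This is a hypothetical illustration, not jitdb's actual API: the function names (`onCompactStart`, `onCompactDone`) and the callback wiring are assumptions; the point is only to show "remember the offset where compaction began, then reindex from that offset onward".

```javascript
// Hypothetical sketch: remember the log offset at which compaction
// began, and once compaction is done, reindex only from that offset
// onward instead of rebuilding every index from offset 0.

let compactStartOffset = null

function onCompactStart(currentOffset) {
  // queries are disallowed from here on; remember where compaction began
  compactStartOffset = currentOffset
}

function onCompactDone(reindexFrom) {
  // rebuild indexes only for the region that compaction rewrote
  reindexFrom(compactStartOffset)
  compactStartOffset = null
}
```

For pure (non-reduce) indexes this avoids most of the rebuild cost, since everything before the compaction start offset is unchanged.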
@arj03 Thanks!
Why do you say that level indexes should be allowed to do lookups while jitdb shouldn't? Is it because of the speed of rebuilding the indexes? (By the way, in my tests, I felt that some jitdb indexes do take a considerable time to build, like
Right, that is because for encrypted they don't change. Here they do :)
Just because it is easier. We don't need a flag for jitdb that says: please don't update indexes, just run on what you have. I can maybe check the value_author prefix building later to see if there is anything we can do to optimize that one.
Okay thanks, so I'm preparing for the plan to "not allow queries while compaction is running", and here are some disorganized thoughts (mostly questions to myself):

- What happens when a jitdb query is ongoing and compaction starts? Do we cancel the query, or do we pause it and re-run it after compaction (and reindexing) is done? What if it was a 4th call to paginate and the starting
- This is a bit similar to questions about log.get and log.stream: do we abort them, or do we wait for compaction to end and rerun them?
Just as a reference, here are other (leveldb and jitdb) indexes of similar size (sorted). There are plenty of big jitdb indexes; computing them all and writing them to disk isn't fast.
Answering myself: if there are ongoing queries (of any kind), postpone compaction, such that compaction always starts when everything else in the database is idle, and then from that point onwards, queue all incoming queries so that they apply only after compaction is done.
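The queueing idea above can be sketched as a small gate. All names here (`makeQueryGate`, `startCompaction`, etc.) are hypothetical, not real jitdb or ssb-db2 APIs; this only illustrates "run queries immediately when idle, park them while compacting, drain the queue when done":

```javascript
// Hypothetical sketch of the query gate: queries run immediately when
// no compaction is active, otherwise they are queued and drained in
// FIFO order once compaction (and reindexing) has finished.

function makeQueryGate() {
  let compacting = false
  const queue = []

  return {
    startCompaction() {
      compacting = true
    },
    endCompaction() {
      compacting = false
      // drain queued queries in the order they arrived
      while (queue.length > 0) queue.shift()()
    },
    runQuery(fn) {
      if (compacting) queue.push(fn)
      else fn()
    },
  }
}
```

The complementary half ("postpone compaction while queries are ongoing") would be the mirror image: compaction waits on a counter of in-flight queries before flipping the gate.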
Yep, agree. Also, for most of these indexes you wouldn't need to do a full rebuild, only from compaction start onwards.
I had to check this. Building author or type takes around the same amount of time (40s). Note this is from a totally empty indexes folder, so it includes keys and base. Rebuilding author after type is 6s. Rebuilding both is 45s, so a tiny bit less. I tried disabling writing the jitdb indexes, and it only brought rebuilding down from 45s to 43.2s. Decrypting is still very heavy, as documented here, at around 10s overhead. If I leave the canDecrypt file and the level indexes, the time to build both goes down to 18.7s.
Super weird bug with GitHub where I can't edit the original post's TODO list, so I'm copying it down here:
@arj03 I think I hit a pretty sad issue: it seems like we have to reset all leveldb indexes after compaction happens, and that's because most leveldb indexes hold state, and we don't know how to undo that state only for the deleted records. Consider e.g. the about-self index, which has key/value Any ideas about this?
@staltz Right, reduce-based indexes versus pure ones. I think we could introduce that abstraction, and then you'd only have to do a full reindex on the reduce-based ones. Most base indexes are not reduce.
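The proposed split can be sketched as follows. The classification, the index names in the sample data, and the function `plansAfterCompaction` are all illustrative assumptions, not ssb-db2's actual structures: a "pure" index derives each entry from one record independently, so it can be rebuilt from the compaction start offset, while a "reduce" index folds state across records and has to restart from scratch.

```javascript
// Hypothetical sketch of the pure-vs-reduce abstraction: decide, per
// index, the offset from which it must be rebuilt after compaction.

const indexes = [
  { name: 'value_author', kind: 'pure' },   // one entry per record
  { name: 'aboutSelf', kind: 'reduce' },    // folds state across records
  { name: 'ebt', kind: 'reduce' },
]

function plansAfterCompaction(indexes, compactStartOffset) {
  return indexes.map((idx) => ({
    name: idx.name,
    // reduce-based indexes have no way to "undo" deleted records,
    // so they restart at offset 0; pure ones resume mid-log
    rebuildFrom: idx.kind === 'reduce' ? 0 : compactStartOffset,
  }))
}
```

As the next comments note, even with this split it can be hard to locate the stale entries inside a reduce-based index, which is why the thread ends up at "reindex everything" for those.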
Yeah, we could do that split. But it might still be hard to find and remove the outdated entries. Like the EBT index, I think it's
Oh right, you are correct. I guess there isn't any other good way than to reindex everything in this case :( |
async-append-only-log now supports `log.compact()` (ssbc/async-append-only-log#48) and we should reindex the relevant portions of jitdb indexes. See also ssbc/ssb-db2#306

- `compactionProgress()` emits `{done: true}` on AAOL init
- `compactionProgress()` emits `{done: false}` immediately when compact starts
- While `log.compactionProgress()` is undone, queue the query inputs
- When `log.compactionProgress()` emits "done", release the queue
- `reindex()` should rebuild core indexes too
- `indexingActive()` obz API
- `queriesActive()` obz API
- `levelIndexingActive` obz API
- `compact()` API which checks `jitdb.indexingActive()` and `jitdb.queriesActive()` and `levelIndexingActive` and postpones itself until they are inactive
- `log.compact()`, and once that's done, reindex jitdb and then run a new `log.stream` to reindex leveldb