
Reindex after or during log compaction #199

Closed · 4 of 10 tasks
staltz opened this issue Mar 25, 2022 · 15 comments
Labels: enhancement (New feature or request)


staltz commented Mar 25, 2022

async-append-only-log now supports log.compact() (ssbc/async-append-only-log#48), and we should reindex the relevant portions of the jitdb indexes. See also ssbc/ssb-db2#306.

  • AAOL: compactionProgress() emits {done:true} on AAOL init
  • AAOL: compactionProgress() emits {done:false} immediately when compact starts
  • JITDB: if we start a query while log.compactionProgress() is not yet done, queue the query inputs
  • JITDB: once log.compactionProgress() emits "done", release the queue
  • JITDB: reindex() should rebuild core indexes too
  • JITDB: indexingActive() obz API
  • JITDB: queriesActive() obz API
  • DB2: internal boolean levelIndexingActive
  • DB2: compact() API which checks jitdb.indexingActive() and jitdb.queriesActive() and levelIndexingActive and postpones itself until they are inactive (gating sketched after this list)
  • DB2: once indexing is inactive, run log.compact() and once that's done, reindex jitdb and then run a new log.stream to reindex leveldb
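
A minimal sketch of the gating in the compact() item above, assuming each "active" signal is an obz-style observable (calling it with a listener subscribes and returns an unsubscribe function, and .value holds the current state); the exact shapes are my assumption, not settled API:

```js
// Hedged sketch, not the real implementation. Assumes each signal is an
// obz observable whose current value is truthy while that activity is ongoing.
function makeCompact(log, signals) {
  return function compact(cb) {
    const allIdle = () => signals.every((signal) => !signal.value)
    if (allIdle()) return log.compact(cb)

    // Postpone: re-check whenever any signal changes; run once all are idle
    const removes = signals.map((signal) =>
      signal(() => {
        if (allIdle()) {
          removes.forEach((remove) => remove())
          log.compact(cb)
        }
      })
    )
  }
}

// e.g. makeCompact(log, [jitdb.indexingActive, jitdb.queriesActive, levelIndexingActive])
```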
staltz added the enhancement label Mar 25, 2022
staltz self-assigned this Mar 25, 2022

staltz commented Mar 29, 2022

@arj03 I'm working on this already, but I bumped into a design issue. My original plan is to rebuild all indexes while log compaction is still ongoing, based on what Dominic suggested here:

Instead of writing a separate second log, superimpose the compacted log on top of the old one, keeping track of the min uncompacted sequence number. When you do a query from the old indexes, drop results with offsets smaller than the uncompacted sequence, and mix with results from the new indexes. Merge index responses by sequence numbers, so numbers smaller than the uncompacted sequence are from the compacted section and the numbers greater are from the uncompacted log.

For instance, my plan is to create a folder .ssb/db2/indexes/.compacting where the new index files are built, and once those are done catching up with the log, we delete all jitdb indexes in .ssb/db2/indexes and move the files in .ssb/db2/indexes/.compacting up to .ssb/db2/indexes.
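
Concretely, the final swap could be a plain delete-and-rename on disk, something like this (a hedged sketch of the plan above; real code would need error handling and crash safety):

```js
const fs = require('fs')
const path = require('path')

// Hedged sketch of the swap at the end of the plan above: once the indexes
// in `.compacting` have caught up with the log, drop the old index files
// and promote the new ones.
function promoteCompactedIndexes(indexesDir) {
  const compactingDir = path.join(indexesDir, '.compacting')
  for (const filename of fs.readdirSync(indexesDir)) {
    if (filename === '.compacting') continue
    fs.rmSync(path.join(indexesDir, filename)) // delete the old index file
  }
  for (const filename of fs.readdirSync(compactingDir)) {
    fs.renameSync(
      path.join(compactingDir, filename), // move the fresh index...
      path.join(indexesDir, filename) // ...up into the normal location
    )
  }
  fs.rmdirSync(compactingDir)
}
```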

However, does this go against the JIT design of JITDB?

Alternatively, should we just delete all JITDB indexes once log compaction is done, and let the JIT nature take care of rebuilding them?


arj03 commented Mar 29, 2022

Will get back to you on this tomorrow :)


arj03 commented Mar 30, 2022

For JITDB I don't think we should allow queries while compaction is running; instead, keep track of the offset from which you started compacting, and on done call reindex() with that. This way, if you compacted only the end of the log, it should be quite fast. For level indexes in db2 it might make sense to allow lookups for things we know are before compaction. I do believe that we should have a "high-level" compact function in db2 that takes care of doing these things, because it also needs to keep track of state while compacting, so that we can continue if the app is shut down while compacting.

This could be a starting point for the calls to jitdb & level indexes; it's a bit different because here we are running on individual messages.
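
If I read this right, the flow would be roughly the following (a hedged sketch; I'm assuming compact() can report the offset where compaction started, and that reindex(offset) rebuilds everything from there):

```js
// Hedged sketch of the flow described above: remember where compaction
// started, then reindex only from that offset once compaction is done.
function compactAndReindex(log, jitdb, cb) {
  log.compact((err, startOffset) => {
    if (err) return cb(err)
    // Records before startOffset kept their offsets, so only the tail of
    // each index is stale; reindexing from there is fast when only the
    // end of the log was compacted.
    jitdb.reindex(startOffset, cb)
  })
}
```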


staltz commented Apr 1, 2022

@arj03 Thanks!

reindex() currently doesn't rebuild the core indexes (seq, sequence, timestamp). Is there a reason for that? I think with compaction we would also have to rebuild the core indexes.

Why do you say that level indexes should be allowed to do lookups while jitdb shouldn't? Is it because of the speed of rebuilding the indexes? (By the way, in my tests, I felt that some jitdb indexes do take a considerable time to build, like value_author.32prefix, which is 6 MB in a production database, and has to write 4 bytes for each record.)


arj03 commented Apr 1, 2022

reindex() currently doesn't rebuild the core indexes (seq, sequence, timestamp). Is there a reason for that? I think with compaction we would also have to rebuild the core indexes.

Right, that is because for encrypted messages they don't change. Here they do :)

Why do you say that level indexes should be allowed to do lookups while jitdb shouldn't?

Just because it is easier. We don't need a flag to jitdb that says: please don't update the indexes, just run on what you have.

I can maybe check the value_author prefix building later to see if there is anything we can do to optimize that one.


staltz commented Apr 1, 2022

Okay thanks, so I'm preparing for the plan to "not allow queries while compaction is running", and here are some disorganized thoughts (mostly questions to myself):

What happens when a jitdb query is ongoing and compaction starts? Do we cancel the query, or do we pause it and re-run it after compaction (and reindexing) is done? What if it was a 4th call to paginate, and the new starting seq would mean that the 4th paginate gets results that were already in the 3rd paginate (or, worse, skips some results)?

This is a bit similar to questions about log.get and log.stream: do we abort them, or do we wait for compaction to end and rerun them?


staltz commented Apr 1, 2022

I can maybe check the value_author prefix building later to see if there is anything we can do to optimize that one.

Just as a reference, here are the other (leveldb and jitdb) indexes of similar size (sorted). There are plenty of big jitdb indexes; computing them all and writing them to disk isn't fast.

Index                             Size    Type
search2                           101 MB  level
ebt                               21 MB   level
contacts                          12 MB   level
timestamp                         12 MB   bitvector
value_content_vote_link__map      6.2 MB  prefix map
seq                               6.0 MB  bitvector
sequence                          6.0 MB  bitvector
value_author                      6.0 MB  prefix
fullMentions                      5.1 MB  level
hashtags                          5.0 MB  level
aboutSelf                         3.0 MB  level
value_content_root__map           3.0 MB  prefix map
value_content_contact__map        2.6 MB  prefix map
base                              1.3 MB  level
value_content_fork__map           218 KB  prefix map
meta_                             173 KB  bitvector
meta_private_true                 173 KB  bitvector
value_author_MYSSBID              173 KB  bitvector
value_content_root_               173 KB  bitvector
value_content_type_contact        173 KB  bitvector
value_content_type_post           173 KB  bitvector
value_content_type_pub            173 KB  bitvector
value_content_type_roomx2Falias   173 KB  bitvector
value_content_type_vote           173 KB  bitvector
canDecrypt                        24 KB   bitvector
encrypted                         64 B    bitvector


staltz commented Apr 1, 2022

What happens when a jitdb query is ongoing and compaction starts?

Answering myself: if there are ongoing queries (of any kind), postpone compaction, so that compaction always starts when everything else in the database is idle; from that point onwards, queue all incoming queries so that they run only after compaction is done.
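
As a sketch, the queueing half could look like this (hedged; compactionProgress is assumed to be an obz emitting {done}, per the task list, and runQuery stands in for the real query path):

```js
// Hedged sketch of the queue described above. Queries arriving while
// compaction is ongoing are parked, then released once it reports done.
const waiting = []

function query(...args) {
  if (log.compactionProgress.value.done === false) {
    waiting.push(() => runQuery(...args)) // park the query inputs
  } else {
    runQuery(...args)
  }
}

log.compactionProgress((progress) => {
  if (progress.done) {
    while (waiting.length > 0) waiting.shift()() // release the queue
  }
})
```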


arj03 commented Apr 1, 2022

Yep, agree. Also, for most of these indexes you wouldn't need to do a full rebuild, only from compaction start and onwards.


arj03 commented Apr 1, 2022

I felt that some jitdb indexes do take a considerable time to build, like value_author.32prefix, which is 6 MB in a production database, and has to write 4 bytes for each record.

I had to check this. Building author or type takes around the same amount of time (40s). Note this is from a totally empty indexes folder, so it includes keys and base. Rebuilding author after type is 6s. Rebuilding both together is 45s, so a tiny bit less than one after the other. I tried disabling writing the jitdb indexes, and it only brought rebuilding down from 45s to 43.2s. Decrypting is still very heavy, as documented here, at around 10s of overhead. If I leave the canDecrypt file and the level indexes, the time to build both goes down to 18.7s.


staltz commented Apr 7, 2022

Super weird bug with GitHub where I can't edit the original post's TODO list, so I'm copying it down here:

  • AAOL: compactionProgress() emits {done:true} on AAOL init
  • AAOL: compactionProgress() emits {done:false} immediately when compact starts
  • JITDB: if we start a query while log.compactionProgress() is not yet done, queue the query inputs
  • JITDB: once log.compactionProgress() emits "done", release the queue
  • JITDB: reindex() should rebuild core indexes too
  • JITDB: indexingActive() obz API
  • JITDB: queriesActive() obz API
  • DB2: reindexOffset should pass pValue to processRecord
  • DB2: internal levelIndexingActive() obz
  • DB2: compact() API which checks jitdb.indexingActive() and jitdb.queriesActive() and levelIndexingActive and postpones itself until they are inactive
  • DB2: once indexing is inactive, run log.compact(), partially reindex jitdb, fully reindex leveldb, and partially reindex private (overall flow sketched after this list)
  • JITDB: improve performance of "get msg from seq" get just one result #215
  • DB2: improve performance of getMsg necessary for delete Performance of del() is pretty bad ssb-db2#334
  • AAOL: add versioning to the compaction state file
  • AAOL: what if compaction is done but the process crashes after the state file is deleted and before truncate starts?
  • DB2: update Obz to 1.1.0
  • AAOL: when restarting from a crash, compactionProgress needs to signal the original startOffset so we know where reindexing should start from
  • DB2: consider what happens if crashes happen at any point, and store write-ahead state somehow
  • JITDB: submit PR for indexingActive counter
  • DB2: we need to signal leveldb indexes to be "reset" if they have stuff in memory, e.g. contacts, aboutSelf, etc
  • DB2: submit compaction PR
  • EBT: we need to update or delete some entries after we delete msgs from the log. To keep it simple, let's suppose we only use sbot.db.deleteFeed and thus we delete that whole feedId from EBT too
  • What about browser compatibility for AAOL/compaction.js and DB2 post-compact files?
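
For the ordering in the compact() items above, I imagine something like the following (a hedged sketch; whenIndexingInactive, reindexLevelIndexes, and reindexPrivate are hypothetical names for steps from this list, not real APIs):

```js
// Hedged sketch of the ordering described in the list above; every
// function name besides log.compact and jitdb.reindex is hypothetical.
function compactFlow(cb) {
  whenIndexingInactive(() => {
    log.compact((err, startOffset) => {
      if (err) return cb(err)
      jitdb.reindex(startOffset, (err) => { // partial: only the tail moved
        if (err) return cb(err)
        reindexLevelIndexes((err) => { // full: reduce-based state is stale
          if (err) return cb(err)
          reindexPrivate(startOffset, cb) // partial: re-decrypt from startOffset
        })
      })
    })
  })
}
```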


staltz commented Apr 8, 2022

@arj03 I think I hit a pretty sad issue: it seems like we have to reset all leveldb indexes after compaction happens, because most leveldb indexes hold state, and we don't know how to undo that state only for the deleted records.

Consider e.g. the about-self index, which has key/value feedId => hydratedAboutObj. Suppose we deleted an about message with the name field, but we still have an about message with the image field. There's nothing in this index that tells us how to "partially" delete the hydratedAboutObj, so either we get the wrong reduced state, or we have to do a full rebuild.
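
To make it concrete, a toy version of the problem (hedged; the real hydration code is different):

```js
// Toy illustration, not the real aboutSelf index code. Two about messages
// are folded (reduced) into one hydrated object:
const about1 = { name: 'Alice' } // suppose compaction deletes this msg
const about2 = { image: '&someBlobRef' }
const hydratedAboutObj = { ...about1, ...about2 }
// => { name: 'Alice', image: '&someBlobRef' }

// After about1 is gone, nothing records which fields it contributed, so
// hydratedAboutObj cannot be partially undone: `name` would wrongly survive.
```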

Any ideas about this?


arj03 commented Apr 8, 2022

@staltz right, reduce-based indexes versus pure ones. I think we could introduce that abstraction, and then you'd only have to do a full reindex on the reduce-based ones. Most base indexes are not reduce-based.
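
The abstraction could be as small as a flag per leveldb index (a hedged sketch; isReduce and reindexFrom are hypothetical names):

```js
// Hedged sketch of the split suggested above: pure indexes only need the
// compacted tail rebuilt, reduce-based ones must start over from zero.
for (const index of levelIndexes) {
  if (index.isReduce) {
    index.reindexFrom(0) // e.g. aboutSelf: holds folded state
  } else {
    index.reindexFrom(compactStartOffset) // e.g. keys: one entry per msg
  }
}
```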


staltz commented Apr 8, 2022

Yeah, we could do that split. But it might still be hard to find and remove the outdated entries. Take the EBT index: I think it's [feedId, sequence] => offset, so we'd need to look up the old entries via the right-hand side, i.e. the value, not the key.
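
For example, finding the stale EBT entries would mean scanning the whole index and filtering on the value side (hedged sketch assuming a levelup-style API, with values stored as stringified offsets; `done` is a hypothetical callback):

```js
// Hedged sketch: since the offset is the *value* in [feedId, sequence] =>
// offset, stale entries can only be found by scanning every entry and
// checking whether its value points into the compacted region.
const staleKeys = []
ebtLevel
  .createReadStream()
  .on('data', ({ key, value }) => {
    if (parseInt(value, 10) >= compactStartOffset) staleKeys.push(key)
  })
  .on('end', () => {
    const ops = staleKeys.map((key) => ({ type: 'del', key }))
    ebtLevel.batch(ops, done) // delete all entries pointing past the offset
  })
```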


arj03 commented Apr 11, 2022

Oh right, you are correct. I guess there isn't any other good way than to reindex everything in this case :(
