Changed crystallization strategy in btrees to rely on coalescing
This is a pretty big rewrite, but it is necessary to avoid "dagging".

"Dagging" (I just made this term up) is when you transform a pure tree
into a directed acyclic graph (DAG). Normally DAGs are perfectly fine in
a copy-on-write system, but in littlefs's case they create havoc for
future block allocator plans, and their interaction with parity blocks
raises some uncomfortable questions.

How does dagging happen? Consider an innocent little btree with a single
block:

    .-----.
    |btree|
    |     |
    '-----'
       |
       v
    .-----.
    |abcde|
    |     |
    '-----'

Say we wanted to write a small amount of data in the middle of our
block. Since the data is so small, the previous scheme would simply
inline the data, carving the left and right siblings (in this case the
same block) to make space:

      .-----.
      |btree|
      |     |
      '-----'
    .'   v   '.
    |    c'   |
    '.       .'
      v     v
      .-----.
      |ab de|
      |     |
      '-----'

Oh no! A DAG! With multiple pointers potentially referencing the same
block in our btree, some invariants break down:

- Blocks no longer have a single reference
- If you remove a reference you can no longer assume the block is free
- Knowing when a block is free requires scanning the whole btree
- This split operation effectively creates two blocks; does that mean we
  need to rewrite parity blocks?

---

To avoid this whole situation, this commit adopts a new crystallization
algorithm. Instead of allowing crystallization data to be arbitrarily
fragmented, we eagerly coalesce any data under our crystallization
threshold, and if we can't coalesce, we compact everything into a block.

Much like a Knuth heap, simply checking both siblings to coalesce has the
effect that any data will always coalesce up to the maximum size where
possible. And when checking for siblings, we can easily find the block
alignment.

This also has the effect of always rewriting blocks if we are writing a
small amount of data into a block. Unfortunately I think this is just
necessary in order to avoid dagging.
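A toy C model may make the difference between the two schemes concrete. Everything in it (leaf_t, write_old, write_new, BLOCK_SIZE, CRYSTAL_THRESH) is hypothetical and far simpler than littlefs's real btree; it only tracks how many leaves point at each block. The old path carves both siblings out of the same block and leaves two references to it, while the new path coalesces the write with both siblings and then either keeps the result inline or compacts it into a fresh block, so exactly one reference remains.

```c
#include <assert.h>
#include <string.h>

/* Toy model, NOT littlefs's real structures: a leaf either points into
 * a block (block >= 0) or holds its bytes inline (block == -1). */
#define BLOCK_SIZE     5   /* hypothetical: one block holds "abcde" */
#define CRYSTAL_THRESH 4   /* hypothetical crystallization threshold */
#define MAX_BLOCKS     4
#define MAX_LEAVES     4

typedef struct {
    int  block;             /* backing block, or -1 if inlined */
    int  off;               /* offset of this leaf's data in the block */
    int  len;               /* number of bytes this leaf covers */
    char data[BLOCK_SIZE];  /* inline bytes when block == -1 */
} leaf_t;

static char blocks[MAX_BLOCKS][BLOCK_SIZE];
static int  next_free = 1;  /* block 0 is assumed to already hold data */

/* Count the leaves pointing at `block`; in a pure tree this is <= 1. */
static int refs(const leaf_t *leaves, int n, int block) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (leaves[i].block == block) {
            count++;
        }
    }
    return count;
}

/* Previous scheme: inline the small write and carve the left and right
 * siblings out of the same block. Two leaves now share `block`. */
static int write_old(leaf_t *leaves, int block, int pos,
        const char *buf, int len) {
    leaves[0] = (leaf_t){.block = block, .off = 0, .len = pos};
    leaves[1] = (leaf_t){.block = -1, .off = 0, .len = len};
    memcpy(leaves[1].data, buf, len);
    leaves[2] = (leaf_t){.block = block, .off = pos + len,
            .len = BLOCK_SIZE - (pos + len)};
    return 3;
}

/* New scheme: coalesce the write with both siblings. If the merged data
 * fits under the threshold it stays inline, otherwise it is compacted
 * into a fresh block (copy-on-write). Either way one leaf remains. */
static int write_new(leaf_t *leaves, int block, int pos,
        const char *buf, int len) {
    char merged[BLOCK_SIZE];
    int merged_len = BLOCK_SIZE;               /* left + write + right */
    memcpy(merged, blocks[block], BLOCK_SIZE); /* siblings' bytes */
    memcpy(merged + pos, buf, len);            /* overlay the write */

    if (merged_len <= CRYSTAL_THRESH) {
        leaves[0] = (leaf_t){.block = -1, .off = 0, .len = merged_len};
        memcpy(leaves[0].data, merged, merged_len);
    } else {
        int nb = next_free++;                  /* no bounds check: toy */
        memcpy(blocks[nb], merged, merged_len);
        leaves[0] = (leaf_t){.block = nb, .off = 0, .len = merged_len};
    }
    return 1;
}
```

With block 0 holding "abcde", writing "C" at offset 2 via write_old yields three leaves, two of which reference block 0 (the DAG from the diagram). The same write via write_new yields a single leaf pointing at block 1, which holds "abCde", and block 0 itself is left untouched.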
At the very least, crystallization is still useful for files that are not
quite block-aligned at the edges, and for sparse files. This also avoids
concerns about random writes inflating a file via sparse crystallization.
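Finding the block alignment of a write, as mentioned above, is plain arithmetic. The following is a minimal sketch, assuming a fixed block size and a file size that already includes the write; window_t and aligned_window are hypothetical names, not littlefs APIs. At the tail of a file that isn't block-aligned, the window comes out shorter than a full block, which is one of the cases where crystallization remains useful.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a littlefs API: the block-aligned window
 * around a write, clamped to the file's size after the write.
 * Assumes file_size >= off + len and no uint32_t overflow. */
typedef struct {
    uint32_t start;  /* inclusive, always a multiple of block_size */
    uint32_t end;    /* exclusive, a block boundary or the file's end */
} window_t;

static window_t aligned_window(uint32_t off, uint32_t len,
        uint32_t block_size, uint32_t file_size) {
    window_t w;
    uint32_t write_end = off + len;
    /* round the start down and the end up to block boundaries */
    w.start = off - (off % block_size);
    w.end = ((write_end + block_size - 1) / block_size) * block_size;
    /* a file that isn't block-aligned at its end yields a short window */
    if (w.end > file_size) {
        w.end = file_size;
    }
    return w;
}
```

For example, with 4096-byte blocks, a 100-byte write at offset 9000 into a 10000-byte file gives the window [8192, 10000), only 1808 bytes, while the same write deep inside a larger file would cover the full block [8192, 12288).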