Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RequestManager for partial replication #5

Merged
merged 94 commits into from
Feb 18, 2022
Merged

RequestManager for partial replication #5

merged 94 commits into from
Feb 18, 2022

Conversation

staltz
Copy link
Member

@staltz staltz commented Aug 17, 2021

  • RequestManager
  • Document the "query language" used in config.replicationScheduler
  • Batch the getSubset queries
  • Test that it all works with the new ssb-ebt
  • How do we configure sliced replication? Support sliced replication in partial replication mode #6
  • How do we configure for hops 0 differently than for hops 1 differently than for hops 2 etc?
  • Allow main feed to partially replicate (e.g. in the case of backup + recovery) itself?
  • Call ebt.registerFormat in index.js or somewhere, based on the config
  • Template data structure for easily querying stuff
  • Optimize performance of RequestManager traverse
  • Treat metafeed/tombstone quickly discovered after a metafeed/add
  • reconfigure() should trigger some recalculation on the new templates
  • Avoid rescanning the database in RequestManager scan()
  • Test for blocking meta feeds when a main feed is blocked
  • Tree datastructure in RequestManager
  • Blocking main feed must block all subfeeds related to main feed? Test for this too
  • Automatically switch to partial replication if (while replicating fully) we bump into metafeed/announce
  • start() method and autostart config
  • Handle tombstoning, TEST that we handle tombstoning
  • Fix block-related bug

There seems to be a bug related to blocking. Scenario was: A mutually follows B. A is connected to B. A follows C, but C blocks A. B has the data of C, but under EBT rules should not be allowed to send that data to A because it has to respect Cs block of A.

@arj03 This branch is truly a work in progress, but I figured you might have some comments on the direction I'm taking here. So far this just sets up the repo with placeholders to receive partial replication logic. If you have any comments, I'm keen to hear.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@staltz
Copy link
Member Author

staltz commented Aug 17, 2021

Notes from video call:

  • when no remote peer supports subset RPCs, fallback to createHistoryStream just to fetch msgs in memory, just to pick the announce msg
  • createHistoryStream is a temporary solution for peers who dont have the RPC
  • ssb-ebt comes in the end

  • We should replicate the index feed messages like normal feeds
  • And then, there will be a separate e-b-t instance where the arguments getAt / getClock / etc are customized so that it does an index lookup, and append does a sbot.db.addOOO
  • We would need a way of getting a remote peer's ssb-ebt clock without actually replicating it, i.e. we need some "initial metadata" event in ssb-ebt, in order to enable telling ebt that "I want to start replicating from latest - 50 until latest"
  • Lots of corner cases to handle, e.g. we need to be careful not to replicate Carla's main feed via old peers (that dont support partial replication) and then the other day replicating Carla's indexes feeds and doing addOOO on her main feed msgs.

  • By the way, the new ebt conundrum above only applies for index feeds
  • For chess feeds and others, we can just replicate them via normal ssb-ebt

@staltz
Copy link
Member Author

staltz commented Aug 19, 2021

Informed by this comment, I'm thinking that this PR needs a few different responsibilities:

Done? Responsibility Covered by
✔️ Managing the social graph ssb-friends
✔️ Replicating any feed ssb-ebt
Replicating an index feed upcoming ssb-ebt
✔️ Commanding EBT to replicate a feed This module
Managing the map of all main ID => root meta feed ID ?
Looking for the root meta feed ID for a main ID ?
Filtering the root meta feeds for relevant subfeeds This module
Commanding EBT to replicate an index feed This module

@arj03 What do you think of this?

These responsibilities smell like they belong elsewhere:

  • Looking for the root meta feed ID for a main ID
  • Managing the map of all main ID => root meta feed ID

But where? Maybe ssb-meta-feeds, but then ssb-meta-feeds would also have to call getSubset from ssb-meta-feeds-rpc and there may be a circular dependency between these two modules. (Not sure)

@arj03
Copy link
Member

arj03 commented Aug 20, 2021

That is a great overview, helped me clarify some thoughts. I think the responsibility of those 2 points should be in this module. ssb-meta-feeds should only concern itself with managing the meta feeds for your own feed I think and the other things one can use db2 queries for. There is also the thing of bulking some of the queries, like if you are doing initial sync and need to figure out the meta feed for 100 main feeds, then you don't want 100 calls over the network if one or a few can do. It might help to structure this module into overall replicator and then 1 subcomponents one for full replication and the other for partial replication. That way you can start doing the full replication component first and we can get that tested. This also matches our milestones.

@staltz
Copy link
Member Author

staltz commented Aug 20, 2021

I think the responsibility of those 2 points should be in this module.

Okay, I'll start with that, but I'll keep thinking about responsibility modularization.

There is also the thing of bulking some of the queries, like if you are doing initial sync and need to figure out the meta feed for 100 main feeds, then you don't want 100 calls over the network if one or a few can do.

Would this mean that we need to use ssb-ql-1 for getSubset?

@arj03
Copy link
Member

arj03 commented Aug 20, 2021

Would this mean that we need to use ssb-ql-1 for getSubset?

Yes

@staltz staltz force-pushed the partially branch 2 times, most recently from 3fe91fc to 7e20666 Compare August 23, 2021 10:59
@staltz
Copy link
Member Author

staltz commented Aug 23, 2021

@arj03 I updated this PR:

  • config.replicationScheduler.partially => config.replicationScheduler.partialReplication
  • Readme describes how to use the config
  • MetafeedFinder code is done, but totally untested

Table updated:

Done? Responsibility Covered by
✔️ Managing the social graph ssb-friends
✔️ Replicating any feed ssb-ebt
Replicating an index feed upcoming ssb-ebt
✔️ Commanding EBT to replicate a feed RequestManager
Commanding EBT to replicate an index feed RequestManager
Filtering the root meta feeds for relevant subfeeds RequestManager
Managing the map of all main ID => root meta feed ID MetafeedFinder
Looking for the root meta feed ID for a main ID MetafeedFinder

@arj03
Copy link
Member

arj03 commented Aug 23, 2021

I looked over the changes, this is looking good. As for the table and changes, I think its a very good idea as you have done, to try and make this initially about doing partial the correct way. Then we can start testing that and get some numbers of how it behaves and take the edge cases of doing partial replication with a non-meta-feed nodes.

@staltz
Copy link
Member Author

staltz commented Sep 9, 2021

@arj03 Could you take a look at commit 02eb512 ? It has the core logic for triggering the new EBT. Special attention to the big "FIXME" I put in the middle.

@staltz
Copy link
Member Author

staltz commented Sep 14, 2021

Table updated:

Done? Responsibility Covered by
✔️ Managing the social graph ssb-friends
✔️ Replicating any feed ssb-ebt
Replicating an index feed upcoming ssb-ebt
✔️ Commanding EBT to replicate a feed RequestManager
Commanding EBT to replicate an index feed RequestManager
Filtering the root meta feeds for relevant subfeeds RequestManager
Managing the map of all main ID => root meta feed ID MetafeedFinder
Looking for the root meta feed ID for a main ID MetafeedFinder

@staltz
Copy link
Member Author

staltz commented Sep 16, 2021

@arj03 I have a bug that's related to ssb-ebt but I don't want to put it on that branch because it's not a blocker for that PR, it's more of a "general problem" with the whole approach.

Indexed feed replication isn't working for me anymore, and I drilled down into the details and apparently the "feeds-lookup" in ssb-meta-feeds (which loads up state for findByIdSync) processes the index feed after ssb.ebt.request() for that feedId. So in ssb.ebt.request the findByIdSync will fail and it'll be treated as a classic feed, and then a few milliseconds after that, the correct state behind findByIdSync will be loaded. So a race condition, but we need a smart way of signalling that one of them is done so the other can proceed.

@staltz staltz changed the title work in progress request manager RequestManager for partial replication Sep 16, 2021
@staltz
Copy link
Member Author

staltz commented Sep 16, 2021

Table updated:

Done? Responsibility Covered by
✔️ Managing the social graph ssb-friends
✔️ Replicating classic feeds ssb-ebt
✔️ Replicating bendy butt feeds ssb-ebt
✔️ Replicating an index feed upcoming ssb-ebt
✔️ Commanding EBT to replicate a feed RequestManager
✔️ Commanding EBT to replicate an index feed RequestManager
Filtering the root meta feeds for relevant subfeeds RequestManager
✔️ Managing the map of all main ID => root meta feed ID MetafeedFinder
✔️ Looking for the root meta feed ID for a main ID MetafeedFinder

index.js Outdated Show resolved Hide resolved
req-manager.js Outdated Show resolved Hide resolved
metafeed-finder.js Outdated Show resolved Hide resolved
@staltz
Copy link
Member Author

staltz commented Sep 27, 2021

@arj03 This branch should be ready for review except for tombstoning.

index.js Outdated Show resolved Hide resolved
@staltz
Copy link
Member Author

staltz commented Sep 27, 2021

@arj03 Hmm, I just realized that maybe this module should have a start() method (and accompanying config.replicationScheduler.autostart boolean) because the app developer may need to first start up ssb-conn and then start replication scheduler, see e.g. how MetafeedFinder does an RPC on the currently connected peers.

metafeed-finder.js Outdated Show resolved Hide resolved
@arj03
Copy link
Member

arj03 commented Sep 27, 2021

@arj03 Hmm, I just realized that maybe this module should have a start() method (and accompanying config.replicationScheduler.autostart boolean) because the app developer may need to first start up ssb-conn and then start replication scheduler, see e.g. how MetafeedFinder does an RPC on the currently connected peers.

Good thinking, makes sense.

@staltz
Copy link
Member Author

staltz commented Feb 17, 2022

I want to use replicationScheduler's start() API which I realize this branch has already, but master doesn't, so I'm trying to push this PR forwards a tiny bit so we can finally merge it.

Tests seem to be failing even on the last commit where it passed, indicating that maybe something changed in our dependencies which may be causing test failure.

@staltz
Copy link
Member Author

staltz commented Feb 17, 2022

@arj03 Maybe I could use your help here because I think it's somehow related to ssb-db2 and epidemic-broadcast-trees. It's the strangest thing. If you go to main branch right now, do a clean npm install and npm test, the tests fail with ssb-db2 here: https://github.com/ssb-ngi-pointer/ssb-replication-scheduler/blob/294f252ab4c9a836e3abd3b2528139293a407e4a/test/integration/db2-support.js#L112-L117

The clocks show that the peers have not received each other's data, even though they are mutually following each other. I put debug logs in ssb-ebt and enabled logging in epidemic-broadcast-trees, and it seems that notes are sent to other peers, but ssb-ebt append() is never called.

On main branch it's using some older versions of ssb-ebt, but I tried updating to ssb-ebt 8 and the behavior was the same. Also, this partially branch behaves the same way (but shows even more errors because there are more tests).

On main branch, only the ssb-db2 test fails, all others with ssb-db are passing.

@staltz
Copy link
Member Author

staltz commented Feb 17, 2022

I even tried pinning the version of ssb-db2 to 2.2.0 (main branch has ^2.2.0) and got this strange behavior (see error below). Note that while ssb-db2 is pinned to 2.2.0, npm install picked the latest async-append-only-log at 3.1.3. Could it be that we introduced a bug in async-append-only-log recently?

  replicate between 3 peers, using ssb-db2

    ✔ started the 3 peers
(node:2493700) UnhandledPromiseRejectionWarning: 
TypeError: Cannot read property 'length' of null
    at Object.read [as decode] (./node_modules/varint/decode.js:12:15)
    at Object.seekKey (./node_modules/bipf/index.js:239:20)
    at Object.decrypt (./node_modules/ssb-db2/indexes/private.js:133:16)
    at Object.o.write (./node_modules/ssb-db2/log.js:69:57)
    at Stream._writeToSink (./node_modules/async-append-only-log/stream.js:85:33)
    at Stream._handleBlock (./node_modules/async-append-only-log/stream.js:120:12)
    at Stream._resumeCallback (./node_modules/async-append-only-log/stream.js:161:27)
    at Object.getBlock (./node_modules/async-append-only-log/index.js:146:7)
    at Stream._resume (./node_modules/async-append-only-log/stream.js:152:15)
    at Stream._next (./node_modules/looper/index.js:11:9)

@arj03
Copy link
Member

arj03 commented Feb 17, 2022

It does actually look like that. AAOL 3.1.2 seems to work for me, while AAOL 3.1.3 does not. It fails 3 tests.

@staltz
Copy link
Member Author

staltz commented Feb 17, 2022

Ouch, could it be that the database glitch in production is real because of AAOL 3.1.3? Good time to do that refactor on stream.js.

@arj03
Copy link
Member

arj03 commented Feb 17, 2022

It appears that manyverse is using 3.1.2 so just don't upgrade AAOL until we figure out what the problem is.

@staltz
Copy link
Member Author

staltz commented Feb 18, 2022

Yup! Tests pass again in main branch with AAOL 3.1.4 and also this branch partially. 👏

@staltz
Copy link
Member Author

staltz commented Feb 18, 2022

There seems to be a bug related to blocking. Scenario was: A mutually follows B. A is connected to B. A follows C, but C blocks A. B has the data of C, but under EBT rules should not be allowed to send that data to A because it has to respect Cs block of A.

@arj03 I added a test for this in the latest commit, so now this PR should have addressed all the cases.

There is still the concern on how to improve performance by avoiding the 500ms timeout, but we can tweak performance in the future as we go, doesn't need to be a dealbreaker for this PR, in my opinion.

@staltz staltz requested a review from arj03 February 18, 2022 12:59
@arj03
Copy link
Member

arj03 commented Feb 18, 2022

Ok, will have a look :)

package.json Outdated Show resolved Hide resolved
@staltz staltz requested a review from arj03 February 18, 2022 13:51
@arj03
Copy link
Member

arj03 commented Feb 18, 2022

I do remember reviewing most of this stuff, so I'm just trying to get back and making sure that we have everything. A question regarding the syntax of partialReplication. Is there a way to say you want to replicate all subfeeds?

@arj03
Copy link
Member

arj03 commented Feb 18, 2022

To add some more context: the above would able pubs to replicate all subfeeds and it would mean I could use this module in groupies. It is not a blocker on this PR, just something that would be good to have.

I think that is my only overall comment. We tested this to figure out performance and I can you you fixed the things we found there so seems like this is ready for merge :)

Copy link
Member

@arj03 arj03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wup

@staltz
Copy link
Member Author

staltz commented Feb 18, 2022

I do remember reviewing most of this stuff, so I'm just trying to get back and making sure that we have everything. A question regarding the syntax of partialReplication. Is there a way to say you want to replicate all subfeeds?

Just a sec, I'm making a test case to answer your question. Basically, yes this is supported in one way or another. It's a bit quirky (do we need a "replication language" soon?) but the support exists.

@staltz
Copy link
Member Author

staltz commented Feb 18, 2022

@arj03 See latest commit. Once CI tests pass, I'll merge and make a new major version. 🎉

@arj03
Copy link
Member

arj03 commented Feb 18, 2022

do we need a "replication language" soon

😁

Glad to know it exists.

Great work on this PR, it is quite a task getting these concurrently things to line up correctly.

@staltz staltz merged commit 06119fc into main Feb 18, 2022
@staltz
Copy link
Member Author

staltz commented Feb 18, 2022

Wow, what a huge PR this was!

@staltz staltz deleted the partially branch February 18, 2022 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants