Delayed RPC Send Using Tokens #5923
base: unstable
Conversation
I did not understand this. If we get a request that breaks the concurrency limit, why can't we just wait to send it until the existing streams have concluded? Right now, if I run this branch on a Kurtosis PeerDAS network, the lighthouse server rightly responds with an error because the lighthouse client is breaking the concurrency limit.
This is the test I used (created in another branch forked from this PR). In the test, the sender maintains two (or fewer) active requests at a time, but self rate-limiting is still triggered. The test is quite simple, but I think it shows that we need to limit both concurrent requests and (optionally) tokens. What do you think?
The spec PR has been merged.
nice! Is this ready for another review then @ackintosh?
Let me organize the remaining tasks of this PR 🙏 :
Note: Currently, if we run this branch (on a Kurtosis local network), the lighthouse server responds with an error as the client side is exceeding the concurrency limit, resulting in a ban. I think task
yeah makes sense Akihito, go for it 🚀
https://github.com/sigp/lighthouse/actions/runs/12109786536/job/33759312718?pr=5923 The …
This is ready for another review. 🙏 I have added a concurrency limit to the self-limiter. Now the self-limiter limits outbound requests based on both the number of concurrent requests and tokens (optional). Whether we also need to limit tokens in the self-limiter is still under discussion. Let me know if you have any ideas. (FYI) I also ran lighthouse (this branch) on the testnet for ~24 hours. During this time, the LH node responded with 21 RateLimited errors due to the number of active requests. These errors appear in the logs like the example below. Note that this is about inbound requests, not the self-limiter (outbound requests).
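For intuition, here is a minimal sketch of how an outbound limiter can combine a concurrency cap with an optional token bucket. In Lighthouse these limits are tracked per peer and protocol; the sketch collapses that to a single counter, and all names and types here are illustrative assumptions, not Lighthouse's actual self-limiter:

```rust
use std::time::{Duration, Instant};

// Illustrative token bucket; not Lighthouse's real rate limiter.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    // Take `cost` tokens, or report how long to wait until they regenerate.
    fn try_take(&mut self, cost: f64) -> Result<(), Duration> {
        let elapsed = self.last_refill.elapsed().as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = Instant::now();
        if self.tokens >= cost {
            self.tokens -= cost;
            Ok(())
        } else {
            Err(Duration::from_secs_f64((cost - self.tokens) / self.refill_per_sec))
        }
    }
}

// Hypothetical outbound (self) limiter: a request is sent only if it stays
// within the concurrency limit AND, if configured, the token bucket.
struct SelfLimiter {
    active_requests: usize,
    max_concurrent: usize,
    tokens: Option<TokenBucket>,
}

impl SelfLimiter {
    // Err(None): wait for an active request to finish.
    // Err(Some(d)): wait `d` for tokens to regenerate.
    fn allows(&mut self, cost: f64) -> Result<(), Option<Duration>> {
        if self.active_requests >= self.max_concurrent {
            return Err(None);
        }
        if let Some(bucket) = &mut self.tokens {
            bucket.try_take(cost).map_err(Some)?;
        }
        self.active_requests += 1;
        Ok(())
    }

    // Called when a request's response stream terminates.
    fn request_finished(&mut self) {
        self.active_requests = self.active_requests.saturating_sub(1);
    }
}
```

The two error cases correspond to the two ways a queued request can become sendable again: an active stream concludes, or enough tokens regenerate.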
Issue Addressed
closes #5785
Proposed Changes
The diagram below shows the differences in how the receiver (responder) behaves before and after this PR. The following sections detail the changes.
Is there already an active request with the same protocol?
This check is not performed in Before. It is taken from the consensus-specs PR that proposes updates regarding rate limiting and response timeouts: https://github.com/ethereum/consensus-specs/pull/3767/files

The spec PR mentions the requester side. In this PR, I introduced the ActiveRequestsLimiter for the responder side to restrict more than two requests from running simultaneously on the same protocol per peer. If the limiter disallows a request, the responder sends a rate-limited error and penalizes the requester.
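A minimal sketch of such a responder-side limiter, using simplified stand-in types (the real ActiveRequestsLimiter lives in lighthouse_network and works with libp2p peer IDs and the RPC protocol enum):

```rust
use std::collections::HashMap;

// Illustrative stand-ins for libp2p's PeerId and the RPC protocol enum.
type PeerId = u64;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Protocol {
    BlocksByRange,
    BlobsByRange,
}

// Tracks how many inbound requests are currently being served per (peer, protocol).
struct ActiveRequestsLimiter {
    max_per_protocol: usize,
    active: HashMap<(PeerId, Protocol), usize>,
}

impl ActiveRequestsLimiter {
    fn new(max_per_protocol: usize) -> Self {
        Self { max_per_protocol, active: HashMap::new() }
    }

    // Returns false when the peer already has the maximum number of requests
    // in flight on this protocol; the caller then responds with a rate-limited
    // error and penalizes the peer.
    fn allows(&mut self, peer: PeerId, protocol: Protocol) -> bool {
        let count = self.active.entry((peer, protocol)).or_insert(0);
        if *count >= self.max_per_protocol {
            false
        } else {
            *count += 1;
            true
        }
    }

    // Called when the response stream for (peer, protocol) terminates.
    fn request_completed(&mut self, peer: PeerId, protocol: Protocol) {
        if let Some(count) = self.active.get_mut(&(peer, protocol)) {
            *count = count.saturating_sub(1);
        }
    }
}
```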
Request is too large?

UPDATE: I removed the RequestSizeLimiter and added RPC::is_request_size_too_large() instead, which checks whether the count of requested blocks/blobs is within the number defined in the specification. That is much simpler.

discussion: #5923 (comment)
commit log: 5a9237f
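A sketch of the idea behind that check, with illustrative constants (the actual maximums come from the chain spec configuration, not hard-coded values, and the real request types are Lighthouse's own):

```rust
// Illustrative spec limits; the real values come from the chain spec.
const MAX_REQUEST_BLOCKS: u64 = 1024;
const MAX_REQUEST_BLOB_SIDECARS: u64 = 768;

// Simplified request type for the sketch.
enum Request {
    BlocksByRange { count: u64 },
    BlobsByRange { count: u64 },
    Other,
}

// A request is rejected up front if it asks for more items than the spec
// allows; this check no longer involves the rate limiter at all.
fn is_request_size_too_large(request: &Request) -> bool {
    match request {
        Request::BlocksByRange { count } => *count > MAX_REQUEST_BLOCKS,
        Request::BlobsByRange { count } => *count > MAX_REQUEST_BLOB_SIDECARS,
        Request::Other => false,
    }
}
```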
This request size check is also performed in Before, using the Limiter, but in this PR I introduced RequestSizeLimiter to handle it. Unlike the Limiter, RequestSizeLimiter is dedicated to this single check. The reasons I introduced RequestSizeLimiter are: (1) in After, the rate limiter is shared between the behaviour and the handler (Arc<Mutex<RateLimiter>>), as detailed in the next section, while the request size check does not heavily depend on the rate-limiting logic, so it can be separated with minimal code duplication; (2) the request size check is performed on the behaviour side, so by separating it from the rate limiter we reduce how often the rate limiter has to be locked.

Rate limit reached? and Wait until tokens are regenerated
UPDATE: I moved the limiter logic to the behaviour side. #5923 (comment)
The rate limiter is shared between the behaviour and the handler (Arc<Mutex<RateLimiter>>). The handler checks the rate limit and queues the response if the limit is reached; the behaviour handles pruning.

I considered not sharing the rate limiter between the behaviour and the handler, and instead performing all of this within either the behaviour or the handler, but decided against it for the following reasons:

Performing everything within the behaviour: the behaviour is unable to recognize the response protocol when RPC::send_response() is called, especially when the response is RPCCodedResponse::Error. Therefore, the behaviour can't rate limit responses based on the response protocol.

Performing everything within the handler: when multiple connections are established with a peer, there can be multiple handlers interacting with that peer, so we cannot enforce rate limiting per peer solely within the handler. (Any ideas? 🤔)
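For illustration only, the sharing pattern described above amounts to the behaviour and every handler holding clones of a single Arc<Mutex<...>>. The types below are placeholders rather than Lighthouse's actual ones, and per the UPDATE above, the limiter logic was later moved to the behaviour side:

```rust
use std::sync::{Arc, Mutex};

// Placeholder for the shared response rate limiter.
struct RateLimiter {
    // token buckets per (peer, protocol), pruning state, ...
}

impl RateLimiter {
    fn allows_response(&mut self) -> bool {
        // Check/consume tokens for the response's (peer, protocol).
        true
    }
    fn prune(&mut self) {
        // Drop expired entries.
    }
}

// The behaviour owns one clone and periodically prunes the limiter.
struct Behaviour {
    limiter: Arc<Mutex<RateLimiter>>,
}

// Every connection handler for a peer holds another clone, so the limit is
// enforced per peer even when the peer has several connections (and thus
// several handlers).
struct Handler {
    limiter: Arc<Mutex<RateLimiter>>,
}

impl Handler {
    fn on_response_ready(&self) {
        if !self.limiter.lock().unwrap().allows_response() {
            // Queue the response until tokens are regenerated.
        }
    }
}
```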
Additional Info

Naming
I have renamed the fields of the behaviour to make them more intuitive:
Testing
I have run a beacon node with these changes for 24 hours and it appears to work fine.
The rate-limited error has not occurred while running this branch.