-
Notifications
You must be signed in to change notification settings - Fork 32
Add new peers without restarting a node #32
base: main
Are you sure you want to change the base?
Add new peers without restarting a node #32
Conversation
Signed-off-by: Yevhenii Babichenko <[email protected]>
17747df
to
9dacf81
Compare
Signed-off-by: Yevhenii Babichenko <[email protected]>
95f303d
to
a71be6a
Compare
text/0000-add-peers-in-runtime.md
Outdated
This RFC proposes to implement the possibility to add new peers in runtime | ||
(through the component validator endpoint) when a node is working in the | ||
`static` peering mode. Along with that it also adds the corresponding extensions | ||
to the off-chain permissioning model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the corresponding permissioning text below.
One of the main security considerations is how to prevent a bad actor from taking the validator offline. Attacks may include connecting the validator to offline or otherwise byzantine nodes. Other attacks may include the general type of exposure from adding any new remote call (input formatting etc).
The proposal adds remote access to what has so far required local system administrator access to the host. We must consider adversaries who are on the same local network. The system in most cases may be configured such that the component port is not advertised beyond the loopback, but there are valid reasons for the administrator to advertise the component port on the local network. This proposal should consider how the validator remains secure independent of the system's network configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, after this explanation I have a better understanding of the permissioning requirement. Will describe a model that was discussed on the chat.
text/0000-add-peers-in-runtime.md
Outdated
runtime and also decreases the uptime. | ||
|
||
To resolve this problem our team proposes to add a method to add new peers to a | ||
running node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you also considered removing nodes? The security implications are a little more risky but about the same. The benefit of including a remove function would let us address the limits of maximum-peer-connectivity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it would be good to include removal functionality for discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is a good idea.
I think this would be a useful addition to the administrative APIs. Main thing to work out are security cosiderations. |
text/0000-add-peers-in-runtime.md
Outdated
receives a new request for adding peers it: | ||
|
||
- Validates the format of peer URIs which has to be | ||
`tcp://IP_ADDRESS:PORT_NUMBER` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to IP addresses, this should also accept DNS names, which ZMQ will resolve: http://api.zeromq.org/2-1:zmq-tcp
text/0000-add-peers-in-runtime.md
Outdated
`tcp://IP_ADDRESS:PORT_NUMBER` | ||
- If the validation was successful then the validator updates its peer list and | ||
immediately returns the `OK` response. The new peers are connected _after_ | ||
that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the response is sent back before connecting, there is no way to notify clients of authorization errors. Is not sending an error due to authorization an intentional decision? Or should the error enum be extended and a response sent only after peering completes?
text/0000-add-peers-in-runtime.md
Outdated
internal interface of the application. Even if we do then we can restrict the | ||
access to that feature by using a proxy as suggested in the documentation. | ||
- Should the system notify its user about the statuses of new connections or | ||
leave the status check to the end user? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion would be to either have the response only be sent after the new connection is created, or expose another request type that allows the user to poll on the status of the new connection so they can detect connection failures such as authorization errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is not sending an error due to authorization an intentional decision
I missed it intentionally to have a discussion on how we should handle such errors.
In my opinion, it would be better to extend the enum in ClientAddPeersResponse
but then I would like to discuss the implementation. The simplest approach I see here is to make the Gossip
class send notifications about peer status changes. This will allow us to know if the connection attempt was ok or not ok, but will not give any specific information. To be more specific we will have to add handlers for corresponding peer messages that will notify us, for example, about authorization violations, rejected connections and so on. I think that the simplest approach should be acceptable now (at least in the system our company build).
Signed-off-by: Yevhenii Babichenko <[email protected]>
…g that in batches. Signed-off-by: Yevhenii Babichenko <[email protected]>
Signed-off-by: Yevhenii Babichenko <[email protected]>
Signed-off-by: Yevhenii Babichenko <[email protected]>
Just uploaded an update. Here is what is changed:
|
text/0000-add-peers-in-runtime.md
Outdated
CLIENT_PEERS_ADD_REQUEST = 131; | ||
CLIENT_PEERS_ADD_RESPONSE = 132; | ||
CLIENT_PEERS_REMOVE_REQUEST = 131; | ||
CLIENT_PEERS_REMOVE_RESPONSE = 132; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be 134
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
text/0000-add-peers-in-runtime.md
Outdated
[request-processing]: #request-processing | ||
|
||
The requests are received on the `component` endpoint. When the validator | ||
receives a new request for adding peers it: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add what the validator would do to remove peers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added that, and there is one question: what to do if we go below the minimum peer connectivity limit here? I have two options in my mind:
- Just allow to do that, but it my bring the node functionality down;
- Another option is to throw an error here in case we reached the minimum limit.
Wondering which of those two will be a better solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isnt this proposal only for static peering? minimum peer connectivity is used for dynamic peering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, sorry
Signed-off-by: Yevhenii Babichenko <[email protected]>
Signed-off-by: Yevhenii Babichenko <[email protected]>
Signed-off-by: Yevhenii Babichenko <[email protected]>
I like this proposal overall. I wonder if this truly belongs in the Client* message namespace, or whether we should have an administrative set of messages. This is similar to the question of whether it should be exposed in the REST API, but more fundamentally whether it should be able to be bound to a different port. |
I am not sure if we should define a separate namespace here — this falls beyond the scope of this proposal. I followed the simple logic here: if something does not belong to TPs or consensus engine, this should be in the Client* namespace. Do you mean by "to be bound to a different port" that those messages should be on a different port than |
text/0000-add-peers-in-runtime.md
Outdated
|
||
# Summary | ||
[summary]: #summary This RFC proposes to implement the possibility to add new | ||
peers and remove the existing connections in runtime (through the component |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend: "This RFC proposes functionality to add and remove static peer connections while a validator node is running."
text/0000-add-peers-in-runtime.md
Outdated
[summary]: #summary This RFC proposes to implement the possibility to add new | ||
peers and remove the existing connections in runtime (through the component | ||
validator endpoint) when a node is working in the `static` peering mode. Along | ||
with that it also adds the corresponding extensions to the off-chain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend: "This RFC also adds the corresponding extensions to the off-chain permssioning model."
text/0000-add-peers-in-runtime.md
Outdated
# Motivation | ||
[motivation]: #motivation | ||
|
||
When an administrator adds a new node to an existing Sawtooth network he/she has |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comma after 'network' and just pick either he or she for the gender
text/0000-add-peers-in-runtime.md
Outdated
|
||
When an administrator adds a new node to an existing Sawtooth network he/she has | ||
to restart a node with new peering settings. This makes any automation | ||
significantly harder to write than if we had the possibility to add peers in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend: "Restarting the process adds substantial complexity to infrastructure automation, and incurs system downtime."
text/0000-add-peers-in-runtime.md
Outdated
significantly harder to write than if we had the possibility to add peers in the | ||
runtime and also decreases the uptime. | ||
|
||
To resolve this problem our team proposes to add a method to add new peers to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comma after problem
text/0000-add-peers-in-runtime.md
Outdated
list and returns the `INVALID_PEER_URI` status along with the list of faulty | ||
peer URIs. | ||
- If the `--maximum-peer-connectivity` parameter was provided to the validator | ||
then the validator checks if it has reached the maximum peer connectivity and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comma after connectivity
text/0000-add-peers-in-runtime.md
Outdated
following: | ||
|
||
- Validates the format of peer URI which has to be `tcp://ADDRESS:PORT_NUMBER`; | ||
- If a peer is connected the validator removes it. Otherwise the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comma after connected
text/0000-add-peers-in-runtime.md
Outdated
restrict access to requests that can be malicious. The workflow for the | ||
validation is the following: | ||
|
||
- If the `admin` role is not specified then the permissioning module will use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comma after specified
text/0000-add-peers-in-runtime.md
Outdated
|
||
- If the `admin` role is not specified then the permissioning module will use | ||
the `default` policy. | ||
- If the `default` policy is not specified then the validation of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comma after specified
text/0000-add-peers-in-runtime.md
Outdated
verifier checks: | ||
- If the `admin_public_key` is allowed. | ||
- If the `signature` is correct. | ||
- If one of the above conditions is not satisfied then the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comma after satisfied
- If the `default` or the `admin` policy is specified, then the permission | ||
verifier checks: | ||
- If the `admin_public_key` is allowed. | ||
- If the `signature` is correct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand on this? In particular, it is not clear what is being signed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the signature of the peer_uri
string (https://github.com/hyperledger/sawtooth-rfcs/pull/32/files#diff-60b9c6f8741b42854f2726db45c3f899R94).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operation should also be covered by the signature. If only the URI is signed then it could be replayed in a ClientRemovePeerRequest. You could look at the batch format for an example of using a signature that covers the entirety of the message. https://github.com/hyperledger/sawtooth-core/blob/master/protos/batch.proto
There may be a simpler approach for this kind of message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still waiting for a reply to this message. I am not really sure whether we should use just message signatures or develop a more complicated system for future use (I believe that such system should appear as a separate and more general RFC).
Should the authorization for this feature occur during the connection conversation (see authorization.proto)? I think it may be appropriate to add ADMIN to "RoleType" and create an Admin* message namespace in addition to Client*, etc. |
Signed-off-by: Yevhenii Babichenko <[email protected]>
It may be difficult to implement the authorization in the component endpoint. Should we extend the component endpoint, use the network endpoint (it already has the authorization implemented) or create a separate admin endpoint?
Agree on that. |
- If the `default` or the `admin` policy is specified, then the permission | ||
verifier checks: | ||
- If the `admin_public_key` is allowed. | ||
- If the `signature` is correct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operation should also be covered by the signature. If only the URI is signed then it could be replayed in a ClientRemovePeerRequest. You could look at the batch format for an example of using a signature that covers the entirety of the message. https://github.com/hyperledger/sawtooth-core/blob/master/protos/batch.proto
There may be a simpler approach for this kind of message.
I think an admin endpoint would be the most secure. This would facilitate firewall rules to restrict traffic. The downside is it adds one more element to a node setup. We used to get a lot of support questions stemming from configuring the existing endpoints. That said, I don't think this will add much complexity. A loopback default wouldn't need to be modified. A simple setup would then have the administrator logging into the validator host to issue these commands and the respective private key could reside on that validator host adding no other key management burden. |
Is that OK if I reopen this pull request from another repository? |
Please dont, we dont want to loose this conversation. @eugene-babichenko |
@agunde406 The problem is I no longer have access to this fork but still want to finish this RFC. |
@eugene-babichenko Ah okay. In that case, yes it is okay to resubmit the RFC from a new fork. Please do not change commit history and link this PR in the new one. |
@agunde406 Here is the new pull request #44 |
Proposes an extension of validator client messaging to add peers to a running node without restarting it.
Signed-off-by: Yevhenii Babichenko [email protected]