Define ping/pong payload extensions #348

Open
KolbyML wants to merge 11 commits into master from ping-pong-extensions

Conversation

@KolbyML (Member) commented Oct 21, 2024

This is a WIP and I am looking to get early feedback

I propose ping/pong custom_payload extensions: a typed, versioned format with the goal of making Portal Ping/Pong a flexible extension point and preventing protocol changes from breaking other clients when updated versions are deployed. This will also give us time to roll out updated ping/pong endpoints before deprecating older versions.

@r4f4ss commented Oct 21, 2024

Considering the goal of "prevent future protocol breaking changes, more flexible, easier less risk upgrades regarding", imagine the scenario where a ping receives a custom payload with trin/0.1.1-2b00d730/linux-x86_64/rustc1.81.0. Is the client supposed to know which Portal network protocol version is associated with this particular trin version? This would demand a mapping. As a solution, the custom payload could return the protocol spec version (e.g. pn1) (if the spec is not versioned, it could be).

Also, for privacy reasons, some client may decide not to reveal its implementation/version/system, so this payload may be optional, or possibly a free string with default value trin/0.1.1-2b00d730/linux-x86_64/rustc1.81.0.

@KolbyML (Member, Author) commented Oct 21, 2024

> Considering the goal of "prevent future protocol breaking changes, more flexible, easier less risk upgrades regarding", imagine the scenario where a ping receives a custom payload with trin/0.1.1-2b00d730/linux-x86_64/rustc1.81.0. Is the client supposed to know which Portal network protocol version is associated with this particular trin version? This would demand a mapping. As a solution, the custom payload could return the protocol spec version (e.g. pn1) (if the spec is not versioned, it could be).
>
> Also, for privacy reasons, some client may decide not to reveal its implementation/version/system, so this payload may be optional, or possibly a free string with default value trin/0.1.1-2b00d730/linux-x86_64/rustc1.81.0.

I don't think the PR was read because some of the questions you asked were answered.

Second, EL clients have a similar type of message, no?

@r4f4ss commented Oct 21, 2024

> I don't think the PR was read because some of the questions you asked were answered.

Sorry, it is in ping-payload-extensions/ping-custom-payload-extensions.md, yes, it is answered.

@r4f4ss commented Oct 22, 2024

> Second, EL clients have a similar type of message, no?

Yes, they have a node identifier (name+userIdentity+version+os+arch+compiler), and at least Geth does not allow replacing this identifier using flags; only userIdentity is configurable.

This info is available through the Geth RPC; I just could not find which message in the devp2p protocol returns this info. Anyway, it is public.

KolbyML force-pushed the ping-pong-extensions branch 3 times, most recently from 7cda182 to a9e920d on October 22, 2024 01:38
KolbyML marked this pull request as ready for review on October 22, 2024 01:44
@KolbyML (Member, Author) commented Oct 22, 2024

> Second, EL clients have a similar type of message, no?
>
> Yes, they have a node identifier (name+userIdentity+version+os+arch+compiler), and at least Geth does not allow replacing this identifier using flags; only userIdentity is configurable.
>
> This info is available through the Geth RPC; I just could not find which message in the devp2p protocol returns this info. Anyway, it is public.

I think you can request it over devp2p or maybe discv4, I am not exactly sure which one it is, but that is how https://ethernodes.org/ works, to my knowledge.

## Custom Payload Extensions Format

- **type**: what payload type is it
- **verison**: what version of the type it is
Collaborator

Suggested change
- **verison**: what version of the type it is
- **version**: what version of the type it is

Comment on lines 34 to 38
CustomPayloadExtensionsFormat = Container(
    type: Bytes4,
    version: Bytes4,
    payload: Container(inner payload is defined by type and version)
)
Collaborator

The way this is currently defined, one has to write custom code to decode only the encoded type and version, at which point it can then know how to decode the inner Container.

Unless you meant the payload to be a ByteList, which could then be decoded in a second step based on the type/version information after decoding the CustomPayloadExtensionsFormat Container. That would be a possible solution for this.
Or, as Ping.payload is already a ByteList, another way might be to just prefix it with those bytes that indicate the type & version. Either way would work.

Comment on lines 12 to 13
type: [Type Number],
version: [Type Version],
Member

These feel potentially redundant. Seems like we could simply have a type and if we end up needing to upgrade one of these types, we can use a new type byte.

Member (Author)

sure that is fine

Comment on lines 21 to 23
type: [Type Number],
version: [Type Version],
payload: Container([Payload Description])
@pipermerriam (Member) Oct 22, 2024

Ok, so using Union is a potentially loaded topic but... It seems like we could do this whole thing in SSZ cleanly with a Union.

BasicRadiusPayload = Container(data_radius: U256)
HistoryRadiusPayload = Container(data_radius: U256, ephemeral_header_count: U16)
ClientVersion = Container(version_string: ByteList[max_length=XXX])

StateNetworkDataPayload = Union[BasicRadiusPayload, ClientVersion]
HistoryNetworkCustomDataPayload = Union[HistoryRadiusPayload, ClientVersion]

What am I missing?

Member (Author)

I am assuming this would remove the type field I added?

I am against unions because:

  • they only support 127 selectors (if we have to upgrade things, that could go quickly).
  • I don't think it is easy to deprecate old payloads, as won't they then be enshrined in the format?
  • Most teams are against unions, and most Portal clients don't handle Unions properly; they manage them outside of the library, I think.

I don't want to have to make an awkward upgrade if we run out of selectors. Also, using a union would make it so parties couldn't make custom extensions, which I think is important.

Member

I think the only one of those reasons that I'm compelled by is that unions are awkward to implement any kind of rolling upgrade with. Given that, I think I'd advocate for a single type field that is something like u16 and a ByteList payload that is the encoded payload.

Member

Said explicitly, it would be

Container(type: u16, payload: ByteList[max_length=XXXX])

where clients implement application-specific logic for decoding the payload value into some number of known SSZ types that are supported on that network.
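To make the two-step decode concrete, here is a minimal, hand-rolled sketch (not from the PR) of this `Container(type: u16, payload: ByteList)` envelope in plain Python with no SSZ library; `MAX_PAYLOAD_LENGTH`, the function names, and the example type id are placeholders, not values fixed by the spec.

```python
# Sketch of SSZ Container(type: uint16, payload: ByteList[MAX_PAYLOAD_LENGTH]).
# Fixed part: 2-byte little-endian type + 4-byte offset to the variable part.
import struct

MAX_PAYLOAD_LENGTH = 1100  # assumed limit, not final

def encode_extension(type_id: int, payload: bytes) -> bytes:
    if len(payload) > MAX_PAYLOAD_LENGTH:
        raise ValueError("payload exceeds ByteList limit")
    offset = 2 + 4  # size of the fixed part
    return struct.pack("<HI", type_id, offset) + payload

def decode_extension(data: bytes) -> tuple[int, bytes]:
    """First pass: recover the type id and the opaque payload bytes.
    A second, type-specific decode of `payload` happens in the caller."""
    type_id, offset = struct.unpack_from("<HI", data, 0)
    if offset != 6 or len(data) - offset > MAX_PAYLOAD_LENGTH:
        raise ValueError("malformed extension envelope")
    return type_id, data[offset:]

# Example: type 1 carrying a 32-byte (U256) radius payload.
radius = (2**256 - 1).to_bytes(32, "little")
assert decode_extension(encode_extension(1, radius)) == (1, radius)
```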

Collaborator

Yes, this is also what I mentioned as one of the two options here: #348 (comment)

And I agree that type + version is quite overkill, especially with the amount of bytes reserved for each.

Member (Author)

This sounds like the play and is aligned with what Kim said. I think a u32 would still be very nice; I might be being a little hyperbolic, but I wouldn't want to deal with the possibility of running out of types. It will probably be fine with a u16, but if we did run out that would be a dreadful experience to deal with. But it is probably fine.

Member (Author)

For errors I think a u16 is more than enough, to be clear.

CustomPayloadExtensionsFormat = Container(
    type: 4_294_967_295,
    version: 1,
    payload: Container(error_code: Bytes4, message: ByteList[MAX_ERROR_BYTE_LENGTH])
Member

Suggested change
payload: Container(error_code: Bytes4, message: ByteList[MAX_ERROR_BYTE_LENGTH])
payload: Container(error_code: uint16, message: ByteList[MAX_ERROR_BYTE_LENGTH])

Member

Should have given some context here. I would think u16 is enough.

  • Identical payload definitions can be re-used across multiple networks.
  • We are in a byte-constrained packet so I'm inclined to be a little stingy with our bytes.
  • It's ok if completely disparate use cases end up colliding on the same type numbers since the data won't be mixed across network boundaries, which suggests that 65k is a plenty-big-enough namespace for us.
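As a concrete illustration of what such an error pong could look like on the wire, here is a self-contained sketch, assuming the error payload is `Container(error_code: uint16, message: ByteList[...])` nested inside the outer `Container(type: uint16, payload: ByteList)`; the `ERROR_TYPE` value and helper name are placeholders, only the error code 0 ("Extension not supported") comes from the list below.

```python
# Both containers have the shape Container(u16, ByteList): a 2-byte
# little-endian fixed field, a 4-byte offset (always 6), then the bytes.
import struct

ERROR_TYPE = 65535                   # placeholder error type id
ERROR_EXTENSION_NOT_SUPPORTED = 0    # from the suggested error code list

def ssz_two_field(fixed_u16: int, var_bytes: bytes) -> bytes:
    return struct.pack("<HI", fixed_u16, 6) + var_bytes

message = "extension 7 is not enabled on this node".encode()
inner = ssz_two_field(ERROR_EXTENSION_NOT_SUPPORTED, message)  # error payload
custom_payload = ssz_two_field(ERROR_TYPE, inner)              # pong custom_payload
```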

Comment on lines 65 to 67
- 0: Extension not supported
- 1: Requested data not found
- 2: System error
Member

If we're going to provide suggested error codes, I would suggest we expand this a bit and include clear language around how we suggest they be used, including examples.

@KolbyML (Member, Author) commented Oct 22, 2024

@kdeme @pipermerriam I implemented the suggestions. I'm ready for another look

@morph-dev (Contributor) commented

I might have missed it, but how is an upgrade supposed to work?

Let's say we want to introduce a new StateRadiusPayload that will be used instead of BasicRadiusPayload on the state network. How is the transition supposed to work? The only way that I see it:

  1. define new type in spec (using new integer)
  2. clients implement support for new type, but never send Ping request with it (but support replying to it)
  3. once all developers agree, we start sending Ping requests with new type (still supporting replying to old type)
  4. at some point in the future, stop supporting old type

Am I missing something? Maybe this should be documented as well?

@kdeme (Collaborator) commented Oct 23, 2024

> 4. at some point in the future, stop supporting old type
>
> Am I missing something? Maybe this should be documented as well?

Yes, I have the same remark as @morph-dev basically. There is currently nothing defined about how nodes would share their "capabilities" (= what they support), thus nodes cannot negotiate over which actual types/versions to use, and it would work rather on a trial-and-error basis, which is far from ideal.

And with the current steps 1-4 above, I do not see much difference compared to not having this type/version system and just deciding on a specific moment in time when the payload changes, because here you still need to do the same (step 3).

@KolbyML (Member, Author) commented Oct 23, 2024

@morph-dev @kdeme

Ok, I added:

  • A capabilities extension
  • Outlined the 3 absolutely required extensions
  • Specified something called a standard extension, which requires a fork to change what a sub-network's standard extension is, but you can call a standard extension without needing to do trial and error or send a capabilities payload. It is always assumed to be there unless changed via hard fork.

A standard extension is basically just an extension which is required by a sub-network and can be assumed to exist by default.

Portal implementations can add support for or use non-standard extensions without a hard fork.

So before this type system, every change to custom_payload would require a hard fork; now only changing a sub-network's standard extensions does, which leaves room for people to build extensions without requiring hard forks or requiring every implementation to support an optional feature.

I think there is value in this extension system, as we can use it instead of extending or changing the Portal-Wire-Protocol. As a by-product this makes the Portal-Wire-Protocol more stable, since there shouldn't need to be a reason to change it, or to do a hard fork to add a new feature to custom_payload, unless it is an update to a network's standard extension, of course.

| Type number | Name | Supported sub-networks | Short Description | Is this call Required to Implement |
|---|---|---|---|---|
| [0](extensions/type-0.md) | Capabilities | All | Provides a list of enabled types | Yes |
| [1](extensions/type-1.md) | Basic Radius Payload | State, Beacon, Canonical Transaction Index, Transaction Gossip | Provides the nodes Radius | Yes |
Member

Suggested change
| [1](extensions/type-1.md) | Basic Radius Payload | State, Beacon, Canonical Transaction Index, Transaction Gossip | Provides the nodes Radius | Yes |
| [1](extensions/type-1.md) | Basic Radius Payload | State, Beacon | Provides the nodes Radius | Yes |

Since these other networks don't exist yet, I don't think we should reference them.

Comment on lines 3 to 11
## The problem

Over time we will need to change what is sent over ping/pong `custom_payload`'s to adjust for what is needed in the future.
Currently the only way to update a `custom_payload` is through breaking changes which break protocol compatible between Portal implementations.
As we get users, deploying breaking changes to mainnet will no longer be an option as we are aiming for a 100% uptime.

## The Solution

Ping Payload Extensions. There is a minimal set of `standard extensions` which require a fork hard fork to change. This framework allows Portal clients to implement `non standard extensions` which don't require a hard fork to deploy to the network. A more flexible way to extend the Protocol without bloating the [Portal-Wire-Protocol](../portal-wire-protocol.md)
Member

I'm not sure that the "problem/solution" belongs in the spec. Once this is merged, the "problem" part is no longer applicable and I think feels awkward.

I would remove the "problem" part and change "the solution" part to be "motivation" and include the parts from the problem statement about needing a way to communicate meta data without modifying the base protocol.


## Custom Payload Extensions Format

- **type**: what payload type is it
Member

Suggested change
- **type**: what payload type is it
- **type**: numeric identifier which tells clients how the `payload` field should be decoded.

## Custom Payload Extensions Format

- **type**: what payload type is it
- **payload**: a ssz ByteList of max length 1100 which contents are specified the type field
Member

Suggested change
- **payload**: a ssz ByteList of max length 1100 which contents are specified the type field
- **payload**: the SSZ encoded extension payload

## Ping vs Pong
The relationship between Ping and Pong message will be determined by the requirements of the type.

Currently type 1,2,3 are mirrored formats, but there is room for a Ping `custom_payload` to specify details about what it wants to request, then pong handles it.
Member

Suggested change
Currently type 1,2,3 are mirrored formats, but there is room for a Ping `custom_payload` to specify details about what it wants to request, then pong handles it.
> Currently type 1,2,3 are symmetrical, having the same payload for both request and response. This symmetry is not required. Extensions may define separate payloads for Ping and Pong within the same type.

- The extension isn't a required extension for specified Portal Network.

#### 1: Requested data not found
This error code is for if an extension is asking for something and it doesn't exist.
Member

Suggested change
This error code is for if an extension is asking for something and it doesn't exist.
This error code should be used when a client is unable to provide the necessary data for the response.

@pipermerriam (Member) left a comment

Overall I'm 👍; however, I have a lot of nitpicky feedback on the wording of things.

I think I'd like to take a full editorial stab at this. It's a good base. I want to work on the wording.

@morph-dev (Contributor) commented

Looks good from my point of view.

@pipermerriam (Member) commented

I'm fine doing my cleanup post merge. It's all tweaks to the language and it'll be easier as a pull request than as PR feedback.

@pipermerriam (Member) left a comment

Good to move forward. I expect we should simply plan to actively revise this as it gets implemented rather than try to get it perfect from day-0.


capabilities = SSZ.serialize(List[u16, MAX_CAPABILITIES_LENGTH])

BasicRadiusPayload = Container(
Contributor

Should this be named CapabilitiesPayload?

Member (Author)

Good catch
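For reference, a minimal sketch (not from the PR) of what serializing this capabilities list looks like, assuming it is an SSZ `List[uint16, MAX_CAPABILITIES_LENGTH]`: the serialization is simply the concatenation of the little-endian uint16 elements. The limit value and type ids used here are placeholders.

```python
import struct

MAX_CAPABILITIES_LENGTH = 400  # assumed limit, see later discussion on 400/450/500

def encode_capabilities(enabled_types: list[int]) -> bytes:
    if len(enabled_types) > MAX_CAPABILITIES_LENGTH:
        raise ValueError("too many capabilities")
    # SSZ List[uint16, N] serializes as the concatenated little-endian elements.
    return b"".join(struct.pack("<H", t) for t in enabled_types)

# e.g. a node advertising the capabilities, basic radius and client info extensions
assert encode_capabilities([0, 1, 3]) == b"\x00\x00\x01\x00\x03\x00"
```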

@kdeme (Collaborator) left a comment

I am generally pro having an easily upgradable protocol, but two things worry me:

  1. I'd like it to be very clear that this does not require a hard fork. Otherwise I don't see the point of making it this complex (see my review comment at the code line).
  2. I'm slightly concerned about the fact that when encountering new nodes, it will now require 6 messages (instead of 2), and possibly increase even further. Of course, a node could decide to send only the "latest sub-network standard extension" and nothing else, and it would remain the same as now. When a newer standard extension pops up, the capabilities retrieval will be needed, however, but there is no way around this anyhow, even if you did this type of flexibility at the Portal wire protocol level. So it would require always two messages at minimum, I think (well, one could try to just launch on trial and error), assuming the client info extension is not mandatory.
    I guess what I am saying is that we should look into this well enough to make sure it is sufficiently efficient.

Comment on lines 91 to 94
### What is the process for changing a sub-network standard extension

The process of changing a Portal sub-networks standard extension is through a hard fork set by a timestamp. When the timestamp is reached usage of the old standard extension is replaced by the new standard extension. Details on what standard extensions for what networks are being replace are specified in the fork's document.

Collaborator

For me the usefulness of these ping/pong payload extensions is to avoid having to do something like this in the first place.

If we don't avoid this hard-forking, then I don't really understand the point of adding this complexity.

But imo it is possible to do it without hard-forking, as is also explained in the chapter below this?

Member

I would suggest that we remove this section and leave the process undefined. I think we are getting into the realm of over-specifying this piece of the protocol.

Member (Author)

A client would never need to call capabilities to know what the current standard extension is.

The usefulness of ping extensions is not requiring all parties to agree to make a change to the protocol or add a feature.

Of course supporting the base protocol is required, i.e. standard extensions, but this allows Portal clients to make optimizations such as the "welcome basket" proposal (#350) Carver proposed, without requiring all implementations to support it. This is important, as we don't want new features to gatekeep new clients from joining the network; also, not all clients would likely support "welcome baskets", as it is a p2p optimization which isn't required for the core protocol to work. It is a bootstrapping optimization.

Ping extensions don't change the base protocol much; new clients can just return empty for capabilities and only support X network's standard extension. Ping extensions are opt-in, and hence this prevents creating a situation where the protocol is "too tall to ride" (or participate in/implement a client for).

If a client wants a feature that is not popular or doesn't make sense for all clients to implement, they can add it, which I think is important long term, especially as not only L1 will be using Portal. The base Portal P2P protocol should be simple, and ping extensions allow us to keep it simple, while still allowing clients to experiment or add new features, which maybe one day become standard extensions if they become popular; but there is no requirement for that, which is one of the important parts of this proposal.

Collaborator

My comment is not about the usefulness of the extensions. It is about this specific section which talks about the requirement of a hard fork.

I think this should be designed in such a way that a hard fork is not required at all, else I don't see the point of it, and with the capabilities extension this should be feasible.

@KolbyML (Member, Author) Nov 21, 2024

Are we assuming:

  1. pings are only done over long-lived connections, i.e. the other node is in your routing table, and all short-lived interactions are done through FindNodes or FindContent (RFC/RFN etc.)

or

  2. pings are sent over both long-lived and short-lived connections?

If 1. is true I think a forkless mechanism makes sense, as the latency cost of calling capabilities is only high if the connection is short-lived. If ping isn't used for short-lived connections, then the additional latency incurred isn't a big deal.

If 2. is true I think the fork upgrade method for standard extensions makes sense, as it wouldn't make sense to upgrade via capabilities unless:

  • So for "close nodes" upgrading to the newest protocol or using extensions could make sense as those will be long lived connections in the routing table
  • if you are trying to request meta data, calling capabilities would make sense
  • participate in features not supported by the full node set

but if you were a short-lived ping connection, upgrading wouldn't make sense; only using the base standard extension required by the protocol would make sense.

I need to put more thought into this as these are only my additional thoughts. I didn't realize 1. could be an option until just now, so I still need to spend time thinking about which assumption is actually true, 1. or 2.

A forkless method makes a lot of sense for devp2p as they only have long-lived connections. Since Portal is by default a short-lived protocol, having a forked upgrade method for the base standard extensions required by the protocol makes sense if 2. is true, as you wouldn't want to incur the latency for short-lived connections.

Collaborator

Interesting to think about it from short vs long lived connections.

However to avoid confusion I'd suggest to call it something else, because in Portal all connections are short lived.

But from reading your comment I think you mean to differentiate between:

  1. Close neighbors and/or nodes in routing table, that are thus frequently contacted for example to verify that they are alive/responsive.
  2. Nodes that are occasionally contacted when looking up nodes or content.

I think that the second group is rarely going to receive a ping from a "regular" node. When running a recursive find nodes or find content it would not require a ping.
When doing gossip it would typically hit fairly close nodes (unless it is a bridge node initiating gossip) and not use a ping (unless you launch these actively to get radius info, which fluffy does not, as that would be too slow).
I'd have to further think about it in which cases a ping would still be done to these nodes.

I can see crawlers and nodes that run bridges perhaps hitting those nodes also, as they want to see the full view.

@KolbyML (Member, Author) Nov 22, 2024

I guess there is more thinking to be done. Some of this we would only need to worry about when we propose the first breaking change, so I don't think we need to pick a path immediately.

The 2 paths I see are:

  1. We always require capabilities to be called first (on the first communication, so for long-lived connections this would be a one-time thing, assuming the nodes are in each other's routing tables) to find the latest shared version of the respective sub-network's standard extension. This path would be forkless, but it would always incur 1 additional round-trip delay initially before making a standard extension ping, as there could be multiple supported standard extensions in this system.

  2. We do a fork system, where all client implementations would always support a base standard extension for the sub-network to work/function. This system would make it optional whether an implementation wants to call capabilities, and hence would not add the additional round-trip time unless opted into.

Both have trade-offs; to ask bluntly, which path do you prefer?

@@ -0,0 +1,34 @@
# Get Client info

This payload MUST be available from all sub-networks. This payload allows you to figure out peer's client info. Which will be useful for censuses.
Collaborator

I think this shouldn't be a MUST.

Either that or leave it as MUST but allow the client to decide on what information to put there (Including empty string).

Member

I tend to agree. We can't force clients to divulge this information and there are going to be situations where it's in their best interest to withhold it. Better to allow them to not support it than to require them to give us garbage data.

Member (Author)

I wrote a response here #348 (comment)

I am fine with an empty string being a valid response for privacy-focused users, but as I wrote in my post above, I think it makes sense for "Get Client Info" to be a standard extension, as it will likely be supported by 99.9% of users, and for other reasons I wrote in the link above.

Contributor

It seems redundant to have the same information exposed on multiple sub-networks. Not a big issue, but it would perhaps lead to duplicated code in implementations. Why not just add this as a JSON-RPC endpoint? That way it can be exposed if needed or hidden for privacy reasons if desired.

@KolbyML (Member, Author) Nov 22, 2024

> It seems redundant to have the same information exposed on multiple sub-networks. Not a big issue, but it would perhaps lead to duplicated code in implementations. Why not just add this as a JSON-RPC endpoint? That way it can be exposed if needed or hidden for privacy reasons if desired.

I don't understand what is trying to be said here. I don't understand how a P2P protocol change relates to the user-facing JSON-RPC. A privacy option, if wanted, would probably be enabled through a command-line param on startup; it doesn't make much sense to make it togglable through the JSON-RPC.

I am not sure how "It seems redundant to have the same information exposed on multiple subnetworks" is a problem, as all networks could use the same code. Users don't have to run all networks, and sub-networks are independent from each other, so if you couldn't fetch client info from all sub-networks, there would be network configurations where you wouldn't be able to ask for client info from a peer.

@KolbyML (Member, Author) Nov 22, 2024

> And that is why I suggested JSON-RPC as an alternative way for returning this information.

A JSON-RPC endpoint wouldn't change anything this PR proposes, though. It wouldn't change that the client info will be the same for all sub-networks, so I am not sure I understand what is being stated here about not returning it at the sub-network level.

It seems like a UX thing for the user, but I think that would be a separate PR or a non-standard JSON-RPC endpoint.
JSON-RPC as an alternative way of returning this information doesn't seem like a concern that blocks this PR; it is more of a proposal for a simpler UX for users who want to ping this info from peers. I would argue this generic endpoint wouldn't be ideal for use cases like Glados though, where it would make sense to use portal_*Ping for doing censuses.

Would the generic endpoint have any idea of sub-networks? Or would it blindly/naively ask all of them? I would expect users to know which sub-network they got the ENR from. The only way for them to know if peer X runs network Y would be blindly pinging the peer on each sub-network. So pinging the sub-network you got the ENR from would make the most sense in my mind.

Contributor

portal_*Ping endpoints are JSON-RPC endpoints, so why couldn't we return this information in a different JSON-RPC endpoint for doing a census and then remove the client info ping message from the protocol? This would simplify the implementation. Is there another reason for using the client info details at the sub-network level that I'm missing?

@KolbyML (Member, Author) Nov 22, 2024

> portal_*Ping

is done at the sub-network level.

> return this information in a different JSON-RPC endpoint for doing a census and then remove the client info ping message from the protocol

Adding a new JSON-RPC endpoint would still require having a ping/pong type message over the p2p protocol. A JSON-RPC endpoint would only change the interface for the user; it wouldn't remove the P2P ping/pong message.

> Is there another reason for using the client info details at the sub-network level that I'm missing?

How would you propose we do this at the Discv5 level? As that is the only level above the subnetwork level.

@bhartnett (Contributor) Nov 22, 2024

> is done at the sub-network level.

Yes, I understand that.

> Adding a new JSON-RPC endpoint would still require having a ping/pong type message over the p2p protocol. A JSON-RPC endpoint would only change the interface for the user; it wouldn't remove the P2P ping/pong message.

Perhaps I misunderstand something here. If the client info message is mainly useful for auditing via Glados, which uses JSON-RPC, then why would you need client info messages in the p2p protocol? Couldn't Glados connect to each node via JSON-RPC? Or does Glados only connect to a single node via JSON-RPC? That might be the part I missed, now that I think about it.

Member (Author)

> Adding a new JSON-RPC endpoint would still require having a ping/pong type message over the p2p protocol. A JSON-RPC endpoint would only change the interface for the user; it wouldn't remove the P2P ping/pong message.
>
> Perhaps I misunderstand something here. If the client info message is mainly useful for auditing via Glados, which uses JSON-RPC, then why would you need client info messages in the p2p protocol? Couldn't Glados connect to each node via JSON-RPC? Or does Glados only connect to a single node via JSON-RPC? That might be the part I missed, now that I think about it.

Glados can only use the JSON-RPC of the Portal client it is running itself. JSON-RPCs are private interfaces to a Portal client; sometimes altruistic people will make them public for others to use.

From Glados's perspective, this is what would happen:

  • node1: call portal_*Ping to ENR X through the JSON-RPC
  • node1: ping message is sent over the subnetwork level to the PEER
  • node2: PEER sends pong back, with requested information
  • node1: receives pong and returns it from the P2P subnetwork level, to the JSON-RPC handling using a callback
  • node1: JSON-RPC returns the pong response received from the PEER to the USER

> Or does Glados only connect to a single node via JSON-RPC?

So, in short, Glados only uses one Portal client's JSON-RPC, and then uses JSON-RPC requests which make Portal Wire Protocol messages at the sub-network level.

Glados can't call another Portal client's JSON-RPC, but it can send P2P requests at the sub-network level to other Portal clients. The JSON-RPC is the interface the user in this case uses to make P2P requests at the sub-network level to other Portal clients.

So the idea is you can get the client info for clients/peers you don't run and hence don't have JSON-RPC access to, but you can make P2P requests to get the info through your own Portal client's JSON-RPC.

Every Portal Network Client is required to support 3 extensions
- [Type 0: Capabilities Payload](extensions/type-0.md): useful for finding what ping extensions a client supports
- Sub-network standard extension. For history this is currently [Type 2: History Radius Payload](extensions/type-2.md). For State and Beacon the base extension is [Type 1: Basic Radius Payload](extensions/type-1.md). These are extensions what are crucial. This standard extensions are specified the respective sub-networks specification
- Type 3: [Get Client info Payload](extensions/type-3.md)
Member

I suggest we remove this from the standard set. Clients should support it but I don't think it makes sense to make it be required.

@KolbyML (Member, Author) Nov 21, 2024

I think it is fine to make an empty string a valid option for privacy-inclined users, but I don't think we should remove it from being a standard extension, as 99.9% of honest Portal nodes would support it. Client info is important for:

  • getting better statistics of our network,
    • useful for glados
    • tells us what platforms are popular and which we should support better, e.g. x86/arm/riscv5
  • allows us to better track what client is creating bugs on the network. Are they running an old commit? Is it a CPU architecture specific issue? etc

If this isn't a standard extension it would require

  • one ping to get capabilities
  • one ping to get client info

If 99.9% of honest clients are going to support this call, I think it should be a standard extension, with an opt-out to return an empty string for the 0.1% of users who have a privacy concern.

Member

I think I agree with this. It will set the default path towards clients supporting this, which I think is positive for network monitoring/health.

@kdeme (Collaborator) Nov 21, 2024

If it is fine to just be an empty string, then it is a bit silly to send this around as another message req/rep for those that want to keep that private.

Perhaps just add it as a field in the capabilities extension?

Member

> Perhaps just add it as a field in the capabilities extension?

I actually like this. Removes an additional message pair and makes it so that this data gets transmitted implicitly with the other message.

Member (Author)

I will update the PR to make client info part of the capabilities extension. This will limit the max number of ping extensions a client can have enabled at a time to 450 or 400 instead of 500, in order to reserve enough byte space for the client info, but 400 extensions should be more than enough, so it isn't a deal breaker at the end of the day.
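A rough sketch (not the PR's actual definition) of what folding client info into the capabilities payload could look like, assuming `Container(client_info: ByteList[...], capabilities: List[uint16, ...])`; all names and limits below are placeholders.

```python
# Two variable-size fields: the fixed part holds two 4-byte offsets,
# then the client_info bytes, then the concatenated little-endian uint16s.
import struct

def encode_client_info_and_capabilities(client_info: bytes, capabilities: list[int]) -> bytes:
    caps = b"".join(struct.pack("<H", c) for c in capabilities)
    offset_client_info = 8                      # end of the fixed part
    offset_caps = 8 + len(client_info)          # start of the second variable field
    return struct.pack("<II", offset_client_info, offset_caps) + client_info + caps

# A privacy-conscious node can simply send an empty client_info string.
encoded = encode_client_info_and_capabilities(
    b"trin/0.1.1-2b00d730/linux-x86_64/rustc1.81.0", [0, 2, 3]
)
```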

Collaborator

To be honest, I don't think anybody would want to be writing Portal implementations if they have to deal with 400 (or even 40 for that matter) ping extensions :).

@KolbyML (Member, Author) Nov 22, 2024

> To be honest, I don't think anybody would want to be writing Portal implementations if they have to deal with 400 (or even 40 for that matter) ping extensions :).

The whole point is that they are optional, so a Portal implementation that didn't want to implement something wouldn't have to. I don't know what will happen in the future or how many extensions would be adopted, so who knows how many would end up being used.

I don't see an issue in having some headroom.

Non-required extension offer a way for Portal implementations to offer extended functionality to its users without requiring every Portal implementing party to agree to a new feature. This allows for a diverse set of use cases to be fulfilled without requiring every implementer implement every extension, or requiring the need to bloat the minimal [Portal-Wire-Protocol](../portal-wire-protocol.md) with new `Message Types`.

## Does implementing non-standard-extensions require a hardfork?
No only changing a sub-networks standard extension requires a hard fork.
Member

I don't like the use of the term "hard fork" here. It collides too much with the core ethereum protocol use and in my opinion makes it sound more severe than it actually is.

Member (Author)

A hard fork is just a rule change that occurs on all clients. In this case it would just be a timestamp at which all Portal clients change X functionality for the base Portal Protocol.

Hard forks will be required regardless, I expect; in the future there will be networking changes we make to calls other than ping/pong. To ensure smooth, non-breaking changes, hard forks might be a good way to do this.

Another way to do this could potentially be adding some kind of Portal version message, where clients would always speak in the oldest version, or something along those lines, but I feel that solution would be a lot more complex than hard forks. Ping/Pong is only a small part of the Portal Wire Spec which could change over time. We don't want to break interop when shipping Portal Wire Spec upgrades in the future. Ping extensions give us:

  • less frequent hard forks
  • less bloat in the core Portal Wire Spec

to name a few benefits.

If we want to use a different term than "hard fork", I am fine with that and I am open to suggestions. I struggle to think of a better-understood word to describe what is happening, though.

Collaborator

My concern is not so much about semantics, but rather about the fact that you still require a client-aligned release mechanism; see also the comment at #348 (comment)

Contributor

I agree that using the term 'hard fork' doesn't sound right. Portal doesn't take part in consensus and doesn't really fork because it's not a blockchain in itself. Of course we can have breaking protocol changes that require all clients to be updated at the same time so perhaps a better term would be 'breaking protocol change' or 'non backwards compatible protocol upgrade'.

@KolbyML (Member, Author) Nov 22, 2024

> I agree that using the term 'hard fork' doesn't sound right. Portal doesn't take part in consensus and doesn't really fork because it's not a blockchain in itself. Of course we can have breaking protocol changes that require all clients to be updated at the same time so perhaps a better term would be 'breaking protocol change' or 'non backwards compatible protocol upgrade'.

A hard fork is a type of update to a blockchain protocol that is not backward-compatible. This means that nodes (computers running the blockchain software) must upgrade to the new protocol to remain part of the blockchain network. Nodes that do not update will follow the older protocol, resulting in a split in the blockchain into two separate networks.

'breaking protocol change' or 'non backwards compatible protocol upgrade' sounds like somebody trying to describe a hard fork without saying hard fork. A hard fork is a breaking protocol change that could be related to consensus or to the p2p protocol. A hard fork describes a fork in the road which splits a network into 2. A non-backwards-compatible protocol upgrade is a hard fork, as it forks the p2p network into 2 halves which can't talk with each other.

Contributor

> A hard fork is a type of update to a blockchain protocol that is not backward-compatible. This means that nodes (computers running the blockchain software) must upgrade to the new protocol to remain part of the blockchain network. Nodes that do not update will follow the older protocol, resulting in a split in the blockchain into two separate networks.

Portal isn't exactly a blockchain protocol, it's a p2p network that contains blockchain data which follows a blockchain protocol. The term fork comes from the fact that the block history forks. If BitTorrent had a breaking change, it wouldn't be called a hard fork. Anyway, it's just semantics I guess.

Member (Author)

> A hard fork is a type of update to a blockchain protocol that is not backward-compatible. This means that nodes (computers running the blockchain software) must upgrade to the new protocol to remain part of the blockchain network. Nodes that do not update will follow the older protocol, resulting in a split in the blockchain into two separate networks.
>
> Portal isn't exactly a blockchain protocol, it's a p2p network that contains blockchain data which follows a blockchain protocol. The term fork comes from the fact that the block history forks. If BitTorrent had a breaking change, it wouldn't be called a hard fork. Anyway, it's just semantics I guess.

I think this is very fair and it is a semantic issue; ideally there is a single word to refer to a non-backwards-compatible protocol upgrade, as it is a bit long when talking. One issue with breaking P2P changes on Portal is that we don't share a protocol version with peers. Of course breaking p2p changes can happen on more than just the ping/pong wire message. Most protocols, like BitTorrent, would support a range of protocol versions and have a negotiation stage to find out the shared minimum supported version. The issue with Portal is we don't have any of that infrastructure, as we don't have that type of handshake. So I do think we are constrained to do breaking changes through timestamps, as Portal connections are short-lived and Portal's P2P protocol has no notion of a version.

Member

I'm 👎 on using the term hard fork for this. It name collides with the core protocol terminology and will be confusing. I recognize that the term may be correct in definition but the name collision is a deal breaker for me.

I also will re-iterate that I don't think we should try to define how this process will work because I don't think we know and trying to define it corners us into future behavior that may not be necessary or optimal. We should leave this process undefined.

Member (Author)

sure, I updated the PR to remove the term

@pipermerriam (Member) left a comment

Regarding @kdeme's comment about inefficiency.

I think that we can have the best of both. I would suggest that clients not open with a capabilities message, but instead just keep the same current approach and open with a request for the radius. If the client responds with an error, you can fallback to requesting their capabilities...

And realistically, I see this being an API that we are likely to use for "debug" purposes and for easier upgrades/expansion of the useful metadata about stored data that can share with each other.
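A rough sketch of the fallback flow described above, assuming a hypothetical `send_ping(node, type_id, payload) -> (type_id, payload)` helper and the placeholder type ids below; none of these names come from the spec itself.

```python
from typing import Callable, Tuple

STANDARD_RADIUS_TYPE = 2   # e.g. History Radius Payload (placeholder)
CAPABILITIES_TYPE = 0      # Capabilities extension (placeholder)
ERROR_TYPE = 65535         # error response type (placeholder)

PingFn = Callable[[object, int, bytes], Tuple[int, bytes]]

def probe_node(node: object, radius_payload: bytes, send_ping: PingFn) -> Tuple[int, bytes]:
    # Open with the sub-network's standard radius extension, exactly as today.
    resp_type, resp_payload = send_ping(node, STANDARD_RADIUS_TYPE, radius_payload)
    if resp_type != ERROR_TYPE:
        return resp_type, resp_payload
    # Only after an error do we spend the extra round trip on capabilities.
    return send_ping(node, CAPABILITIES_TYPE, b"")
```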

@KolbyML (Member, Author) commented Nov 21, 2024

> I am generally pro having an easily upgradable protocol

I think client implementors should be able to extend Portal's P2P protocol for client-specific (or multi-client) use cases without relying on all teams coming to consensus for every change to the protocol or every new feature. Of course there should be a base Portal protocol all implementations must implement. Clients shouldn't be prisoners of all other implementations to add a specific feature. Allowing clients to extend the Portal protocol without requiring all other parties to implement the complexity for a feature is, I think, high value. This already exists today in devp2p, and there are a few non-standard protocols which provide a lot of value for the respective clients. For example, one of the reasons Nethermind's sync is so fast is that they use a custom devp2p protocol which allows getting more detailed information about which blocks client X has available.

> 1. I'd like it to be very clear that this does not require a hard fork. Otherwise I don't see the point of making it this complex (see my review comment at the code line).

Hard forks are required for changing the standard extension for a Portal sub-network, since it is implied that by default all clients support X protocol version.

> Of course, a node could decide to send only the "latest sub-network standard extension" and nothing else, and it would remain the same as now.

This would be the case for all Portal implementations which don't want to support extensions; they are intentionally optional.

> When a newer standard extension pops up, the capabilities retrieval will be needed, however, but there is no way around this anyhow, even if you did this type of flexibility at the Portal wire protocol level

Retrieval of capabilities wouldn't be needed by any clients, unless they want to access non-standard extensions. This would be opt-in, which is an important aspect of this proposal. A new standard extension would be introduced through a hard fork; until then the extension wouldn't be a standard extension. Clients don't need to call capabilities to find out what standard extensions other clients support, as those are required by the protocol and changed by hard forks which activate at timestamps.

So calling capabilities would never be needed unless a client specifically opts into wanting to use non-standard extensions.

> So it would require always two messages at minimum, I think (well, one could try to just launch on trial and error), assuming the client info extension is not mandatory.

This PR doesn't change the minimum required messages sent from what is currently done on the network. The difference is clients would now be able to optionally access other ping extensions.

> I am generally pro having an easily upgradable protocol

So, in short, I would say an easily upgradable protocol isn't even one of the major reasons why this proposal would be beneficial.

@KolbyML (Member, Author) commented Nov 21, 2024

> Regarding @kdeme's comment about inefficiency.
>
> I think that we can have the best of both. I would suggest that clients not open with a capabilities message, but instead just keep the same current approach and open with a request for the radius. If the client responds with an error, you can fallback to requesting their capabilities...

This is already the expectation, so I don't think ^ is a "best of both"; what was written above is already the intended functionality. Ping extensions are meant to change nothing for implementations which don't want them, other than needing to reply with 0 supported extensions if capabilities are requested.

I don't think it makes sense for clients to open with Capabilities, as it is optional at the end of the day. It isn't required for the base Portal Protocol to work. Clients might open with Capabilities if I manually make a JSON-RPC ping request to a new node I haven't talked to yet, but I don't expect this to be the first call in a majority of cases.

So in short I don't think there are any disagreements on this point, unless I misunderstood something

> And realistically, I see this being an API that we are likely to use for "debug" purposes and for easier upgrades/expansion of the useful metadata about stored data that can share with each other.

I see ping extensions as a great way to not bloat the base Portal P2P protocol. I think ping extensions are a great environment to allow for early experiments with "Welcome Baskets" (#350): we can test it out on Trin without requiring other Portal clients to implement it, as it is an optimization at the end of the day. If it is popular enough we can make it a standard used by all implementations. If it doesn't gain popularity it can remain a Trin-specific feature, which is fine, or also be supported by a subset of Portal implementations. This is one example, but I can think of a few more if needed 😁

All clients won't support all extensions, and this is an intended feature. When L2s adopt Portal (it is a question of when) they can remain compliant with the base Portal Protocol (instead of making breaking changes to add new features), and instead rely on ping extensions to add desired features which don't make sense for the base Portal Protocol.

I think ping extensions are aligned with the idea of keeping the "height to ride" low for client implementers, while still allowing the flexibility for clients who want more to have more. Different teams have different resources; some features we want to support in Trin might not make sense in Samba or Shisui, maybe because X feature isn't a priority for them, or they don't have the resources to implement it. I don't think that should block Trin or any other team from innovating. Nor should we pressure smaller teams to add features they don't need for their use cases/users.

@kdeme (Collaborator) commented Nov 21, 2024

> Regarding @kdeme's comment about inefficiency.
>
> I think that we can have the best of both. I would suggest that clients not open with a capabilities message, but instead just keep the same current approach and open with a request for the radius. If the client responds with an error, you can fallback to requesting their capabilities...
>
> And realistically, I see this being an API that we are likely to use for "debug" purposes and for easier upgrades/expansion of the useful metadata about stored data that can share with each other.

Yes, true, that could be an approach to keep the number of messages similar to what it is now, assuming the client information extension is not mandatory.

Now, I read remarks about how a client may decide to not implement this or that extension, which is true. And it allows for quick development/special features/innovation/etc., which is all true, I don't question this. But the part that seems to be missing is that a client MUST still reply to an unsupported message, and thus in practice more messages could be transferred anyhow, useful or not, depending on whether the extension is supported. It kind of depends on the implementation checking capabilities or just doing fire-and-forget. This is what I mean by potential inefficiency.

When comparing with devp2p additional protocols, I think the main difference is that this one is ingrained within the main protocol. This does make it more lean as it can be adjusted per sub-network, so I'm not necessarily against that. I'm just raising questions on efficiency (and the hardfork thingie), that's all.

@bhartnett (Contributor) commented Nov 22, 2024

In order to reduce the number of ping/pong messages for clients that support several of the standard and non-standard capabilities, is there a way to encode a single message that has a list of types and another list/Container of the respective type payloads? In other words, put multiple messages together. After learning the capabilities of a peer it would make sense to send a ping message containing all the message types that you and your peer both support. This is just an optimization idea I guess, but not sure if it would work well in practice.

@KolbyML (Member, Author) commented Nov 22, 2024

> In order to reduce the number of ping/pong messages for clients that support several of the standard and non-standard capabilities, is there a way to encode a single message that has a list of types and another list/Container of the respective type payloads? In other words, put multiple messages together. After learning the capabilities of a peer it would make sense to send a ping message containing all the message types that you and your peer both support. This is just an optimization idea I guess, but not sure if it would work well in practice.

This could be a new extension type in the future. Of course there is complexity in this, because if you requested too many types you would need to use uTP for the transfer. I expect some ping extensions to use uTP of course, just noting it. "Welcome Baskets" would most likely use uTP, for example.

Since this is an optimization and there doesn't seem to be any clear use case at the moment, and it can be done simply by adding a new ping extension, I don't think this idea blocks this PR. But if it is desired in the future it can be done with a new ping extension, so it isn't like what you are proposing wouldn't be possible; it is an idea that can be done if it makes sense at some point in time.

@KolbyML (Member, Author) commented Nov 22, 2024

Maybe a better name than ping extensions would be protocol extensions. It's just a random idea, as it is more descriptive. This PR really lays out a framework for extending the Portal Wire Spec with new functionality without requiring changing the base protocol.

Changes to the base Portal Wire Spec will likely be a slower process, as they will take more coordination and research to avoid bloating the spec and to appeal to all client teams.

Whereas adding a protocol extension allows client teams to innovate much faster without needing to change the base Protocol. Especially as the base Protocol for Portal lacks a defined mechanism for making breaking changes or even a versioning system without absolutely destroying interop between clients. Protocol Extensions give a much more appealing way to make these changes.

Due to Portal connections being short lived, I am not sure if a versioning system makes sense, so a timestamped base fork mechanism I expect will win out in the long term, although currently there is short term push back to define anything.

When L1 and users adopt Portal it will be a forcing function for defining this, so I am not concerned whether it is defined in this PR or not; it will be defined soon enough, as breaking interop between clients will then be a big no-no.

@kdeme (Collaborator) commented Dec 2, 2024

Trying to write my remarks/concerns in a better summary:

  1. The most important reason to me for adding this type of complexity to the Portal wire protocol would be to allow adding new versions of req/resp messages without breaking compatibility: allowing clients to implement and activate new message types independently of each other, and allowing certain message types to be phased out.
  2. When adding this extension layer at the level of ping/pong messages, if there is ever a need to change any of the other Portal wire protocol message types, there would still need to be something added for that base layer. A practical example: in the future it is likely that uTP over discv5 gets changed to uTP on a "side stream". Maybe (not certain), this could affect the current FindContent/Content messages. We have no path forward on how to do this without losing compatibility, except for just adding a new message type and trial-and-erroring whether nodes respond.
    Now, the ping/pong extensions do not really allow this either. This is not fully true, as there could still be a ping/pong message with capabilities that mentions the capabilities at the Portal wire protocol level. But that is not how it is intended/described right now.
  3. I also have a small fear that the ping/pong extensions would result in building up a wild growth of extensions that in the end do not get cleaned up or do not result in efforts to standardize any new features over the different clients. But perhaps this concern is not valid.

I think 1. is not really an issue to be honest. I believe it can be done with these ping extensions.

I think 2. is not fully covered right now. But either we cover that the moment we hit such an issue, or we could somehow allow it in the current ping/pong capabilities. It might just get a bit ugly when mixing wire protocol vs ping/pong layer capabilities. But having another capabilities message on the Portal wire protocol seems too much; that's what the ping/pong payload is/was for.

I think 3. is not really a well-formed reason of mine to not have any extensions, but it is something that we as teams can try to avoid. One way would be to be very strict on which extensions make it into the specifications, because once something is in the specs, it is more difficult to get it out again.

I also think that we should add something to the specs like: "Nodes MUST verify capabilities before sending out ping extensions".

@kdeme (Collaborator) commented Dec 2, 2024

Regarding my second remark, as @KolbyML nicely pointed out, it is actually rather difficult to fix this in terms of compatibility for Portal wire protocol layer calls such as FindContent or FindNodes (and also Offer, to some extent), as these typically are used in a recursive way and possibly are being sent to nodes not in the routing table and thus not frequently contacted with a ping/pong.
