Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Want less/finer-grained locking in the datapath #435

Open
FelixMcFelix opened this issue Dec 18, 2023 · 0 comments
Open

Want less/finer-grained locking in the datapath #435

FelixMcFelix opened this issue Dec 18, 2023 · 0 comments
Labels

Comments

@FelixMcFelix
Copy link
Collaborator

Currently, individual ports are wrapped in a KMutex, which is locked on:

  1. Packet transmission
  2. Packet receipt
  3. Periodic UFT expiry
  4. Reconfiguration of user-driven network state: Firewall rules, NAT/External IPs, DHCP Params, ...

Performance-wise, what I'm most concerned about is the total shared lock between 1. and 2., although 3. has caused minor issues (#426). It would be worth spending some time thinking about how this overlap of locks can be minimised overall -- keeping in mind that we do need pairwise flow installation to both In/Out tables, shared TCP state, etc.

FelixMcFelix added a commit that referenced this issue Nov 19, 2024
This PR rewrites the core of OPTE's packet model to use zero-copy packet
parsing/modification via the
[`ingot`](oxidecomputer/ingot#1) library. This
enables a few changes which get us just shy of the 3Gbps mark.

* **[2.36 -> 2.7]** The use of ingot for modifying packets in both the
slowpath (UFT miss) and existing fastpath (UFT hit).
* Parsing is faster -- we no longer copy out all packet header bytes
onto the stack, and we do not allocate a vector to decompose an `mblk_t`
into individual links.
* Packet field modifications are applied directly to the `mblk_t` as
they happen, and field reads are made from the same source.
  * Non-encap layers are not copied back out.
* **[2.7 -> 2.75]** Packet L4 hashes are cached as part of the UFT,
speeding up multipath route selection over the underlay.
* **[2.75 -> 2.8]** Incremental Internet checksum recalculation is only
performed when applicable fields change on inner flow headers (e.g.,
NAT'd packets).
  * VM-to-VM / intra-VPC traffic is the main use case here.
* **[2.8 -> 3.05]** `NetworkParser`s now have the concept of inbound &
outbound `LightweightMeta` formats. These support the key operations
needed to execute all our UFT flows today (`FlowId` lookup, inner
headers modification, encap push/pop, cksum update).
* This also allows us to pre-serialize any bytes to be pushed in front
of a packet, speeding up `EmitSpec`.
* This is crucial for outbound traffic in particular, which has far
smaller (in `struct`-size) metadata.
* UFT misses or incompatible flows fallback to using the full metadata.
* **[3.05 -> 2.95]** TCP state tracking uses a separate per-flow lock
and does not require any lookup from a UFT.
* I do not have numbers on how large the performance loss would be if we
held the `Port` lock for the whole time.
* (Not measured) Packet/UFT L4 Hashes are used as the Geneve source
port, spreading inbound packets over NIC Rx queues based on the inner
flow.
* This is now possible because of #504 -- software classification would
have limited us to the default inbound queue/group.
* I feel bad for removing one more FF7 reference, but that is the way of
these things. RIP port `7777`.
* Previously, Rx queue affinity was derived solely from `(Src Sled, Dst
Sled)`.

There are several other changes here made to how OPTE functions which
are needed to support the zero-copy model.

* Each collected set of header transforms are `Arc<>`'d, such that we
can apply them outside of the `Port` lock.
* `FlowTable<S>`s now store `Arc<FlowEntry<S>>`, rather than
`FlowEntry<S>`.
* This enables the UFT entry for any flow to store its matched TCP flow,
update its hit count and timestamp, and then update the TCP state
without reacquring the `Port` lock.
* This also drastically simplifies TCP state handling in fast path cases
to not rely on post-transformation packets for lookup.
* `Opte::process` returns an `EmitSpec` which is needed to finalise a
packet before it can be used.
* I'm not too happy about the ergonomics, but we have this problem
because otherwise we'd need `Packet` to have some self-referential
fields when supporting other key parts of XDE (e.g., parse -> use fields
-> select port -> process).

Closes #571, closes #481, closes #460.

Slightly alleviates #435.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant