Home
The goal of this community is to develop, maintain, and improve the Multipath TCP (MPTCP) protocol (v1 / RFC 8684) in the upstream Linux kernel.
Programs that were built to use TCP will still use TCP connections when running on an MPTCP-enabled kernel unless the programmer/user/admin opts in to using Multipath TCP. Check the "How to use MPTCP?" section below for details on how pre-compiled or modified programs can use `IPPROTO_MPTCP` sockets.
The Linux MultiPath TCP Project also has an MPTCP (v0 / RFC 6824) enabled Linux kernel available; however, it was developed "out of tree" with different requirements in mind. Additional details are in the Upstream vs out-of-tree implementations section.
Here is a checklist:
- Use a "recent" kernel with MPTCP support (`grep MPTCP /boot/config-<version>`), see the ChangeLog section below. `CONFIG_MPTCP`, `CONFIG_MPTCP_IPV6`, and `CONFIG_INET_MPTCP_DIAG` are enabled in many current Linux distributions.
- Confirm that MPTCP is enabled: `sysctl net.mptcp.enabled`
- Your app should create sockets with `IPPROTO_MPTCP` as the proto: (`socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);`). Legacy apps can be forced to create and use MPTCP sockets instead of TCP ones via the `mptcpize` command bundled with the mptcpd daemon. There are also additional older workarounds available (1 - 2).
- Configure the routing rules to use multiple subflows from multiple interfaces: Configure Routing
- Configure the path manager using `ip mptcp` or mptcpd on both the client and server, e.g.
ip mptcp limits set subflow 2 add_addr_accepted 2
ip mptcp endpoint add <ip> dev <iface> <subflow|signal>
- Kernels v5.16 and later have default limits that will establish multiple subflows if additional network interfaces are available locally or advertised by a peer. Older kernels default to single-subflow connections.
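The sysctl check and the socket creation step from the checklist can be sketched in Python as follows. This is a minimal illustration, not part of the upstream project: the helper names `mptcp_enabled` and `open_stream_socket` are hypothetical, the sysctl is assumed to be readable at `/proc/sys/net/mptcp/enabled`, and the value `262` is the Linux `IPPROTO_MPTCP` number used as a fallback when the Python `socket` module does not export the constant.

```python
import socket

# Python 3.10+ exports IPPROTO_MPTCP on Linux; 262 is the Linux value,
# used here as an assumed fallback for older interpreters.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def mptcp_enabled(path="/proc/sys/net/mptcp/enabled"):
    """Return True if the net.mptcp.enabled sysctl file exists and reads 1."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # File missing or unreadable: kernel without MPTCP, or not Linux.
        return False


def open_stream_socket(family=socket.AF_INET):
    """Try to create an MPTCP socket; fall back to plain TCP on failure.

    Returns (sock, is_mptcp). Note that is_mptcp only says which protocol
    was requested at socket creation; a connection can still fall back to
    TCP later if the peer does not support MPTCP.
    """
    try:
        return socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP), True
    except OSError:
        return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP), False


if __name__ == "__main__":
    print("net.mptcp.enabled:", mptcp_enabled())
    sock, is_mptcp = open_stream_socket()
    print("MPTCP socket requested:", is_mptcp)
    sock.close()
```

Falling back to TCP at creation time keeps the program usable on kernels without MPTCP support, which matches the opt-in model described above.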
A small example/tutorial is available here
Please do NOT use multipath-tcp.org or amiusingmptcp.de to check that MPTCP is working: these services only support v0 of the protocol, while the upstream version only supports v1, and the two are not compatible.
Please use tools like `tcpdump` and `wireshark`, or check counters with `nstat` or directly in `/proc/net/netstat`.
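As a rough illustration of reading the counters directly, the sketch below parses the `MPTcpExt` rows of `/proc/net/netstat`, which come as a header line of counter names followed by a line of values. The function names are hypothetical, and the sample text uses made-up values purely to show the expected layout.

```python
def parse_mptcp_counters(netstat_text):
    """Return {counter_name: value} from the MPTcpExt rows of /proc/net/netstat."""
    rows = [line.split() for line in netstat_text.splitlines()
            if line.startswith("MPTcpExt:")]
    if len(rows) < 2:
        # No MPTCP counters found (e.g. kernel without MPTCP support).
        return {}
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def read_mptcp_counters(path="/proc/net/netstat"):
    """Read and parse the live counters; return {} if the file is unavailable."""
    try:
        with open(path) as f:
            return parse_mptcp_counters(f.read())
    except OSError:
        return {}


# Illustrative input only: the values below are invented.
SAMPLE = (
    "TcpExt: SyncookiesSent SyncookiesRecv\n"
    "TcpExt: 0 0\n"
    "MPTcpExt: MPCapableSYNRX MPCapableSYNTX MPCapableSYNACKRX\n"
    "MPTcpExt: 3 5 4\n"
)

if __name__ == "__main__":
    for name, value in read_mptcp_counters().items():
        if value:
            print(name, value)
```

Non-zero `MPCapable*` counters after a test connection are a reasonable hint that the MPTCP handshake was attempted on this host.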
In English:
- Multipath TCP on RHEL 8: From one to many subflows (blog)
- FOSDEM 2023 (February 2023) (video)
- Linux Plumbers Conference (LPC) Networking Track (September 2022) (video)
- Netdev 0x14 (August 2020) (video)
- Multipath TCP on Red Hat Enterprise Linux 8.3, From 0 to 1 subflows (August 2020) (blog)
- Linux Plumbers Conference (LPC) Networking Track (September 2019) (video)
- Netdev 0x12 (July 2018) (video)
In Chinese (中文):
- v5.6: Create MPTCPv1 sockets with a single subflow
  - Prerequisites: modifications in TCP and Socket API
  - Single subflow & RFC8684 support
  - Selftests for single subflow
- v5.7: Create multiple subflows but use them one at a time and get stats
  - Multiple subflows: one subflow is used at a time
  - Path management: global, controlled via Netlink
  - MIB counters
  - Subflow context exposed via inet_diag
  - Selftests for multiple-subflow operation
  - Selftests for the Netlink path manager interface
  - Bug-fix
- v5.8: Stabilisation and support for more of the MPTCPv1 spec
  - Shared receive window across multiple subflows
  - A few optimisations
  - Bug-fix
- v5.9: Stabilisation and support for more of the MPTCPv1 spec
  - Token refactoring
  - KUnit tests
  - Receive buffer auto-tuning
  - diag features (list MPTCP connections using `ss`)
  - Full DATA FIN support
  - MPTCP SYN Cookie support
  - A few optimisations
  - Bug-fix
- v5.10 (LTS): Send over multiple subflows at the same time and support for more of the MPTCPv1 spec
  - Multiple xmit: possibility to send data over multiple subflows
  - ADD_ADDR support with echo-bit
  - REMOVE_ADDR support
  - A few optimisations
  - Bug-fix
- v5.11: Performance and support for more of the MPTCPv1 spec
  - Refines receive buffer autotuning
  - Improves GRO and RX coalescing with MPTCP skbs
  - Improve multiple xmit streams support
  - MP_ADD_ADDR v6 support
  - Sending MP_ADD_ADDR port support
  - Incoming MP_FAST_CLOSE support
  - A few optimisations
  - Bug-fix
- v5.12: Good performance, PM events and support for more of the MPTCPv1 spec
  - Accepting MP_JOIN to another port (after having sent an ADD_ADDR with this port) support
  - MP_PRIO support
  - Per connection netlink PM events
  - "Delegated actions" framework to improve communications between MPTCP socket and subflows
  - Support IPv4-mapped in IPv6 for additional subflows
  - Performance improvements
  - A few optimisations
  - Bug-fix
- v5.13: Supporting more options and items from the protocol
  - Outgoing MP_FAST_CLOSE support
  - MP_TCPRST support
  - RM_ADDR: addresses' list support
  - Switch to next available address when a subflow creation fails
  - Support removing subflows with ID 0
  - New MIB counters: active MPC, token creation fallback
  - socket options:
    - only admit explicitly supported ones
    - support new ones: SO_KEEPALIVE, SO_PRIORITY, SO_RCVBUF/SO_SNDBUF, SO_BINDTODEVICE/IFINDEX, SO_LINGER, SO_MARK, SO_INCOMING_CPU, SO_DEBUG, TCP_CONGESTION and TCP_INFO
  - debug: new tracepoints support
  - Retransmit DATA_FIN support
  - MSG_TRUNC and MSG_PEEK support
  - A few optimisations/cleanup
  - Bug-fix
- v5.14: Supporting more options and items from the protocol
  - Checksum support
  - MP_CAPABLE C flag support
  - Receive path cmsg support (e.g. timestamp)
  - MIB counters for invalid mapping
  - A few optimisations/cleanup (that might affect performance)
  - Bug-fix
- v5.15 (LTS): Supporting more options and usability improvements
  - MP_FAIL support (without TCP fallback / infinite mapping)
  - Packet scheduler improvements (especially with backup subflows)
  - Full mesh path management support
  - Refactoring of ADD_ADDR and ECHO handling
  - Memory and execution optimization of option header transmit and receive
  - Bug-fix and small optimisations
- v5.16: Supporting more socket options
  - Support for `MPTCP_INFO` socket option (similar to `TCP_INFO`)
  - Default max additional subflows for the in-kernel PM is now set to 2
  - Batch SNMP operations
  - Bug-fix and optimisations
- v5.17: Even more socket options
  - Support for new `ioctls`: `SIOCINQ`, `OUTQ`, and `OUTQNSD`
  - Support for new socket options: `IP_TOS`, `IP_FREEBIND`, `IPV6_FREEBIND`, `IP_TRANSPARENT`, `IPV6_TRANSPARENT`, `TCP_CORK` and `TCP_NODELAY`
  - Support for `cmsgs`: `TCP_INQ`
  - PM: Support changing the "backup" bit via Netlink (`ip mptcp`)
  - PM: Do not block subflows creation on errors
  - Packet scheduler improvement with better HoL-blocking estimation improving the stability
  - Support sending `MP_FASTCLOSE` option (quick shutdown of the full MPTCP connection, similar to TCP RST in regular TCP)
  - Bug-fix and optimisations
- v5.18: Stabilisation
  - Support dynamic change of the Fullmesh PM flag
  - Support for new socket options: `SNDTIMEO`
  - Code cleanup:
    - Clarify when MPTCP options can be used together
    - Constify a bunch of helpers
    - Make some OPS structure Read-Only
  - Add MIBs for `MP_FASTCLOSE` and `MP_RST`
  - Add tracepoint in `mptcp_sendmsg_frag()`
  - Restricts RM_ADDR generation to previously explicitly announced ones
  - Send ADD_ADDR echo before creating subflows
- v5.19: Userspace control and fallbacks
  - Support for MPTCP path manager in user space
  - Add MPTCP support for fallback to regular TCP for connections that have never connected additional subflows or transmitted out-of-sequence data (partial support for RFC8684 fallback)
  - Fallback or reset MPTCP connections in case of checksum issues (`MP_FAIL` and `infinite mapping` support)
  - Avoid races in MPTCP-level window tracking, stabilize and improve throughput
  - Make 'ss -Ml' show MPTCP listen sockets
  - BPF: Add BPF access to `mptcp_sock` structures and their meta data
- v6.0: Initial subflow as Backup and memory optimisations
  - Support changes to initial subflow priority (set the initial subflow as `backup`)
  - Refactor the forward memory allocation to better cope with memory pressure with many open sockets, moving from a per socket cache to a per-CPU one
- v6.1 (LTS): User namespace and TFO sender support, send MP_FASTCLOSE like TCP RST
  - Allow privileged Netlink operations from user namespaces
  - `TCP_FASTOPEN_CONNECT` support for a client to initiate MPTCP + TFO connections (data in the SYN). Note that the server support is still being developed
  - `MP_FASTCLOSE` is sent in case of errors (equivalent to TCP RESET) and in more edge scenarios to mimic TCP behaviour
- v6.2: TFO receiver support
  - TFO receiver support
  - Support for the `MSG_FASTOPEN` flag of `sendmsg()`
  - Support of more socket options: `TCP_FASTOPEN`, `TCP_FASTOPEN_KEY`, `TCP_FASTOPEN_NO_COOKIE`
  - Cleaner messages in case of error when creating endpoint
  - Add Path Manager "listener" Netlink events for the userspace path manager
- v6.3: ProcFS info and mix v4/v6 subflows
  - Add statistics for MPTCP sockets in use in `/proc/net/protocols`
  - Path-Manager: in-kernel: allow to use mixed IPv4 and IPv6 addresses
  - Some clean-up and small improvements (MPTCP and selftests)
- v6.4: Improvement around the reception of connection requests
  - Refactoring around the reception of MPC/MPJ connection requests
  - `MPTCP_INFO` and Netlink (`ss -M`): do not fill info not used by the PM in use
  - Move first subflow allocation at MPC access time
- v6.5: More exposed info
  - LSM/SELinux: correctly inherit labels on MPTCP subflows
  - New ADD_ADDR (+ echo) transmission MIB counters
  - New aggregated data counters exposed via Netlink and `getsockopt(MPTCP_INFO)`
  - New `getsockopt(MPTCP_FULL_INFO)` aggregating MPTCP and subflows info (with ID)
  - Some clean-up and small improvements (MPTCP and selftests + support of old kernels)
- v6.6: Forcing the use of MPTCP with BPF
  - Allow forcing the use of MPTCP with BPF (example)
  - Refactoring to get rid of `msk->subflow`
  - Preparation for future extension of the packet scheduler
  - Some improvements in the selftests: TAP for subtests, uniformity, colours
- v6.7: MPTCP YNL and packet scheduler improvements
  - Convert Netlink code to use YAML spec for better API validation and documentation, see YNL
  - New sysctl for make after break timeout: `net.mptcp.close_timeout`
  - Support `SO_RCVLOWAT` socket option (instead of ignoring it)
  - Ignore `net.ipv4.tcp_notsent_lowat` at subflow level not to foul the packet scheduler
  - Reduce overhead on transmit part
  - Refactor sndbuf auto-tuning to improve the situation when being limited by the send buffer
  - Some clean-up in MPTCP code and selftests
- v6.8: TODO
  - New `MPTCP_INFO` and Netlink (`ss -M`) counter: `subflows_total`, taking into account the initial subflow (compared to `subflows` which only looks at additional subflows)
  - WIP
- Mailing list: [email protected]
- Anyone may view the archives
- Repositories: https://github.com/multipath-tcp/mptcp_net-next.git, see the different Git Branches
- Issues are logged on Github
- IRC: `#mptcp` on irc.libera.chat. For more details about sub-channels: IRC
- Patchwork
- Status (with the Roadmap)
- Testing
- Meetings
- CI
- Patch prefixes
There are two different but active Linux kernel projects related to MPTCP:
- out-of-tree:
  - URL: https://github.com/multipath-tcp/mptcp
  - It cannot be "upstreamed" to the official Linux kernel as it is: there are too many modifications in the TCP stack.
  - It is designed to have very good performance with MPTCP but it has an impact on normal TCP and the maintenance is more complex.
  - This version is used for the server behind http://multipath-tcp.org/
  - MPTCPv0 spec is supported
  - MPTCPv1 spec support is available from the v0.96 version
  - Releases are synced with older LTS kernels (v5.4 and earlier).
- upstream: (here)
  - URL: https://github.com/multipath-tcp/mptcp_net-next
  - Available since v5.6 in the official Linux kernel (if enabled in the kernel config)
  - It is a new implementation designed to fit with upstream standards.
  - The work is ongoing, please see the ChangeLog section above to see what is supported
  - Only MPTCPv1 is supported
  - Note: RHEL8 and later have MPTCP support based on this upstream implementation.
For the moment, there are also different versions of the protocol: RFC 6824 (MPTCPv0) and RFC 8684 (MPTCPv1).
- MPTCPv0:
  - URL: https://www.rfc-editor.org/rfc/rfc6824.html
  - Supported by the out-of-tree implementation but not the upstream one.
- MPTCPv1:
  - URL: https://www.rfc-editor.org/rfc/rfc8684.html
  - Supported by the upstream version. Supported from the v0.96 version on the out-of-tree kernel.
MPTCPv1 has significant changes that make it incompatible with v0. By design, the upstream version is not compatible with MPTCPv0. That is why `curl http://multipath-tcp.org/` will always say you don't support MPTCP(v0).
Please note that MPTCPv0 and MPTCPv1 are not used to define the different Linux kernel implementations (out-of-tree vs upstream); they are just versions of the protocol. Please use 'out-of-tree' and 'upstream' if you want to talk about the Linux kernel implementation.
$ curl http://multipath-tcp.org
Nay, Nay, Nay, your have an old computer that does not speak MPTCP. Shame on you!
Please see the sections above for more details, but the server behind http://multipath-tcp.org uses the out-of-tree implementation with MPTCPv0 only. It is therefore not compatible with MPTCPv1 and reports this error.
It is planned to have a public MPTCP server with the upstream kernel but it is not ready yet.
The Docker image used by the public CIs can be used to create a basic kernel dev environment.
Download the kernel source code and then run these two commands to download the latest Docker image and run it:
$ docker pull mptcp/mptcp-upstream-virtme-docker:latest
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it mptcp/mptcp-upstream-virtme-docker:latest <manual-normal | manual-debug | auto-normal | auto-debug | auto-all>
For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
With MPTCP, subflow processing is still done by the TCP stack. The main difference with plain TCP is that this processing does not use the socket backlog and always happens in BH (bottom half) context. When the host is under heavy load, BH processing happens in `ksoftirqd` context, and there is some latency between the moment `ksoftirqd` is scheduled and the moment it actually runs. This latency depends on the process scheduler decisions (and settings).
A way to reduce these retransmissions and avoid the dropped packets at the NIC level is to increase the NIC RX queue. See issue #253 for more details.
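One common way to enlarge the NIC RX ring on Linux is `ethtool -G <iface> rx <size>`. The sketch below only wraps that command; the helper names, the interface name `eth0` and the size `4096` are assumptions for illustration, and the maximum ring size supported by a given NIC should be checked first with `ethtool -g <iface>`.

```python
import subprocess


def rx_ring_command(iface, rx_size):
    """Build the ethtool command that sets the RX ring size of an interface."""
    if rx_size <= 0:
        raise ValueError("rx_size must be a positive number of descriptors")
    return ["ethtool", "-G", iface, "rx", str(rx_size)]


def set_rx_ring(iface, rx_size, dry_run=True):
    """Return the command; run it only when dry_run is False (needs root)."""
    cmd = rx_ring_command(iface, rx_size)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical interface and size: adjust to the actual NIC.
    print(" ".join(set_rx_ring("eth0", 4096)))
```

The dry-run default makes it easy to review the command before applying it, since a size above the hardware maximum is rejected by the driver.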