-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Local node routing table separation #1568
Conversation
Before, IPv6 packets from the local-node interface using the IPv6 ULA were wrongly using the MAC address from br-client (64:70:02:ae:72:e4): client$ tcpdump -i enp0s31f6 -e -n "ether src 16:41:95:40:f7:dc or ether src 64:70:02:ae:72:e4 or ether src 8c:16:45:66:ba:11" tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on enp0s31f6, link-type EN10MB (Ethernet), capture size 262144 bytes 00:02:51.202570 8c:16:45:66:ba:11 > 16:41:95:40:f7:dc, ethertype IPv6 (0x86dd), length 118: fdef:ffc0:3dd7:0:8e16:45ff:fe66:ba11 > fdef:ffc0:3dd7::1: ICMP6, echo request, seq 3, length 64 00:02:51.203040 64:70:02:ae:72:e4 > 8c:16:45:66:ba:11, ethertype IPv6 (0x86dd), length 118: fdef:ffc0:3dd7::1 > fdef:ffc0:3dd7:0:8e16:45ff:fe66:ba11: ICMP6, echo reply, seq 3, length 64 00:02:52.226566 8c:16:45:66:ba:11 > 16:41:95:40:f7:dc, ethertype IPv6 (0x86dd), length 118: fdef:ffc0:3dd7:0:8e16:45ff:fe66:ba11 > fdef:ffc0:3dd7::1: ICMP6, echo request, seq 4, length 64 00:02:52.227255 64:70:02:ae:72:e4 > 8c:16:45:66:ba:11, ethertype IPv6 (0x86dd), length 118: fdef:ffc0:3dd7::1 > fdef:ffc0:3dd7:0:8e16:45ff:fe66:ba11: ICMP6, echo reply, seq 4, length 64 (br-client: 64:70:02:ae:72:e4, local-node: 16:41:95:40:f7:dc, client device: 8c:16:45:66:ba:11) $ ip a s dev local-node 10: local-node@local-port: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000 link/ether 16:41:95:40:f7:dc brd ff:ff:ff:ff:ff:ff inet 10.130.0.1/20 brd 10.130.15.255 scope global local-node valid_lft forever preferred_lft forever inet6 fdef:ffc0:3dd7::1/128 scope global deprecated valid_lft forever preferred_lft 0sec inet6 fe80::1441:95ff:fe40:f7dc/64 scope link valid_lft forever preferred_lft forever With this patch applied ICMPv6 uses the correct source MAC address from the local-node interface (16:41:95:40:f7:dc): $ tcpdump -i enp0s31f6 -e -n "ether src 16:41:95:40:f7:dc or ether src 64:70:02:ae:72:e4 or ether src 8c:16:45:66:ba:11" tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on enp0s31f6, link-type EN10MB (Ethernet), capture size 262144 bytes 00:15:19.757790 8c:16:45:66:ba:11 > 16:41:95:40:f7:dc, ethertype IPv6 (0x86dd), length 118: fdef:ffc0:3dd7:0:8e16:45ff:fe66:ba11 > fdef:ffc0:3dd7::1: ICMP6, echo request, seq 1, length 64 00:15:19.758567 16:41:95:40:f7:dc > 33:33:ff:66:ba:11, ethertype IPv6 (0x86dd), length 86: fdef:ffc0:3dd7::1 > ff02::1:ff66:ba11: ICMP6, neighbor solicitation, who has fdef:ffc0:3dd7:0:8e16:45ff:fe66:ba11, length 32 00:15:19.758617 8c:16:45:66:ba:11 > 16:41:95:40:f7:dc, ethertype IPv6 (0x86dd), length 86: fdef:ffc0:3dd7:0:8e16:45ff:fe66:ba11 > fdef:ffc0:3dd7::1: ICMP6, neighbor advertisement, tgt is fdef:ffc0:3dd7:0:8e16:45ff:fe66:ba11, length 32 00:15:19.759220 16:41:95:40:f7:dc > 8c:16:45:66:ba:11, ethertype IPv6 (0x86dd), length 118: fdef:ffc0:3dd7::1 > fdef:ffc0:3dd7:0:8e16:45ff:fe66:ba11: ICMP6, echo reply, seq 1, length 64 00:15:20.759425 8c:16:45:66:ba:11 > 16:41:95:40:f7:dc, ethertype IPv6 (0x86dd), length 118: fdef:ffc0:3dd7:0:8e16:45ff:fe66:ba11 > fdef:ffc0:3dd7::1: ICMP6, echo request, seq 2, length 64 00:15:20.759966 16:41:95:40:f7:dc > 8c:16:45:66:ba:11, ethertype IPv6 (0x86dd), length 118: fdef:ffc0:3dd7::1 > fdef:ffc0:3dd7:0:8e16:45ff:fe66:ba11: ICMP6, echo reply, seq 2, length 64 In practice, this wrong MAC address does not seem to cause any issues so far, though, as ebtables rules are filtering to bat0 for both local-node MAC, IPv4 and IPv6 addresses and the IPv6 stack (at least in Linux) does not seem to update its neighbor table from IPv6 data packets (interestingly, contrary to the Linux IPv4 stack, as was observed with 3ef28a4 ("gluon-client-bridge: Revert "move IPv4 local subnet route to br-client (freifunk-gluon#1312)"). Signed-off-by: Linus Lüssing <[email protected]>
Just as was done for IPv6 previously, use the new, separate table for IPv4 local-node, too. This ensures that we will always only use the source MAC of the local-node interface if the packet originated from us (locally generated, not routed) and matches our local-node IPv4 or IPv6 address. Signed-off-by: Linus Lüssing <[email protected]>
This allows us to route and NAT between br-client and br-wan in the future. Which is necessary for the gluon-alt-esc-provider package, for instance. With the local-node routes separated in their own table, the issue described in: 3ef28a4 ("gluon-client-bridge: Revert "move IPv4 local subnet route to br-client (freifunk-gluon#1312)" and initially introduced in: b3762fc ("gluon-client-bridge: move IPv4 local subnet route to br-client (freifunk-gluon#1312)") does not apply anymore, which makes this change safe again. Signed-off-by: Linus Lüssing <[email protected]>
I am uncertain whether we should we diverge routing for babeld and batman. |
Hi @christf. When making this PR I was wondering and uncertain why those routes were in the batman-adv package upgrade scripts, too. But I did not change that because I was uncertain whether that would break something for babel. Do you mean you see some conflict with the babel integration with this concept? Then please elaborate, then I'd love to hear and learn what you are doing and need differently. Or are you just suggesting to do what I had been wondering about initially, to move these routes and tables from gluon-mesh-batman-adv (package/gluon-mesh-batman-adv/luasrc/lib/gluon/upgrade/320-gluon-mesh-batman-adv-client-bridge) to gluon-client-bridge (package/gluon-client-bridge/luasrc/lib/gluon/upgrade/310-gluon-client-bridge-local-node)? |
actually I was referring to the usage of table 2 and the fact that now batman adjusts routing rules for the local-node feature while the babel stack is untouched. I feel local-node should work the same on both stacks to avoid confusion. |
Ok, great. "local_node_route6" was already set up via 320-gluon-mesh-batman-adv-client-bridge. @christf, can I just move my changes and additions for "local_node_route", "local_node_route6", "local_node_rule", "local_node_rule6" and "br_client_route" from 320-gluon-mesh-batman-adv-client-bridge to 310-gluon-client-bridge-local-node? Or would that break something for Babel? |
I do not think it should break anything but it would be interesting to see a test. I am not sure how to best do this in the future. I could be difficult to run many test-scenarios for changes like this. |
how's the testing of this coming along? it would be nicer to get good work not only done, but also tested and afterwards maybe merged because of good results :) |
@CodeFetch i won't merge if there's not even basic testing with a babel-build. |
I try to do a test build of this PR with the help of the FFFFM babel knowledge, next week(-end) @rotanid what comes to your mind when basic testing has to be done? And what should be tested when doing tests with this PR especially? |
On layer 3 this patch should work this way:
Routing table 2: local-node interface is a veth and the packet outputs on local-port which is in the client bridge together with the wifi client interfaces and the bat0 client interface. This patch only has the purpose that packets from the local node IP address use the MAC address of the local-port interface. @bobcanthelpyou That means you have to check everything ordinary like does the clients still have gateway reachability, do they receive router announcements - say check if the clients have internet connection. Then as a client check if you can reach the status page at the local node's IPv4/IPv6 address and as a client on another node check if you can reach the status page of this node with its global IPv6 address. Edit: I'm not really sure at the moment if this can't cause troubles with babel as it uses the local-node interface for forwarding traffic between clients, too as far as I remember. Thus you also have to check if clients on two different nodes running this patch, can still reach each other. |
It is still on my todo-list but i forgot about it, and can't start at the moment (cause i am afk for the next weeks) |
@@ -23,8 +23,7 @@ uci:section('network', 'device', 'local_node_dev', { | |||
local ip4, ip6 | |||
|
|||
if next_node.ip4 then | |||
local plen = site.prefix4():match('/%d+$') | |||
ip4 = next_node.ip4 .. plen | |||
ip4 = next_node.ip4 .. '/32' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christf Please test this change with the babeld setup.
src = next_node.ip4 .. '/32', | ||
lookup = 2, | ||
}) | ||
uci:set('network', 'local_node_rule', 'in', 'loopback') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please merge this set into the section above (same for local_node_rule6
below).
I assume this was not done because in
is a keyword in Lua. The correct syntax for this (and arbitrary Lua values as table keys) is
{
...
['in'] = 'loopback',
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and is the in
setting even necessary at all?
uci:section('network', 'route', 'br_client_route', { | ||
interface = 'client', | ||
target = site.prefix4(), | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I wonder if such a change will lead to incorrect source address selection in some cases. I'm not sure if source address selection for hosts with multiple addresses is as well-specified for IPv4 as it is for IPv6 (at least, I don't know the relevant RFCs), but I'm thinking about the following:
When the Gluon node wants to contact a local client, it will now select this route at first (before source address selection), with interface client - which doesn't have any IPv4 address. Now, how is the source address selected now? What tells Linux to use the address of local-node and not, for example, the br-wan address?
The issue might be solved by adding a "source" attribute with the next-node address to this route definition, but I haven't fully thought this through yet.
For IPv6, a similar issue cannot occur, as br-client should always have an address, which would be selected (at least when gluon-radvd is included, which is the default now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder if this route could lead to packets with the local-node source IPv4 address leaving through br-client, with the br-client MAC address, or if packets would go through another round of routing after choosing the source address, and then end up in routing table 2. If the former is the case, I don't think we should merge the IPv4 part of these changes...
Would adding a source address attribute with the local-node IPv4 address to this route be a problem for alt-esc?
@T-X are you still interested in this? you have not commented on the feedback which was given 9 months ago... |
Closing this PR as I'm currently not working on it. And instead have an alternative approach in mind (https://www.open-mesh.org/projects/open-mesh/wiki/OpenHarbors). |
This pull request includes one minor fix for the source MAC for IPv6 packets from the local-node ULA. And two changes similar to the previous attempt from commit b3762fc ("gluon-client-bridge: move IPv4 local subnet route to br-client (#1312)"). But without the drawbacks @ecsv had discovered with it (3ef28a4).
Conceptually, the new approach is to use a separate routing table for the local-node interface. Which we will only jump into if a node tries to use its IPv4 or IPv6 local-node address. This should ensure, that we never use the local-node interface's MAC address for any purpose other than direct client-to-node communication (for instance, routed packets will always use the unique MAC address from br-client).
This is also in preparation for the gluon-alt-esc packages (#1094).