guest/net: New implementation of network setup with SLAAC and own DHCP client

The existing implementation has a couple of issues:

- it doesn't support IPv6 or SLAAC

- it relies on either dhclient(8) or dhcpcd(8), which need a
  significant amount of time to configure the network as they are
  rather generic DHCP clients

- on top of this, dhcpcd, by default, unless --noarp is given, will
  spend five seconds ARP-probing the address it just received before
  configuring it

Replace the IPv4 part with a minimalistic, 90-line DHCP client that
just does what we need, using option 80 (Rapid Commit) to speed up
the whole exchange.
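
As a rough illustration of the request described above, here is a minimal sketch of a DHCPDISCOVER carrying option 80 and the "broadcast" flag. The function name `build_discover`, its `xid` parameter and the two constants are hypothetical names picked for this example; only the field offsets (RFC 2131) and the option codes are taken from the text.

```rust
/// Size of the request: BOOTP-style message padded to 300 bytes.
const MSG_LEN: usize = 300;
/// Options start after the 236-byte fixed header and the 4-byte magic cookie.
const OPTIONS_OFFSET: usize = 240;

/// Build a DHCPDISCOVER with Rapid Commit (option 80, RFC 4039).
/// `xid` is a caller-chosen transaction ID (hypothetical parameter).
fn build_discover(xid: [u8; 4]) -> [u8; MSG_LEN] {
    let mut msg = [0u8; MSG_LEN];

    msg[0] = 1; // op: BOOTREQUEST
    msg[1] = 1; // htype: Ethernet
    msg[2] = 6; // hlen: length of a MAC address
    msg[4..8].copy_from_slice(&xid);
    msg[10] = 0x80; // flags: broadcast bit, so the reply is sent to broadcast

    // Magic cookie marking the start of DHCP options
    msg[236..240].copy_from_slice(&[0x63, 0x82, 0x53, 0x63]);

    let options: [u8; 6] = [
        53, 1, 1, // Message type: DISCOVER
        80, 0, // Rapid Commit, zero-length option
        0xff, // End of options
    ];
    msg[OPTIONS_OFFSET..OPTIONS_OFFSET + options.len()].copy_from_slice(&options);

    // Remaining bytes stay zero and act as padding
    msg
}
```

With Rapid Commit, a server that supports it can answer this single message with a DHCPACK directly, which skips the OFFER/REQUEST round trip.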

Add IPv6 support (including IPv4-only and IPv6-only modes), relying
on the kernel to perform SLAAC. Safely avoid DAD (we're the only
node on the link) by disabling neighbour solicitations, starting SLAAC,
and re-enabling them once addresses are configured.

Instead of merely triggering the network setup and proceeding, wait
until everything is configured, so that connectivity is guaranteed to
be ready before any further process runs in the guest, say:

  $ ./target/debug/muvm -- ping -c1 2a01:4f8:222:904::2
  PING 2a01:4f8:222:904::2 (2a01:4f8:222:904::2) 56 data bytes
  64 bytes from 2a01:4f8:222:904::2: icmp_seq=1 ttl=255 time=0.256 ms

  --- 2a01:4f8:222:904::2 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 0.256/0.256/0.256/0.000 ms
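
For the IPv6 side, the condition for "everything is configured" can be modelled as a pure function over the addresses currently reported for the interface. This is only a sketch: `Scope`, `Addr` and `slaac_settled` are hypothetical names, and it evaluates a single snapshot, whereas the netlink loop in this patch accumulates its flags across polls.

```rust
/// Address scope, reduced to the two cases that matter here (hypothetical type).
#[derive(Clone, Copy, PartialEq)]
enum Scope {
    Link,
    Global,
}

/// One address as seen in a dump of the interface (hypothetical type).
#[derive(Clone, Copy)]
struct Addr {
    scope: Scope,
    tentative: bool,
}

/// Returns true once polling can stop.
fn slaac_settled(addrs: &[Addr]) -> bool {
    let ll_seen = addrs.iter().any(|a| a.scope == Scope::Link);
    let global_seen = addrs.iter().any(|a| a.scope == Scope::Global);

    // A link-local address that is no longer tentative is taken to mean that
    // no router answered (IPv4-only link), so a global address is not awaited.
    let global_wait = !addrs
        .iter()
        .any(|a| a.scope == Scope::Link && !a.tentative);

    ll_seen && (!global_wait || global_seen)
}
```

The function returns `false` while only a tentative link-local address is present, and `true` as soon as either a global address appears or the link-local address leaves the tentative state.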

The whole procedure now takes approximately 1.5 to 2 ms (for both
IPv4 and IPv6), with the DHCP exchange and configuration taking
somewhere around 300-500 µs out of that, instead of hundreds of
milliseconds to seconds.

Configure nameservers received via DHCP option 6 as well: passt
already takes care of translating DNS traffic directed to
loopback addresses read from resolv.conf, so we can just write those
to resolv.conf in the guest.
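
To make the option handling concrete, here is a hedged sketch of a bounds-checked walk over the options of a DHCP reply, extracting the subnet mask (option 1), router (option 3) and nameservers (option 6), and rendering the latter as resolv.conf lines. The names `Lease`, `ipv4_at`, `parse_options`, `prefix_len` and `resolv_conf_lines` are hypothetical; unlike the parser in this patch, the sketch also skips the Pad option (0) and stops on truncated input.

```rust
use std::net::Ipv4Addr;

/// Values extracted from the options of a DHCP reply (hypothetical type).
#[derive(Debug, Default, PartialEq)]
struct Lease {
    netmask: Option<Ipv4Addr>,
    router: Option<Ipv4Addr>,
    dns: Vec<Ipv4Addr>,
}

/// Read the first four bytes of `data` as an IPv4 address, if present.
fn ipv4_at(data: &[u8]) -> Option<Ipv4Addr> {
    let b = data.get(..4)?;
    Some(Ipv4Addr::new(b[0], b[1], b[2], b[3]))
}

/// Walk the code/length/value options that follow the magic cookie.
fn parse_options(mut opts: &[u8]) -> Lease {
    let mut lease = Lease::default();

    while let Some((&code, rest)) = opts.split_first() {
        match code {
            0 => {
                // Pad: a single byte without a length field
                opts = rest;
                continue;
            }
            0xff => break, // End of options
            _ => {}
        }

        let Some((&len, rest)) = rest.split_first() else { break };
        let len = len as usize;
        if rest.len() < len {
            break; // Truncated option: stop instead of reading out of bounds
        }
        let (value, tail) = rest.split_at(len);

        match code {
            1 => lease.netmask = ipv4_at(value),
            3 => lease.router = ipv4_at(value),
            6 => lease.dns.extend(value.chunks_exact(4).filter_map(ipv4_at)),
            _ => {} // Anything else is ignored
        }
        opts = tail;
    }

    lease
}

/// Prefix length of a contiguous netmask: the number of leading one bits.
fn prefix_len(netmask: Ipv4Addr) -> u8 {
    u32::from(netmask).leading_ones() as u8
}

/// Render nameservers the way they would be appended to resolv.conf.
fn resolv_conf_lines(dns: &[Ipv4Addr]) -> String {
    dns.iter().map(|a| format!("nameserver {a}\n")).collect()
}
```

Passing the slice that starts at offset 240 of the reply (right after the magic cookie) to `parse_options` yields the values needed to configure the address, the default route and the resolver.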

At least for the time being, for simplicity, omit handling of
option 119 (domain search list), as I doubt it's going to be of much
use for muvm.

I'm not adding handling of the NDP RDNSS option (25, RFC 8106) either,
for the moment, as it involves a second netlink socket subscribing to
the RTNLGRP_ND_USEROPT group and listening to events while we receive
the first router advertisement. The equivalent userspace tool would be
rdnssd(8), which is not called before this change anyway. I would
rather add it at a later time instead of making this patch explode.

Matching support in passt for option 80 (RFC 4039) and for the DHCP
"broadcast" flag (RFC 2131) needs at least passt 2024_11_27.c0fbc7e:

  https://archives.passt.top/passt-user/20241127142126.3c53066e@elisabeth/

Signed-off-by: Stefano Brivio <[email protected]>
Co-authored-by: Teoh Han Hui <[email protected]>
sbrivio-rh and teohhanhui committed Nov 29, 2024
1 parent d89ddbe commit 0030cc1
Showing 2 changed files with 254 additions and 61 deletions.
4 changes: 3 additions & 1 deletion crates/muvm/Cargo.toml
@@ -3,7 +3,7 @@ name = "muvm"
version = "0.1.3"
authors = ["Sergio Lopez <[email protected]>", "Teoh Han Hui <[email protected]>", "Sasha Finkelstein <[email protected]>", "Asahi Lina <[email protected]>"]
edition = "2021"
rust-version = "1.77.0"
rust-version = "1.80.0"
description = "Run programs from your system in a microVM"
repository = "https://github.com/AsahiLinux/muvm"
license = "MIT"
@@ -12,11 +12,13 @@ license = "MIT"
anyhow = { version = "1.0.82", default-features = false, features = ["std"] }
bpaf = { version = "0.9.11", default-features = false, features = [] }
byteorder = { version = "1.5.0", default-features = false, features = ["std"] }
const-str = { version = "0.5.7", default-features = false, features = [] }
env_logger = { version = "0.11.3", default-features = false, features = ["auto-color", "humantime", "unstable-kv"] }
input-linux = { version = "0.7.0", default-features = false, features = [] }
input-linux-sys = { version = "0.9.0", default-features = false, features = [] }
krun-sys = { path = "../krun-sys", version = "1.9.1", default-features = false, features = [] }
log = { version = "0.4.21", default-features = false, features = ["kv"] }
neli = { version = "0.7.0-rc2", default-features = false, features = ["sync"] }
nix = { version = "0.29.0", default-features = false, features = ["user"] }
procfs = { version = "0.17.0", default-features = false, features = [] }
rustix = { version = "0.38.34", default-features = false, features = ["fs", "mount", "process", "std", "stdio", "system", "use-libc-auxv"] }
311 changes: 251 additions & 60 deletions crates/muvm/src/guest/net.rs
@@ -1,14 +1,239 @@
use std::fs;
use std::io::Write;
use std::os::unix::process::ExitStatusExt as _;
use std::process::Command;
use std::net::{Ipv4Addr, UdpSocket};
use std::time::Duration;

use anyhow::{anyhow, Context, Result};
use log::debug;
use anyhow::{Context, Result};
use neli::consts::nl::NlmF;
use neli::consts::rtnl::{
    Arphrd, Ifa, IfaF, Iff, RtAddrFamily, RtScope, RtTable, Rta, Rtm, RtmF, Rtn, Rtprot,
};
use neli::consts::socket::NlFamily;
use neli::nl::{NlPayload, Nlmsghdr};
use neli::router::synchronous::{NlRouter, NlRouterReceiverHandle};
use neli::rtnl::{
    Ifaddrmsg, IfaddrmsgBuilder, Ifinfomsg, IfinfomsgBuilder, RtattrBuilder, Rtmsg, RtmsgBuilder,
};
use neli::types::RtBuffer;
use neli::utils::Groups;
use rustix::system::sethostname;

use crate::utils::env::find_in_path;
use crate::utils::fs::find_executable;
/// Set interface flags for eth0 (interface index 2) with a given mask
fn flags_eth0(rtnl: &NlRouter, mask: Iff, set: Iff) -> Result<()> {
    let ifinfomsg = IfinfomsgBuilder::default()
        .ifi_family(RtAddrFamily::Unspecified)
        .ifi_type(Arphrd::Ether)
        .ifi_index(2)
        .ifi_change(mask)
        .ifi_flags(set)
        .build()?;

    let _: NlRouterReceiverHandle<Rtm, Ifinfomsg> =
        rtnl.send(Rtm::Newlink, NlmF::REQUEST, NlPayload::Payload(ifinfomsg))?;

    Ok(())
}

/// Add or delete IPv4 routes for eth0 (interface index 2)
fn route4_eth0(rtnl: &NlRouter, what: Rtm, gw: Ipv4Addr) -> Result<()> {
    let rtmsg = RtmsgBuilder::default()
        .rtm_family(RtAddrFamily::Inet)
        .rtm_dst_len(0)
        .rtm_src_len(0)
        .rtm_tos(0)
        .rtm_table(RtTable::Main)
        .rtm_protocol(Rtprot::Boot)
        .rtm_scope(RtScope::Universe)
        .rtm_type(Rtn::Unicast)
        .rtm_flags(RtmF::empty())
        .rtattrs(RtBuffer::from_iter([
            RtattrBuilder::default()
                .rta_type(Rta::Oif)
                .rta_payload(2)
                .build()?,
            RtattrBuilder::default()
                .rta_type(Rta::Dst)
                .rta_payload(Ipv4Addr::UNSPECIFIED.octets().to_vec())
                .build()?,
            RtattrBuilder::default()
                .rta_type(Rta::Gateway)
                .rta_payload(gw.octets().to_vec())
                .build()?,
        ]))
        .build()?;

    let _: NlRouterReceiverHandle<Rtm, Rtmsg> = rtnl.send(
        what,
        NlmF::CREATE | NlmF::REQUEST,
        NlPayload::Payload(rtmsg),
    )?;

    Ok(())
}

/// Add or delete IPv4 addresses for eth0 (interface index 2)
fn addr4_eth0(rtnl: &NlRouter, what: Rtm, addr: Ipv4Addr, prefix_len: u8) -> Result<()> {
    let ifaddrmsg = IfaddrmsgBuilder::default()
        .ifa_family(RtAddrFamily::Inet)
        .ifa_prefixlen(prefix_len)
        .ifa_scope(RtScope::Universe)
        .ifa_index(2)
        .rtattrs(RtBuffer::from_iter([
            RtattrBuilder::default()
                .rta_type(Ifa::Local)
                .rta_payload(addr.octets().to_vec())
                .build()?,
            RtattrBuilder::default()
                .rta_type(Ifa::Address)
                .rta_payload(addr.octets().to_vec())
                .build()?,
        ]))
        .build()?;

    let _: NlRouterReceiverHandle<Rtm, Ifaddrmsg> = rtnl.send(
        what,
        NlmF::CREATE | NlmF::REQUEST,
        NlPayload::Payload(ifaddrmsg),
    )?;

    Ok(())
}

/// Send DISCOVER with Rapid Commit, process ACK, configure address and route
fn do_dhcp(rtnl: &NlRouter) -> Result<()> {
    // Temporary link-local address and route avoid the need for raw sockets
    route4_eth0(rtnl, Rtm::Newroute, Ipv4Addr::UNSPECIFIED)?;
    addr4_eth0(rtnl, Rtm::Newaddr, Ipv4Addr::new(169, 254, 1, 1), 16)?;

    // Send request (DHCPDISCOVER)
    let socket = UdpSocket::bind("0.0.0.0:68").expect("Failed to bind");
    let mut buf = [0; 576 /* RFC 2131, Section 2 */ ];

    const REQUEST: &[u8; 300 /* From RFC 951: >= 60 B of options */ ] = const_str::concat_bytes!(
        1, // REQUEST
        0x1, // Ethernet
        6, // hlen
        0, // Hops
        [1, 2, 3, 4], // XID
        [0, 0], // Seconds
        [0x80, 0x0], // Flags
        [0; 16], // All-zero (four) addresses
        [0; 16], // 16B HW address: who cares
        [0; 64], // 64B 'sname' (RFC 1531)
        [0; 128], // 128B 'file' (RFC 1531)
        [0x63, 0x82, 0x53, 0x63], // DHCP (magic) cookie
        // options
        [
            53, 1, 1, // DISCOVER
            80, 0, // Rapid commit
        ],
        0xff, // end
        [0; 54], // pad
    );

    socket.set_broadcast(true)?;
    socket.send_to(REQUEST, "255.255.255.255:67")?;

    // Keep IPv6-only fast
    let _ = socket.set_read_timeout(Some(Duration::from_millis(100)));

    // Get and process response (DHCPACK) if any
    if let Ok((len, _)) = socket.recv_from(&mut buf) {
        let msg = &mut buf[..len];

        let addr = Ipv4Addr::new(msg[16], msg[17], msg[18], msg[19]);
        let mut netmask = Ipv4Addr::UNSPECIFIED;
        let mut router = Ipv4Addr::UNSPECIFIED;
        let mut p: usize = 240;
        let mut resolv = fs::File::options()
            .append(true)
            .open("/etc/resolv.conf")
            .context("Failed to open /etc/resolv.conf")?;

        while p < len {
            let o = msg[p];
            let l: u8 = msg[p + 1];
            p += 2; // Length doesn't include code and length field itself

            if o == 1 {
                // Option 1: Subnet Mask
                netmask = Ipv4Addr::new(msg[p], msg[p + 1], msg[p + 2], msg[p + 3]);
            } else if o == 3 {
                // Option 3: Router
                router = Ipv4Addr::new(msg[p], msg[p + 1], msg[p + 2], msg[p + 3]);
            } else if o == 6 {
                // Option 6: Domain Name Server
                for dns_p in (p..p + l as usize).step_by(4) {
                    let dns =
                        Ipv4Addr::new(msg[dns_p], msg[dns_p + 1], msg[dns_p + 2], msg[dns_p + 3]);
                    resolv
                        .write_all(format!("nameserver {}\n", dns).as_bytes())
                        .context("Failed to write to resolv.conf")?;
                }
            } else if o == 0xff {
                // Option 255: End (of options)
                break;
            }

            p += l as usize;
        }

        let prefix_len: u8 = netmask.to_bits().leading_ones() as u8;

        // Drop temporary address and route, configure what we got instead
        route4_eth0(rtnl, Rtm::Delroute, Ipv4Addr::UNSPECIFIED)?;
        addr4_eth0(rtnl, Rtm::Deladdr, Ipv4Addr::new(169, 254, 1, 1), 16)?;

        addr4_eth0(rtnl, Rtm::Newaddr, addr, prefix_len)?;
        route4_eth0(rtnl, Rtm::Newroute, router)?;
    } else {
        // Clean up: we're clearly too cool for IPv4
        route4_eth0(rtnl, Rtm::Delroute, Ipv4Addr::UNSPECIFIED)?;
        addr4_eth0(rtnl, Rtm::Deladdr, Ipv4Addr::new(169, 254, 1, 1), 16)?;
    }

    Ok(())
}

/// Wait for SLAAC to complete or fail
fn wait_for_slaac(rtnl: &NlRouter) -> Result<()> {
    let mut global_seen = false;
    let mut global_wait = true;
    let mut ll_seen = false;

    // Busy-netlink-loop until we see a link-local address, and a global unicast
    // address as long as we might expect one (see below)
    while !ll_seen || (global_wait && !global_seen) {
        let ifaddrmsg = IfaddrmsgBuilder::default()
            .ifa_family(RtAddrFamily::Inet6)
            .ifa_prefixlen(0)
            .ifa_scope(RtScope::Universe)
            .ifa_index(2)
            .build()?;

        let recv = rtnl.send(Rtm::Getaddr, NlmF::ROOT, NlPayload::Payload(ifaddrmsg))?;

        for response in recv {
            let header: Nlmsghdr<Rtm, Ifaddrmsg> = response?;
            if let NlPayload::Payload(p) = header.nl_payload() {
                if p.ifa_scope() == &RtScope::Link {
                    // A non-tentative link-local address implies we sent a
                    // router solicitation that didn't get any response
                    // (IPv4-only)? Stop waiting for the router in that case
                    if *p.ifa_flags() & IfaF::TENTATIVE != IfaF::TENTATIVE {
                        global_wait = false;
                    }

                    ll_seen = true;
                } else if p.ifa_scope() == &RtScope::Universe {
                    global_seen = true;
                }
            }
        }
    }

    Ok(())
}

pub fn configure_network() -> Result<()> {
    // Allow unprivileged users to use ping, as most distros do by default.
@@ -33,63 +258,29 @@ pub fn configure_network() -> Result<()> {
        sethostname(hostname.as_bytes()).context("Failed to set hostname")?;
    }

    let dhcpcd_path = find_in_path("dhcpcd").context("Failed to check existence of `dhcpcd`")?;
    let dhcpcd_path = if let Some(dhcpcd_path) = dhcpcd_path {
        Some(dhcpcd_path)
    } else {
        find_executable("/sbin/dhcpcd").context("Failed to check existence of `/sbin/dhcpcd`")?
    };
    if let Some(dhcpcd_path) = dhcpcd_path {
        let output = Command::new(dhcpcd_path)
            .args(["-M", "--nodev", "eth0"])
            .output()
            .context("Failed to execute `dhcpcd` as child process")?;
        debug!(output:?; "dhcpcd output");
        if !output.status.success() {
            let err = if let Some(code) = output.status.code() {
                anyhow!("`dhcpcd` process exited with status code: {code}")
            } else {
                anyhow!(
                    "`dhcpcd` process terminated by signal: {}",
                    output
                        .status
                        .signal()
                        .expect("either one of status code or signal should be set")
                )
            };
            Err(err)?;
        }
    let (rtnl, _) = NlRouter::connect(NlFamily::Route, None, Groups::empty())?;
    rtnl.enable_strict_checking(true)?;

        return Ok(());
    // Disable neighbour solicitations (dodge DAD), bring up link to start SLAAC
    {
        // IFF_NOARP | IFF_UP in one shot delays router solicitations, avoid it
        flags_eth0(&rtnl, Iff::NOARP, Iff::NOARP)?;
        flags_eth0(&rtnl, Iff::UP, Iff::UP)?;
    }

    let dhclient_path =
        find_in_path("dhclient").context("Failed to check existence of `dhclient`")?;
    let dhclient_path = if let Some(dhclient_path) = dhclient_path {
        Some(dhclient_path)
    } else {
        find_executable("/sbin/dhclient")
            .context("Failed to check existence of `/sbin/dhclient`")?
    };
    let dhclient_path =
        dhclient_path.ok_or_else(|| anyhow!("could not find required `dhcpcd` or `dhclient`"))?;
    let output = Command::new(dhclient_path)
        .output()
        .context("Failed to execute `dhclient` as child process")?;
    debug!(output:?; "dhclient output");
    if !output.status.success() {
        let err = if let Some(code) = output.status.code() {
            anyhow!("`dhclient` process exited with status code: {code}")
        } else {
            anyhow!(
                "`dhclient` process terminated by signal: {}",
                output
                    .status
                    .signal()
                    .expect("either one of status code or signal should be set")
            )
        };
        Err(err)?;
    // Configure IPv4
    {
        do_dhcp(&rtnl)?;
    }

    // Ensure IPv6 setup is done, if available
    {
        wait_for_slaac(&rtnl)?;
    }

    // Re-enable neighbour solicitations and ARP requests
    {
        flags_eth0(&rtnl, Iff::NOARP, Iff::empty())?;
    }

    Ok(())