VM network hang and panic on restart (gvisor-tap-vsock/pkg/tap.(*Switch).Accept ) #1358

Closed
jlujan opened this issue Feb 10, 2023 · 5 comments
Labels
bug Something isn't working component/vz

Comments

jlujan commented Feb 10, 2023

I am observing loss of connectivity to a lima VM after several hours when using VZ with the vmnet shared network. The VM then fails to stop cleanly and requires `limactl stop -f`. Starting the VM afterwards results in a panic from gvisor-tap-vsock. A second `limactl stop -f` followed by a start results in a successful VM boot. Possibly related to containers/gvisor-tap-vsock#161.

$ limactl start  
INFO[0000] Using the existing instance "default"        
INFO[0000] Hint: To create another instance, run the following command: limactl start --name=NAME template://default 
INFO[0000] Starting socket_vmnet daemon for "shared" network 
INFO[0000] Attempting to download the nerdctl archive from "https://github.com/containerd/nerdctl/releases/download/v1.1.0/nerdctl-full-1.1.0-linux-arm64.tar.gz"  digest="sha256:3b613a1be5a24460c44bb93a3609b790ada94e06efd1a86467d45bec7da8b449"
INFO[0000] Using cache "/Users/jlujan/Library/Caches/lima/download/by-url-sha256/c69be86b3e48430e3d687d54361cf15b90e2c067fae6a294ca6292d41f42bf0e/data" 
WARN[0000] [hostagent] Failed to detect CPU features. Assuming that AES acceleration is not available. 
INFO[0000] [hostagent] Starting VZ (hint: to watch the boot progress, see "/Users/jlujan/.lima/default/serial.log") 
INFO[0000] SSH Local Port: 60022                        
INFO[0000] [hostagent] panic: runtime error: invalid memory address or nil pointer dereference 
INFO[0000] [hostagent] [signal SIGSEGV: segmentation violation code=0x2 addr=0x20 pc=0x100c339bc] 
INFO[0000] [hostagent] goroutine 97 [running]:          
INFO[0000] [hostagent] github.com/containers/gvisor-tap-vsock/pkg/tap.(*Switch).Accept 
INFO[0000] [hostagent] 0x140000d6100, {0x101219798, 0x140000520c0}, {0x101221648, 0x140001281f8}) 
INFO[0000] [hostagent] 	/Users/jlujan/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/containers/[email protected]/pkg/tap/switch.go:83 +0x4c 
INFO[0000] [hostagent] github.com/containers/gvisor-tap-vsock/pkg/virtualnetwork.(*VirtualNetwork).AcceptBess(...) 
INFO[0000] [hostagent] 	/Users/jlujan/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/containers/[email protected]/pkg/virtualnetwork/bess.go:9 
INFO[0000] [hostagent] github.com/lima-vm/lima/pkg/networks.run.func1() 
INFO[0000] [hostagent] 	/private/tmp/lima-20230209-37513-1rntxz9/pkg/networks/gvisor.go:90 +0x64 
INFO[0000] [hostagent] golang.org/x/sync/errgroup.(*Group).Go.func1() 
INFO[0000] [hostagent] 	/Users/jlujan/Library/Caches/Homebrew/go_mod_cache/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x5c 
INFO[0000] [hostagent] created by golang.org/x/sync/errgroup.(*Group).Go 
INFO[0000] [hostagent] 	/Users/jlujan/Library/Caches/Homebrew/go_mod_cache/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:72 
INFO[0000] [hostagent] 0xa4                             
FATA[0000] host agent process has exited: exit status 2 

limactl version HEAD-eb49205
socket_vmnet: stable 1.1.1
M2 MacBook Air
Ventura (13.1)

lima.yaml

vmType: "vz"
cpus: 4
memory: "12GiB"

...

networks:
  - lima: shared
  - vzNAT: true

mounts:
  - location: "~"
    writable: true
mountType: "virtiofs"

containerd:
  system: false
  user: true

portForwards:
  - guestSocket: "/run/user/{{.UID}}/buildkit-default/buildkitd.sock"
    hostSocket: "{{.Dir}}/sock/buildkitd.sock"
@balajiv113
Member

@jlujan
Yes, this is most likely an issue in the lima vz driver itself and not related to socket_vmnet.

It looks like the same problem as #1200.

AkihiroSuda transferred this issue from lima-vm/socket_vmnet Feb 12, 2023
AkihiroSuda added the bug and component/vz labels Feb 12, 2023
@AkihiroSuda
Member

WARN[0000] [hostagent] Failed to detect CPU features. Assuming that AES acceleration is not available.

This seems to be a bug, although it is unrelated to the topic here.

@cfergeau

Looking at the backtrace:

INFO[0000] [hostagent] panic: runtime error: invalid memory address or nil pointer dereference 
INFO[0000] [hostagent] [signal SIGSEGV: segmentation violation code=0x2 addr=0x20 pc=0x100c339bc] 
INFO[0000] [hostagent] goroutine 97 [running]:          
INFO[0000] [hostagent] github.com/containers/gvisor-tap-vsock/pkg/tap.(*Switch).Accept 
INFO[0000] [hostagent] 0x140000d6100, {0x101219798, 0x140000520c0}, {0x101221648, 0x140001281f8}) 
INFO[0000] [hostagent] 	/Users/jlujan/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/containers/[email protected]/pkg/tap/switch.go:83 +0x4c 
INFO[0000] [hostagent] github.com/containers/gvisor-tap-vsock/pkg/virtualnetwork.(*VirtualNetwork).AcceptBess(...) 
INFO[0000] [hostagent] 	/Users/jlujan/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/containers/[email protected]/pkg/virtualnetwork/bess.go:9 
INFO[0000] [hostagent] github.com/lima-vm/lima/pkg/networks.run.func1() 
INFO[0000] [hostagent] 	/private/tmp/lima-20230209-37513-1rntxz9/pkg/networks/gvisor.go:90 +0x64 

The crash happens at https://github.com/containers/gvisor-tap-vsock/blob/5b1aff8ba74374aeb62eaa06faa496faf2c56d74/pkg/tap/switch.go#L83, which tries to print conn.RemoteAddr().String() and conn.LocalAddr().String(). conn is passed in from https://github.com/containers/gvisor-tap-vsock/blob/5b1aff8ba74374aeb62eaa06faa496faf2c56d74/pkg/virtualnetwork/bess.go, which in turn receives the connection from lima code:

return vn.AcceptBess(ctx, opts.Conn)

lima/pkg/vz/vm_darwin.go

Lines 38 to 46 in f086bc4

	fileConn, err := net.FileConn(server)
	if err != nil {
		return nil, nil, err
	}
	err = machine.Start()
	networks.StartGVisorNetstack(ctx, &networks.GVisorNetstackOpts{
		Conn: fileConn,

If this is reproducible, it might be worth taking a closer look at this conn to see whether there is anything odd about it.

AkihiroSuda changed the title from "VM network hang and panic on restart" to "VM network hang and panic on restart (gvisor-tap-vsock/pkg/tap.(*Switch).Accept )" Feb 24, 2023
@balajiv113
Member

This should not happen on the current master.
It should be fixed by #1383.

@AkihiroSuda
Member

Thanks @balajiv113
