Docker daemon dies even though Desktop GUI is quite happy #12788

Closed
3 tasks done
rfay opened this issue Jun 20, 2022 · 68 comments

Comments

@rfay
Contributor

rfay commented Jun 20, 2022

  • I have tried with the latest version of Docker Desktop (4.9.1 won't install, this is 4.8.2) (Update: Now running 4.10 81898, still happened)
  • I have tried disabling enabled experimental features
  • I have uploaded Diagnostics
  • Diagnostics ID: C84C1AE5-77DB-4BEF-B118-E62A312A9FD3/20220620012953

Actual behavior

While the docker UI indicates no trouble at all, the docker daemon has apparently died.

docker ps
error during connect: This error may indicate that the docker daemon is not running.: Get "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.24/containers/json": open //./pipe/docker_engine: The system cannot find the file specified.

Expected behavior

Daemon should be running. At the very least, the Docker Desktop UI ought to know if it's crashed.
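A test harness can detect this state without relying on the Desktop UI by probing the daemon directly. The following is a minimal Python sketch, not something Docker Desktop ships: the function name `docker_daemon_reachable` and the injectable `run` parameter are assumptions made for illustration. It relies only on `docker version --format`, which is expected to exit non-zero when the client cannot reach the server.

```python
import subprocess
from typing import Callable


def docker_daemon_reachable(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout_s: float = 10.0,
) -> bool:
    """Return True if the docker CLI can reach the daemon.

    `run` is injectable (hypothetical design choice) so the probe can be
    exercised without a real Docker installation.
    """
    try:
        result = run(
            ["docker", "version", "--format", "{{.Server.Version}}"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # CLI not installed, or the call hung: treat both as "not reachable".
        return False
    # A reachable daemon yields exit code 0 and a non-empty server version.
    return result.returncode == 0 and bool(result.stdout.strip())
```

Calling `docker_daemon_reachable()` before each test run would let the harness report a dead daemon explicitly instead of failing later with the named-pipe error.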

Information

This happens regularly running DDEV tests.

  • Windows Version: 11
  • Docker Desktop Version: 4.8.2
  • WSL2 or Hyper-V backend? WSL2
  • Are you running inside a virtualized Windows e.g. on a cloud server or a VM: no

Output of & "C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe" check

PS C:\Users\testbot> & "C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe" check
Starting diagnostics

[PASS] DD0027: is there available disk space on the host?
[PASS] DD0028: is there available VM disk space?
[FAIL] DD0031: does the Docker API work? error during connect: This error may indicate that the docker daemon is not running.: Get "http://%2F%2F.%2Fpipe%2Fdocker_engine_linux/v1.24/containers/json?limit=0": open //./pipe/docker_engine_linux: The system cannot find the file specified.
[FAIL] DD0004: is the Docker engine running? Get "http://ipc/docker": open \.\pipe\dockerLifecycleServer: The system cannot find the file specified.
[2022-06-20T01:41:47.081880800Z][com.docker.diagnose.exe][I] ipc.NewClient: d440ba77-com.docker.diagnose -> \.\pipe\dockerLifecycleServer VMDockerdAPI
[linuxkit/pkg/desktop-host-tools/pkg/client.NewClientForPath(...)
[ linuxkit/pkg/desktop-host-tools/pkg/client/client.go:59
[linuxkit/pkg/desktop-host-tools/pkg/client.NewClient({0xc7a6df, 0x13})
[ linuxkit/pkg/desktop-host-tools/pkg/client/client.go:53 +0xa5
[common/pkg/diagkit/gather/diagnose.isDockerEngineRunning()
[ common/pkg/diagkit/gather/diagnose/dockerd.go:21 +0x29
[common/pkg/diagkit/gather/diagnose.(*test).GetResult(0x11f1760)
[ common/pkg/diagkit/gather/diagnose/test.go:46 +0x43
[common/pkg/diagkit/gather/diagnose.Run.func1(0x11f1760)
[ common/pkg/diagkit/gather/diagnose/run.go:17 +0x5a
[common/pkg/diagkit/gather/diagnose.walkOnce.func1(0xadf357?, 0x11f1760)
[ common/pkg/diagkit/gather/diagnose/run.go:140 +0x77
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x2, 0x11f1760, 0xc000313730)
[ common/pkg/diagkit/gather/diagnose/run.go:146 +0x36
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x1, 0x11f17e0?, 0xc000313730)
[ common/pkg/diagkit/gather/diagnose/run.go:149 +0x73
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x0, 0xcb00000012?, 0xc000313730)
[ common/pkg/diagkit/gather/diagnose/run.go:149 +0x73
[common/pkg/diagkit/gather/diagnose.walkOnce(0xbba960?, 0xc0002df890)
[ common/pkg/diagkit/gather/diagnose/run.go:135 +0xcc
[common/pkg/diagkit/gather/diagnose.Run(0x11f1f60, 0xbb4300?, {0xc0002dfb20, 0x1, 0x1})
[ common/pkg/diagkit/gather/diagnose/run.go:16 +0x1cb
[main.checkCmd({0xc0000963b0?, 0xc0000963b0?, 0x4?}, {0x0, 0x0})
[ common/cmd/com.docker.diagnose/main.go:132 +0x105
[main.main()
[ common/cmd/com.docker.diagnose/main.go:98 +0x27f
[2022-06-20T01:41:47.084121200Z][com.docker.diagnose.exe][I] (d8721788) d440ba77-com.docker.diagnose C->S VMDockerdAPI GET /docker
[2022-06-20T01:41:47.084740800Z][com.docker.diagnose.exe][W] (d8721788) d440ba77-com.docker.diagnose C<-S NoResponse GET /docker (552.6µs): Get "http://ipc/docker": open \.\pipe\dockerLifecycleServer: The system cannot find the file specified.
[2022-06-20T01:41:47.085305800Z][com.docker.diagnose.exe][I] (d8721788-1) d440ba77-com.docker.diagnose C->S VMDockerdAPI GET /ping
[2022-06-20T01:41:47.085862300Z][com.docker.diagnose.exe][W] (d8721788-1) d440ba77-com.docker.diagnose C<-S NoResponse GET /ping (556.5µs): Get "http://ipc/ping": open \.\pipe\dockerLifecycleServer: The system cannot find the file specified.
[2022-06-20T01:41:48.090337200Z][com.docker.diagnose.exe][I] (d8721788-2) d440ba77-com.docker.diagnose C->S VMDockerdAPI GET /ping
[2022-06-20T01:41:48.092874600Z][com.docker.diagnose.exe][W] (d8721788-2) d440ba77-com.docker.diagnose C<-S NoResponse GET /ping (2.3749ms): Get "http://ipc/ping": open \.\pipe\dockerLifecycleServer: The system cannot find the file specified.
[2022-06-20T01:41:49.097066400Z][com.docker.diagnose.exe][I] (d8721788-3) d440ba77-com.docker.diagnose C->S VMDockerdAPI GET /ping
[2022-06-20T01:41:49.099700000Z][com.docker.diagnose.exe][W] (d8721788-3) d440ba77-com.docker.diagnose C<-S NoResponse GET /ping (2.6336ms): Get "http://ipc/ping": open \.\pipe\dockerLifecycleServer: The system cannot find the file specified.
[2022-06-20T01:41:50.104720100Z][com.docker.diagnose.exe][I] (d8721788-4) d440ba77-com.docker.diagnose C->S VMDockerdAPI GET /ping
[2022-06-20T01:41:50.104720100Z][com.docker.diagnose.exe][W] (d8721788-4) d440ba77-com.docker.diagnose C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerLifecycleServer: The system cannot find the file specified.
[2022-06-20T01:41:51.108438700Z][com.docker.diagnose.exe][I] (d8721788-5) d440ba77-com.docker.diagnose C->S VMDockerdAPI GET /ping
[2022-06-20T01:41:51.108438700Z][com.docker.diagnose.exe][W] (d8721788-5) d440ba77-com.docker.diagnose C<-S NoResponse GET /ping (210.9µs): Get "http://ipc/ping": open \.\pipe\dockerLifecycleServer: The system cannot find the file specified.
[2022-06-20T01:41:52.121449600Z][com.docker.diagnose.exe][I] (d8721788-6) d440ba77-com.docker.diagnose C->S VMDockerdAPI GET /ping
[2022-06-20T01:41:52.121449600Z][com.docker.diagnose.exe][W] (d8721788-6) d440ba77-com.docker.diagnose C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerLifecycleServer: The system cannot find the file specified.
[2022-06-20T01:41:53.129083200Z][com.docker.diagnose.exe][I] (d8721788-7) d440ba77-com.docker.diagnose C->S VMDockerdAPI GET /ping
[2022-06-20T01:41:53.129083200Z][com.docker.diagnose.exe][W] (d8721788-7) d440ba77-com.docker.diagnose C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerLifecycleServer: The system cannot find the file specified.
[2022-06-20T01:41:54.130623100Z][com.docker.diagnose.exe][I] (d8721788-8) d440ba77-com.docker.diagnose C->S VMDockerdAPI GET /ping
[2022-06-20T01:41:54.130623100Z][com.docker.diagnose.exe][W] (d8721788-8) d440ba77-com.docker.diagnose C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerLifecycleServer: The system cannot find the file specified.

[FAIL] DD0011: are the LinuxKit services running? failed to ping VM diagnosticsd with error: Get "http://ipc/ping": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.
[2022-06-20T01:41:54.154741400Z][com.docker.diagnose.exe][I] ipc.NewClient: b85ba3df-diagnose -> \.\pipe\dockerDiagnosticd diagnosticsd
[common/pkg/diagkit/gather/diagnose.glob..func14()
[ common/pkg/diagkit/gather/diagnose/linuxkit.go:18 +0x92
[common/pkg/diagkit/gather/diagnose.(*test).GetResult(0x11f16e0)
[ common/pkg/diagkit/gather/diagnose/test.go:46 +0x43
[common/pkg/diagkit/gather/diagnose.Run.func1(0x11f16e0)
[ common/pkg/diagkit/gather/diagnose/run.go:17 +0x5a
[common/pkg/diagkit/gather/diagnose.walkOnce.func1(0xadf357?, 0x11f16e0)
[ common/pkg/diagkit/gather/diagnose/run.go:140 +0x77
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x3, 0x11f16e0, 0xc000313730)
[ common/pkg/diagkit/gather/diagnose/run.go:146 +0x36
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x2, 0x11f1760?, 0xc000313730)
[ common/pkg/diagkit/gather/diagnose/run.go:149 +0x73
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x1, 0x11f17e0?, 0xc000313730)
[ common/pkg/diagkit/gather/diagnose/run.go:149 +0x73
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x0, 0xcb00000012?, 0xc000313730)
[ common/pkg/diagkit/gather/diagnose/run.go:149 +0x73
[common/pkg/diagkit/gather/diagnose.walkOnce(0xbba960?, 0xc0002df890)
[ common/pkg/diagkit/gather/diagnose/run.go:135 +0xcc
[common/pkg/diagkit/gather/diagnose.Run(0x11f1f60, 0xbb4300?, {0xc0002dfb20, 0x1, 0x1})
[ common/pkg/diagkit/gather/diagnose/run.go:16 +0x1cb
[main.checkCmd({0xc0000963b0?, 0xc0000963b0?, 0x4?}, {0x0, 0x0})
[ common/cmd/com.docker.diagnose/main.go:132 +0x105
[main.main()
[ common/cmd/com.docker.diagnose/main.go:98 +0x27f
[2022-06-20T01:41:54.159023500Z][com.docker.diagnose.exe][I] (87710474) b85ba3df-diagnose C->S diagnosticsd GET /ping
[2022-06-20T01:41:54.160233200Z][com.docker.diagnose.exe][W] (87710474) b85ba3df-diagnose C<-S NoResponse GET /ping (1.2097ms): Get "http://ipc/ping": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.

[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0001: is the application running?
[SKIP] DD0018: does the host support virtualization?
[PASS] DD0002: does the bootloader have virtualization enabled?
[PASS] DD0017: can a VM be started?
[PASS] DD0024: is WSL installed?
[PASS] DD0021: is the WSL 2 Windows Feature enabled?
[PASS] DD0022: is the Virtual Machine Platform Windows Feature enabled?
[PASS] DD0025: are WSL distros installed?
[PASS] DD0026: is the WSL LxssManager service running?
[PASS] DD0029: is the WSL 2 Linux filesystem corrupt?
[PASS] DD0015: are the binary symlinks installed?
error during connect: This error may indicate that the docker daemon is not running.: Get "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.24/containers/json": open //./pipe/docker_engine: The system cannot find the file specified.
[FAIL] DD0003: is the Docker CLI working? exit status 1
[PASS] DD0013: is the $PATH ok?
[PASS] DD0005: is the user in the docker-users group?
[PASS] DD0007: is the backend responding?
[FAIL] DD0014: are the backend processes running? 3 errors occurred:
* vpnkit-bridge.exe is not running
* vpnkit.exe is not running
* com.docker.proxy.exe is not running

[PASS] DD0008: is the native API responding?
[FAIL] DD0009: is the vpnkit API responding? open \.\pipe\dockerVpnKitDiagnostics: The system cannot find the file specified.
[FAIL] DD0010: is the Docker API proxy responding? failed to ping Docker proxy API with error: Get "http://ipc/desktop-diagnostics/ping": open \.\pipe\dockerDesktopLinuxEngine: The system cannot find the file specified.
[2022-06-20T01:42:02.593502000Z][com.docker.diagnose.exe][I] ipc.NewClient: 96c811fe-diagnose -> \.\pipe\dockerDesktopLinuxEngine Proxy
[common/pkg/diagkit/gather/diagnose.glob..func12()
[ common/pkg/diagkit/gather/diagnose/ipc.go:91 +0x7e
[common/pkg/diagkit/gather/diagnose.(*test).GetResult(0x11f1c60)
[ common/pkg/diagkit/gather/diagnose/test.go:46 +0x43
[common/pkg/diagkit/gather/diagnose.Run.func1(0x11f1c60)
[ common/pkg/diagkit/gather/diagnose/run.go:17 +0x5a
[common/pkg/diagkit/gather/diagnose.walkOnce.func1(0x2?, 0x11f1c60)
[ common/pkg/diagkit/gather/diagnose/run.go:140 +0x77
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x1, 0x11f1c60, 0xc000095730)
[ common/pkg/diagkit/gather/diagnose/run.go:146 +0x36
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x0, 0xcb00000012?, 0xc000095730)
[ common/pkg/diagkit/gather/diagnose/run.go:149 +0x73
[common/pkg/diagkit/gather/diagnose.walkOnce(0xbba960?, 0xc0002df890)
[ common/pkg/diagkit/gather/diagnose/run.go:135 +0xcc
[common/pkg/diagkit/gather/diagnose.Run(0x11f1f60, 0xbb4300?, {0xc0002dfb20, 0x1, 0x1})
[ common/pkg/diagkit/gather/diagnose/run.go:16 +0x1cb
[main.checkCmd({0xc0000963b0?, 0xc0000963b0?, 0x4?}, {0x0, 0x0})
[ common/cmd/com.docker.diagnose/main.go:132 +0x105
[main.main()
[ common/cmd/com.docker.diagnose/main.go:98 +0x27f
[2022-06-20T01:42:02.593502000Z][com.docker.diagnose.exe][I] (71cdd17e) 96c811fe-diagnose C->S Proxy GET /desktop-diagnostics/ping
[2022-06-20T01:42:02.593502000Z][com.docker.diagnose.exe][W] (71cdd17e) 96c811fe-diagnose C<-S NoResponse GET /desktop-diagnostics/ping (0s): Get "http://ipc/desktop-diagnostics/ping": open \.\pipe\dockerDesktopLinuxEngine: The system cannot find the file specified.
[2022-06-20T01:42:02.593502000Z][com.docker.diagnose.exe][I] (71cdd17e-1) 96c811fe-diagnose C->S Proxy GET /ping
[2022-06-20T01:42:02.593502000Z][com.docker.diagnose.exe][W] (71cdd17e-1) 96c811fe-diagnose C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerDesktopLinuxEngine: The system cannot find the file specified.
[2022-06-20T01:42:03.602791300Z][com.docker.diagnose.exe][I] (71cdd17e-2) 96c811fe-diagnose C->S Proxy GET /ping
[2022-06-20T01:42:03.604839400Z][com.docker.diagnose.exe][W] (71cdd17e-2) 96c811fe-diagnose C<-S NoResponse GET /ping (2.0481ms): Get "http://ipc/ping": open \.\pipe\dockerDesktopLinuxEngine: The system cannot find the file specified.
[2022-06-20T01:42:04.611441600Z][com.docker.diagnose.exe][I] (71cdd17e-3) 96c811fe-diagnose C->S Proxy GET /ping
[2022-06-20T01:42:04.613660300Z][com.docker.diagnose.exe][W] (71cdd17e-3) 96c811fe-diagnose C<-S NoResponse GET /ping (2.0321ms): Get "http://ipc/ping": open \.\pipe\dockerDesktopLinuxEngine: The system cannot find the file specified.
[2022-06-20T01:42:05.618917500Z][com.docker.diagnose.exe][I] (71cdd17e-4) 96c811fe-diagnose C->S Proxy GET /ping
[2022-06-20T01:42:05.619976600Z][com.docker.diagnose.exe][W] (71cdd17e-4) 96c811fe-diagnose C<-S NoResponse GET /ping (1.0591ms): Get "http://ipc/ping": open \.\pipe\dockerDesktopLinuxEngine: The system cannot find the file specified.
[2022-06-20T01:42:06.627059000Z][com.docker.diagnose.exe][I] (71cdd17e-5) 96c811fe-diagnose C->S Proxy GET /ping
[2022-06-20T01:42:06.630116900Z][com.docker.diagnose.exe][W] (71cdd17e-5) 96c811fe-diagnose C<-S NoResponse GET /ping (3.0579ms): Get "http://ipc/ping": open \.\pipe\dockerDesktopLinuxEngine: The system cannot find the file specified.
[2022-06-20T01:42:07.640784100Z][com.docker.diagnose.exe][I] (71cdd17e-6) 96c811fe-diagnose C->S Proxy GET /ping
[2022-06-20T01:42:07.640784100Z][com.docker.diagnose.exe][W] (71cdd17e-6) 96c811fe-diagnose C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerDesktopLinuxEngine: The system cannot find the file specified.
[2022-06-20T01:42:08.652906700Z][com.docker.diagnose.exe][I] (71cdd17e-7) 96c811fe-diagnose C->S Proxy GET /ping
[2022-06-20T01:42:08.652906700Z][com.docker.diagnose.exe][W] (71cdd17e-7) 96c811fe-diagnose C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerDesktopLinuxEngine: The system cannot find the file specified.
[2022-06-20T01:42:09.660179900Z][com.docker.diagnose.exe][I] (71cdd17e-8) 96c811fe-diagnose C->S Proxy GET /ping
[2022-06-20T01:42:09.660179900Z][com.docker.diagnose.exe][W] (71cdd17e-8) 96c811fe-diagnose C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerDesktopLinuxEngine: The system cannot find the file specified.

[PASS] DD0006: is the Docker Desktop Service responding?
[FAIL] DD0012: is the VM networking working? network checks failed: Post "http://ipc/check-network-connectivity": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.
[2022-06-20T01:42:09.802844100Z][com.docker.diagnose.exe][I] ipc.NewClient: 4d0b8c5f-diagnose-network -> \.\pipe\dockerDiagnosticd diagnosticsd
[common/pkg/diagkit/gather/diagnose.runIsVMNetworkingOK()
[ common/pkg/diagkit/gather/diagnose/network.go:34 +0xdd
[common/pkg/diagkit/gather/diagnose.(*test).GetResult(0x11f1960)
[ common/pkg/diagkit/gather/diagnose/test.go:46 +0x43
[common/pkg/diagkit/gather/diagnose.Run.func1(0x11f1960)
[ common/pkg/diagkit/gather/diagnose/run.go:17 +0x5a
[common/pkg/diagkit/gather/diagnose.walkOnce.func1(0x2?, 0x11f1960)
[ common/pkg/diagkit/gather/diagnose/run.go:140 +0x77
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x1, 0x11f1960, 0xc000095730)
[ common/pkg/diagkit/gather/diagnose/run.go:146 +0x36
[common/pkg/diagkit/gather/diagnose.walkDepthFirst(0x0, 0xcb00000012?, 0xc000095730)
[ common/pkg/diagkit/gather/diagnose/run.go:149 +0x73
[common/pkg/diagkit/gather/diagnose.walkOnce(0xbba960?, 0xc0002df890)
[ common/pkg/diagkit/gather/diagnose/run.go:135 +0xcc
[common/pkg/diagkit/gather/diagnose.Run(0x11f1f60, 0xbb4300?, {0xc0002dfb20, 0x1, 0x1})
[ common/pkg/diagkit/gather/diagnose/run.go:16 +0x1cb
[main.checkCmd({0xc0000963b0?, 0xc0000963b0?, 0x4?}, {0x0, 0x0})
[ common/cmd/com.docker.diagnose/main.go:132 +0x105
[main.main()
[ common/cmd/com.docker.diagnose/main.go:98 +0x27f
[2022-06-20T01:42:09.803351900Z][com.docker.diagnose.exe][I] (9ce62c75) 4d0b8c5f-diagnose-network C->S diagnosticsd POST /check-network-connectivity: {"ips":["192.168.1.104","172.18.208.1"]}
[2022-06-20T01:42:09.804994200Z][com.docker.diagnose.exe][W] (9ce62c75) 4d0b8c5f-diagnose-network C<-S NoResponse POST /check-network-connectivity (1.6423ms): Post "http://ipc/check-network-connectivity": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.
[2022-06-20T01:42:09.806078400Z][com.docker.diagnose.exe][I] (9ce62c75-1) 4d0b8c5f-diagnose-network C->S diagnosticsd GET /ping
[2022-06-20T01:42:09.806615200Z][com.docker.diagnose.exe][W] (9ce62c75-1) 4d0b8c5f-diagnose-network C<-S NoResponse GET /ping (536.8µs): Get "http://ipc/ping": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.
[2022-06-20T01:42:10.815722900Z][com.docker.diagnose.exe][I] (9ce62c75-2) 4d0b8c5f-diagnose-network C->S diagnosticsd GET /ping
[2022-06-20T01:42:10.818005500Z][com.docker.diagnose.exe][W] (9ce62c75-2) 4d0b8c5f-diagnose-network C<-S NoResponse GET /ping (2.2826ms): Get "http://ipc/ping": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.
[2022-06-20T01:42:11.825475900Z][com.docker.diagnose.exe][I] (9ce62c75-3) 4d0b8c5f-diagnose-network C->S diagnosticsd GET /ping
[2022-06-20T01:42:11.828027800Z][com.docker.diagnose.exe][W] (9ce62c75-3) 4d0b8c5f-diagnose-network C<-S NoResponse GET /ping (2.5519ms): Get "http://ipc/ping": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.
[2022-06-20T01:42:12.835225800Z][com.docker.diagnose.exe][I] (9ce62c75-4) 4d0b8c5f-diagnose-network C->S diagnosticsd GET /ping
[2022-06-20T01:42:12.836392900Z][com.docker.diagnose.exe][W] (9ce62c75-4) 4d0b8c5f-diagnose-network C<-S NoResponse GET /ping (1.1671ms): Get "http://ipc/ping": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.
[2022-06-20T01:42:13.849572300Z][com.docker.diagnose.exe][I] (9ce62c75-5) 4d0b8c5f-diagnose-network C->S diagnosticsd GET /ping
[2022-06-20T01:42:13.849572300Z][com.docker.diagnose.exe][W] (9ce62c75-5) 4d0b8c5f-diagnose-network C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.
[2022-06-20T01:42:14.859943700Z][com.docker.diagnose.exe][I] (9ce62c75-6) 4d0b8c5f-diagnose-network C->S diagnosticsd GET /ping
[2022-06-20T01:42:14.859943700Z][com.docker.diagnose.exe][W] (9ce62c75-6) 4d0b8c5f-diagnose-network C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.
[2022-06-20T01:42:15.863922000Z][com.docker.diagnose.exe][I] (9ce62c75-7) 4d0b8c5f-diagnose-network C->S diagnosticsd GET /ping
[2022-06-20T01:42:15.863922000Z][com.docker.diagnose.exe][W] (9ce62c75-7) 4d0b8c5f-diagnose-network C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.
[2022-06-20T01:42:16.879871100Z][com.docker.diagnose.exe][I] (9ce62c75-8) 4d0b8c5f-diagnose-network C->S diagnosticsd GET /ping
[2022-06-20T01:42:16.879871100Z][com.docker.diagnose.exe][W] (9ce62c75-8) 4d0b8c5f-diagnose-network C<-S NoResponse GET /ping (0s): Get "http://ipc/ping": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.

[FAIL] DD0032: do Docker networks overlap with host IPs? error during connect: This error may indicate that the docker daemon is not running.: Get "http://%2F%2F.%2Fpipe%2Fdocker_engine_linux/v1.24/networks": open //./pipe/docker_engine_linux: The system cannot find the file specified.
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0033: does the host have Internet access?

Please investigate the following 3 issues:

1 : The test: are the LinuxKit services running?
Failed with: failed to ping VM diagnosticsd with error: Get "http://ipc/ping": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.

The Docker engine runs inside a Linux VM as a service. Therefore the services must have started.

2 : The test: are the backend processes running?
Failed with: 3 errors occurred:
* vpnkit-bridge.exe is not running
* vpnkit.exe is not running
* com.docker.proxy.exe is not running

Not all of the backend processes are running.

3 : The test: is the VM networking working?
Failed with: network checks failed: Post "http://ipc/check-network-connectivity": open \.\pipe\dockerDiagnosticd: The system cannot find the file specified.

VM seems to have a network connectivity issue. Please check your host firewall and anti-virus settings in case they are blocking the VM.

PS C:\Users\testbot>
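Because the raw output buries the failing checks among stack traces and retry logs, it can help to reduce it to the check lines only. This is a small Python sketch under the assumption that every check line has the form `[PASS|FAIL|SKIP] DDnnnn: question`, as in the output above; the function name `summarize_checks` is hypothetical.

```python
import re
from typing import Dict, List

# Matches lines such as "[FAIL] DD0004: is the Docker engine running? ...".
# Timestamped log lines and stack-trace lines do not match and are ignored.
CHECK_LINE = re.compile(r"^\[(PASS|FAIL|SKIP)\] (DD\d{4}): (.*)$")


def summarize_checks(output: str) -> Dict[str, List[str]]:
    """Group diagnostic check IDs by status, preserving their order."""
    summary: Dict[str, List[str]] = {"PASS": [], "FAIL": [], "SKIP": []}
    for line in output.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match:
            status, check_id, _details = match.groups()
            summary[status].append(check_id)
    return summary
```

Feeding it the captured text of `com.docker.diagnose.exe check` and reading `summary["FAIL"]` gives the list of failing check IDs in one place.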

Steps to reproduce the behavior

  1. ...
  2. ...
@StefanScherer
Member

Does this still happen with the 4.10 build?

@rfay
Contributor Author

rfay commented Jun 29, 2022

I haven't seen this in the last couple of days now that this machine is running the internal build 81898

It's particularly annoying because the Desktop UI shows all is green, but it's wrong. (It does seem that the Desktop UI doesn't refresh things quite right lots of the time. It often shows the wrong upgrade version (not the current one) and there's no way to refresh, etc.)

@rfay
Contributor Author

rfay commented Jun 30, 2022

Yes, @StefanScherer I just had this happen with a Win11 machine running the new 4.10 build 81898, diagnostics C84C1AE5-77DB-4BEF-B118-E62A312A9FD3/20220629235937

The Desktop UI had no idea that the daemon had died.

@austindimmer

I am observing a similar issue. It has been annoying me for a few months now. When I observed it earlier today I finally decided I needed to investigate more, and it led me to this issue.

I use VSCode Remote Containers (devcontainers) to set up consistent development environments, so I need this to be more stable; otherwise the user experience is really subpar and I find it difficult to create buy-in for container-based workflows on the team.

The current workaround for me is to run the following PowerShell script when this happens to try and spin everything back up again.

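# Find Docker Desktop processes and stop the first match.
$processes = Get-Process "*docker desktop*"
if ($processes.Count -gt 0)
{
    $processes[0].Kill()
    $processes[0].WaitForExit()
}

# Shut down all running WSL 2 distros, including the Docker Desktop ones.
wsl.exe --shutdown

# Relaunch Docker Desktop.
Start-Process "C:\Program Files\Docker\Docker\Docker Desktop.exe"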

The above script is far from reliable but it usually works. It is a 5 min disruption to the dev workflow and really throws me out of the "Zone". Sometimes the VSCode Remote Container will reconnect after running this; other times I have to completely rebuild the devcontainer. Major pain.
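A restart like this takes a while to settle, so any automation that follows it usually has to wait until the daemon answers again. Below is a minimal Python sketch of such a wait loop. All names (`wait_until`, `probe`, `timeout_s`, `interval_s`) are hypothetical, and the `sleep` and `clock` parameters exist only so the loop can be tested without real delays.

```python
import time
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` has elapsed.

    Returns True as soon as the probe succeeds, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A probe could be as simple as `lambda: subprocess.run(["docker", "info"], capture_output=True).returncode == 0`; the 300-second default is only a guess based on the roughly 5-minute disruption described above.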

My system specs are:

Docker Engine v20.10.16
Docker Desktop 4.9.1 (81317)

Processor AMD Ryzen Threadripper PRO 3995WX 64-Cores 2.70 GHz
Installed RAM 256 GB (256 GB usable)
System type 64-bit operating system, x64-based processor

Edition Windows 11 Enterprise
Version 21H2
OS build 22000.739
Experience Windows Feature Experience Pack 1000.22000.739.0

If I run the diagnostics CLI as follows:

& "C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe" check

I observe the following output

Starting diagnostics

[PASS] DD0027: is there available disk space on the host?
[PASS] DD0028: is there available VM disk space?
[PASS] DD0031: does the Docker API work?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0001: is the application running?
[SKIP] DD0018: does the host support virtualization?
[PASS] DD0002: does the bootloader have virtualization enabled?
[PASS] DD0017: can a VM be started?
[PASS] DD0024: is WSL installed?
[PASS] DD0021: is the WSL 2 Windows Feature enabled?
[PASS] DD0022: is the Virtual Machine Platform Windows Feature enabled?
[PASS] DD0025: are WSL distros installed?
[PASS] DD0026: is the WSL LxssManager service running?
[PASS] DD0029: is the WSL 2 Linux filesystem corrupt?
[PASS] DD0035: is the VM time synchronized?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0003: is the Docker CLI working?
[PASS] DD0013: is the $PATH ok?
[FAIL] DD0005: is the user in the docker-users group? The user name could not be found.
[PASS] DD0007: is the backend responding?
[PASS] DD0014: are the backend processes running?
[PASS] DD0008: is the native API responding?
[PASS] DD0009: is the vpnkit API responding?
[PASS] DD0010: is the Docker API proxy responding?
[PASS] DD0006: is the Docker Desktop Service responding?
[PASS] DD0012: is the VM networking working?
[PASS] DD0032: do Docker networks overlap with host IPs?
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0033: does the host have Internet access?

Please investigate the following 1 issue:

1 : The test: is the user in the docker-users group?
    Failed with: The user name could not be found.

The current user must be member of the docker-users group. Press the Win + R keys to open Run, type lusrmgr.msc into Run, followed by Enter to open Local Users and Groups.

I have verified that my current user is in fact a member of the docker-users group so that fail report is wrong.

When I run diagnostics directly from the Docker Desktop UI I observe "Diagnose Succeeded"

Diagnostics ID 6B0365DE-F4FA-4E14-9FF9-C9BD27589B0F/20220702132903

Please note these diagnostics were not taken when the daemon was in a crashed state. The next time I observe the crash I will try to obtain a diagnostics report and provide it here.

I did not observe this issue when running on an Intel system with a similar setup. I am not sure whether the processor might be part of the root cause.

@rfay
Contributor Author

rfay commented Jul 3, 2022

This happened twice on two different machines running 4.10 today:

  • D19B37DE-ACAC-4E0B-B6CC-B5097FDCB9AA/20220703211428
  • C84C1AE5-77DB-4BEF-B118-E62A312A9FD3/20220703211650

@djs55

djs55 commented Jul 5, 2022

Looking in the logs of D19B37DE-ACAC-4E0B-B6CC-B5097FDCB9AA/20220703211428 it seems everything starts disconnecting from Linux around here:

[2022-07-03T15:01:49.519390100Z][com.docker.proxy.exe][I] proxy << GET /containers/1225e677398e1f173fda39697c0bb971128c734a55f24156fccd4c13280c812a/json? (23.2736ms)
[2022-07-03T15:01:49.930045900Z][VpnKitBridge      ][Info   ] msg="disconnected data connection: multiplexer is offline"
[2022-07-03T15:01:49.930045900Z][VpnKitBridge      ][Info   ] msg="error CloseWrite to: file has already been closed"
[2022-07-03T15:01:49.930045900Z][vpnkit-bridge.exe][I] windows: docker: context cancelled, closing listener
[2022-07-03T15:01:49.930045900Z][vpnkit-bridge.exe][I] windows: volume-contents: context cancelled, closing listener
[2022-07-03T15:01:49.930045900Z][vpnkit-bridge.exe][I] windows: lifecycle-server: context cancelled, closing listener
[2022-07-03T15:01:49.941094700Z][VpnKitBridge      ][Info   ] msg="error copying: file has already been closed"
[2022-07-03T15:01:49.941094700Z][VpnKitBridge      ][Info   ] msg="error copying: file has already been closed"
[2022-07-03T15:01:49.941094700Z][VpnKitBridge      ][Info   ] msg="error copying: file has already been closed"
[2022-07-03T15:01:49.941609200Z][VpnKitBridge      ][Info   ] msg="error copying: file has already been closed"
[2022-07-03T15:01:49.941609200Z][VpnKitBridge      ][Info   ] msg="error copying: file has already been closed"
[2022-07-03T15:01:49.941609200Z][VpnKitBridge      ][Info   ] msg="error copying: file has already been closed"
[2022-07-03T15:01:49.941609621Z][com.docker.vpnkit.exe][info] vmnet: Vmnet.Server.listen: read EOF so closing connection
[2022-07-03T15:01:49.941609621Z][com.docker.vpnkit.exe][info] vmnet: Vmnet.Server.disconnect
[2022-07-03T15:01:49.941609621Z][com.docker.vpnkit.exe][info] vmnet: Vmnet.Server.listen returning Ok()
[2022-07-03T15:01:49.941609621Z][com.docker.vpnkit.exe][info] main: TCP/IP stack disconnected
[2022-07-03T15:01:49.941609200Z][WslKeepAlive      ][Info   ] wsl keep-alive stopped
[2022-07-03T15:01:49.941609200Z][WslKeepAlive      ][Warning] stopped unexpectedly
[2022-07-03T15:01:49.945832000Z][com.docker.proxy.exe][W] streaming response body from Docker: unexpected EOF
[2022-07-03T15:01:49.949485000Z][com.docker.proxy.exe][W] streaming response body from Docker: unexpected EOF

My main concern is the "wsl keep-alive stopped" message, because this is a simple process which runs forever. If it stops, that suggests something killed it, perhaps a Linux kernel crash or an OOM event. I don't think we have enough in the diagnostics to check that definitively.

There's a similar error in the second diagnostic

[2022-07-03T13:42:27.153950600Z][WslKeepAlive      ][Info   ] wsl keep-alive stopped
[2022-07-03T13:42:27.153950600Z][WslKeepAlive      ][Warning] stopped unexpectedly
[2022-07-03T13:42:27.173044600Z][com.docker.wsl-distro-proxy.exe for Ubuntu][W] docker-desktop-user-distro proxy has exited with an error: exit status 1
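Locating these events in a large diagnostics bundle can be automated. The sketch below is a hypothetical Python helper, assuming every log line follows the `[timestamp][component][level] message` layout seen in the excerpts above; the names `LogEntry`, `parse_log_line` and `find_keepalive_stops` are made up for illustration. Timestamps are kept as strings because they carry nanosecond precision.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout: [timestamp][component][level] message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\[(?P<component>[^\]]+)\]\[(?P<level>[^\]]+)\]\s*(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    component: str
    level: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return LogEntry(
        timestamp=match["timestamp"],
        component=match["component"].strip(),  # components are space-padded
        level=match["level"].strip(),
        message=match["message"],
    )


def find_keepalive_stops(lines: Iterable[str]) -> List[LogEntry]:
    """Return WslKeepAlive entries that report an unexpected stop."""
    entries = (parse_log_line(line) for line in lines)
    return [
        entry
        for entry in entries
        if entry is not None
        and entry.component == "WslKeepAlive"
        and "stopped unexpectedly" in entry.message
    ]
```

Running `find_keepalive_stops` over each log file gives the timestamps to correlate with test activity or with any kernel messages that can be recovered from the VM.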

@rfay
Contributor Author

rfay commented Jul 5, 2022

Thanks for looking at this @djs55 - great to see you around.

I would doubt it was an OOM, since this is a 16GB machine, and these tests run just one project at a time, but it's a reasonable theory. I would think you'd see "killed" in there though, as an OOM kill on Linux usually shows that.

This happens regularly, and of course these machines get heavy usage for testing. A single test runs about 2 hours of continuous start-and-stop of docker stuff. Normally only one project at a time though, not a lot of memory involved.

@rfay
Contributor Author

rfay commented Jul 5, 2022

@djs55 if you want to roll a version with more instrumentation, or memory tracking, or whatever, I'll be happy to deploy it.

@rfay
Contributor Author

rfay commented Jul 16, 2022

This continues consistently. I'm sure most folks just see it and restart docker... but it's a major reliability problem.

FD6191C0-1AD2-4F0C-B6E5-756BEA0144A9/20220716212756

@rfay
Contributor Author

rfay commented Jul 22, 2022

This happens daily. I'll be happy to make a broken system available to you to study.

@rfay
Contributor Author

rfay commented Aug 2, 2022

I have to restart docker desktop on multiple machines some days.

When this gets attention, maybe a better set of error messages could be introduced as well.

error during connect: In the default daemon configuration on Windows, the docker client must be run with elevated privileges to connect.: Get "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.24/containers/json?all=1&filters=%7B%22label%22%3A%7B%22com.docker.compose.project%3Dddev-testpkgdrupal9%22%3Atrue%7D%7D&limit=0": open //./pipe/docker_engine: The system cannot find the file specified.'
 

  • "In the default daemon configuration on Windows, the docker client must be run with elevated privileges to connect" - Obviously the problem here has nothing to do with elevated privileges, it has to do with the docker daemon not running. Could we do better with that?
  • "open //./pipe/docker_engine: The system cannot find the file specified" So many people don't understand that this error message means the daemon is gone or Docker Desktop is not running. We've seen perhaps dozens of reports over the years from people who hit it and didn't understand it. Maybe it could just say "docker desktop isn't running or is broken".

@rfay
Contributor Author

rfay commented Aug 2, 2022

It would also be lovely to have a way to restart docker desktop from the command line, as many of us have asked for a really long time.

@rfay
Contributor Author

rfay commented Aug 9, 2022

I have to restart Docker every day on more than one Windows test runner. I'll ping in slack to see if anybody could take a look. Again, I can make a crashed system available to you if you'd like.

@rfay
Contributor Author

rfay commented Aug 22, 2022

This doesn't just happen on Windows; as one would expect, it also happens with the WSL2 client; same exact behavior.

@p1-0tr
Member

p1-0tr commented Aug 25, 2022

hi @rfay, We are re-working our WSL2 engine. It is still not ready for sharing, but I'll post a test build once that is closer to being ready for use in anger.

In the meantime, based on logs, I can see that the app is made aware of the failure, and it should produce a notification (with a prompt to restart the WSL integration), but for some reason it does not get reflected in the Dashboard. So I'll work on getting out a fix to at least make the UI state indicate that the engine is down in such cases.

@rfay
Contributor Author

rfay commented Aug 25, 2022

Thanks @p1-0tr - but...

  • This doesn't have anything to do with WSL integration. The logs I've provided here are about using Docker on traditional Windows. However, I've seen it happen from the WSL integration side as well. Same thing, you can't get to the docker engine from anywhere, because the daemon has died apparently.
  • If you just would restart the daemon when it fails, that would be something! Or notice when it fails and at least have the Docker Desktop not be all green and happy. But best, either don't crash or restart it when it does.

@rfay
Contributor Author

rfay commented Sep 4, 2022

I have to restart Docker Desktop on multiple machines most days because the daemon inexplicably becomes unavailable.

@rfay
Contributor Author

rfay commented Sep 17, 2022

Daily.

@rfay
Contributor Author

rfay commented Oct 7, 2022

This remains a daily problem on Windows. I see in the logs

Get "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.24/containers/json?all=1&filters=%7B%22label%22%3A%7B%22com.docker.compose.oneoff%3DFalse%22%3Atrue%2C%22com.docker.compose.project%3Dddev-testddevxdebugenabled15814129%22%3Atrue%7D%7D&limit=0": open //./pipe/docker_engine: The system cannot find the file specified.'

and then I go do a restart in Docker desktop and all is well. When I get to Docker Desktop everything is green, so it's not actually paying attention.

@rfay
Contributor Author

rfay commented Oct 27, 2022

Just fixed two broken test runners.

@rfay
Contributor Author

rfay commented Nov 11, 2022

Daily restarts required. This affects me and probably every other Docker Desktop user. When you get back to this I'll be happy to demo for you or give you access to a broken system.

@rfay
Contributor Author

rfay commented Dec 1, 2022

How long do I have to keep commenting to keep this issue alive? It's a standard problem, I will give you access to a broken system when you want.

@rfay
Contributor Author

rfay commented Dec 30, 2022

Is there any way to automatically restart the docker daemon other than "restart" in the gui? I could restart before every test.

@rfay
Contributor Author

rfay commented Jan 13, 2023

I am wondering if this is no longer happening daily. Now have monitoring of docker on these systems so this failure gets reported immediately; of course it's possible that the monitoring somehow keeps DD lively and prevents the fail.
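
For anyone wanting similar monitoring, here is a minimal Python sketch of such a probe. It is not the monitoring actually used on these runners; the function name `daemon_reachable` and the 30-second timeout are assumptions for illustration. It simply runs docker ps and treats a non-zero exit, a hang, or a missing binary as "daemon not reachable".

```python
import subprocess


def daemon_reachable(cmd=("docker", "ps"), timeout=30.0):
    """Return True if `cmd` exits with status 0 within `timeout` seconds.

    `cmd` defaults to `docker ps`, the same check used throughout this issue.
    """
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung daemon (timeout) or a missing docker binary (OSError)
        # both count as "not reachable".
        return False
    return result.returncode == 0
```

Something like this could be run on a schedule and alert whenever it returns False. It only detects the failure; it does not restart anything.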

@rfay
Contributor Author

rfay commented Jan 19, 2023

Just a note that this continues to happen. When it happens the Docker Desktop service still shows as running in Services.msc, and restarting the service does not help; doing a "restart" from the Docker Desktop UI does solve it.

@MihaelaStoica

@rfay, is this still happening with the latest version of Docker Desktop, 4.16.2? We had done some work on detecting and propagating errors in this version, but if you still encounter the issue, a fresh diagnostic would be useful.

@rfay
Contributor Author

rfay commented Jan 20, 2023

You know I'll let you know @MihaelaStoica , thanks!

@rfay
Contributor Author

rfay commented Jan 22, 2023

@MihaelaStoica here's one for you with 4.16.2, 714965AA-305C-4811-B395-B1F87D1BCF18/20230122041134 - as usual, the UI shows everything green, docker ps fails, UI restart fixes.

@rfay
Contributor Author

rfay commented Jan 23, 2023

Another, 714965AA-305C-4811-B395-B1F87D1BCF18/20230123182624

PS C:\Users\testbot> docker ps
Error response from daemon: open \\.\pipe\docker_engine_linux: The system cannot find the file specified.

@rfay
Contributor Author

rfay commented Mar 21, 2023

No, sorry. 2 of these are doing wsl2 testing and for the ones doing traditional windows it doesn't make sense to test non-default. You and we want people using wsl2.

@doringeman

Alright. Will continue to investigate why the error is not propagated to the UI and get back to you when I have updates.

@rfay
Contributor Author

rfay commented Mar 29, 2023

I saw that you added additional instrumentation to 4.17.1 (Windows) so here's a failure using 4.17.1. This started about 1:30 before the time of the diagnostics. It's now 14:11 here and the failure started about 12:42

B2E22B4E-03C8-42FB-86CB-D31E65C1B995/20230329200534

@rfay
Contributor Author

rfay commented Apr 26, 2023

It's possible that things have changed in 4.18.0. Now I may be seeing slightly fewer of these, but what I see now is a complete failure of the Desktop UI and the server. So it all goes down, but at least the Desktop UI doesn't think everything is OK. The only fix I've found with these recent failures is a system reboot.

@rfay
Contributor Author

rfay commented May 3, 2023

Sadly, this now has appeared in Docker Desktop for Mac 4.19.0, same behavior, both amd64 and arm64. Desktop shows everything green, daemon has died.

$ docker ps
error during connect: Get "http://%2FUsers%2Ftestbot%2F.docker%2Frun%2Fdocker.sock/v1.24/containers/json": EOF

amd64: A809B877-3A61-4D4B-A198-C55B79DC16BE/20230503122139
arm64: F779B64C-4F74-4FE9-851C-880E07D43B1D/20230503122545

@rfay
Contributor Author

rfay commented May 4, 2023

Same is still happening in Docker Desktop for Windows 4.19.0,

714965AA-305C-4811-B395-B1F87D1BCF18/20230504160035

The difference now is having the same behavior on DD for Mac :(

This issue has been open almost a year now, with multiple reports and full diagnostics. I think it should get some kind of priority.

I know that DDEV's test suite must be harder on DD than your own, but wow. The offer still stands to give you access to any of these machines after a failure so you can root around.

@djs55

djs55 commented May 5, 2023

Thanks for the diagnostics, I'll have a look.

In the Windows diagnostic it looks like a host component crashed:

[2023-05-04T15:19:19.576671500Z] unexpected fault address 0xa701c0
[2023-05-04T15:19:19.576671500Z] fatal error: fault
[2023-05-04T15:19:19.584710300Z] [signal SIGSEGV: segmentation violation code=0x2 addr=0xa701c0 pc=0xa701c0]
[2023-05-04T15:19:19.584802200Z]
[2023-05-04T15:19:19.584802200Z] goroutine 9 [running]:
[2023-05-04T15:19:19.586400100Z] runtime.throw({0x887ec4?, 0x30?})
[2023-05-04T15:19:19.586400100Z]        runtime/panic.go:1047 +0x5f fp=0xc000056c58 sp=0xc000056c28 pc=0x43623f
[2023-05-04T15:19:19.586400100Z] runtime.sigpanic()
[2023-05-04T15:19:19.586400100Z]        runtime/signal_unix.go:851 +0x28a fp=0xc000056cb8 sp=0xc000056c58 pc=0x44c56a
[2023-05-04T15:19:19.586400100Z] github.com/moby/vpnkit/go/pkg/libproxy.unmarshalFrame({0xa3e720, 0xc000024420})
[2023-05-04T15:19:19.586400100Z]        github.com/moby/[email protected]/go/pkg/libproxy/frame.go:249 +0x6d fp=0xc000056d10 sp=0xc000056cb8 pc=0x75d58d
[2023-05-04T15:19:19.586400100Z] github.com/moby/vpnkit/go/pkg/libproxy.(*multiplexer).run(0xc000124460)
[2023-05-04T15:19:19.586400100Z]        github.com/moby/[email protected]/go/pkg/libproxy/multiplexed.go:601 +0x4b fp=0xc000056eb0 sp=0xc000056d10 pc=0x76366b
[2023-05-04T15:19:19.586400100Z] github.com/moby/vpnkit/go/pkg/libproxy.(*multiplexer).Run.func1()
[2023-05-04T15:19:19.586400100Z]        github.com/moby/[email protected]/go/pkg/libproxy/multiplexed.go:533 +0x37 fp=0xc000056fe0 sp=0xc000056eb0 pc=0x762cd7
[2023-05-04T15:19:19.586400100Z] runtime.goexit()
[2023-05-04T15:19:19.586400100Z]        runtime/asm_amd64.s:1598 +0x1 fp=0xc000056fe8 sp=0xc000056fe0 pc=0x4689e1
[2023-05-04T15:19:19.586400100Z] created by github.com/moby/vpnkit/go/pkg/libproxy.(*multiplexer).Run
[2023-05-04T15:19:19.586400100Z]        github.com/moby/[email protected]/go/pkg/libproxy/multiplexed.go:532 +0xaa

where it's reading a uint16 from a bufio.Reader

@djs55

djs55 commented May 5, 2023

In arm64: F779B64C-4F74-4FE9-851C-880E07D43B1D/20230503122545 it seems different: when the diagnostics were taken the docker API was working normally (docker ps returned a list of containers). The only suspicious things I can see are some healthcheck failures in the dockerd logs:

[2023-05-03T04:14:48.146059130Z][dockerd][I] time="2023-05-03T04:14:48.145963880Z" level=warning msg="healthcheck failed" actualDuration="138.459µs" timeout=30s
[2023-05-03T04:14:51.105151006Z][dockerd][I] time="2023-05-03T04:14:51.104752090Z" level=error msg="healthcheck failed fatally"
[2023-05-03T04:14:51.128472881Z][dockerd][I] time="2023-05-03T04:14:51.128185465Z" level=error msg="healthcheck failed fatally"
[2023-05-03T04:14:51.192236840Z][dockerd][I] time="2023-05-03T04:14:51.192016256Z" level=warning msg="healthcheck failed" actualDuration="539.416µs" timeout=30s
[2023-05-03T04:14:51.221673965Z][dockerd][I] time="2023-05-03T04:14:51.221478423Z" level=warning msg="healthcheck failed" actualDuration="327.542µs" timeout=30s
[2023-05-03T04:14:53.147898174Z][dockerd][I] time="2023-05-03T04:14:53.147471674Z" level=error msg="healthcheck failed fatally"
[2023-05-03T04:14:56.200501550Z][dockerd][I] time="2023-05-03T04:14:56.200107717Z" level=error msg="healthcheck failed fatally"
[2023-05-03T04:14:56.222927592Z][dockerd][I] time="2023-05-03T04:14:56.222707051Z" level=error msg="healthcheck failed fatally"

Edit: it looks like there might be a connection leak somewhere:

[2023-05-03T05:16:04.796122000Z][com.docker.driver.amd64-linux][W] dial unix docker.raw.sock: socket: too many open files

Edit: it looks like there are > 7k connections open to the docker API socket:

# cat app/lsof/com.docker.driver.amd64-linux | grep .docker/run/docker.sock | wc -l
7665

Edit: As it happens I was planning to work on this component next so I'll do some digging.

Edit again: I suspect the leak is coming from the new com.docker.build process:

# wc -l app/lsof/com.docker.build
7682 app/lsof/com.docker.build
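
The counts above can be reproduced against an extracted diagnostics bundle with a few lines of Python. This is only a sketch: the helper names are hypothetical, and it does a plain substring match, whereas grep treats the dots in the pattern as wildcards, so the two could in principle differ slightly.

```python
def count_lines_containing(lines, needle):
    """Count lines that contain `needle`, roughly `grep needle | wc -l`."""
    return sum(1 for line in lines if needle in line)


def count_in_file(path, needle):
    """Apply count_lines_containing to a text file such as an lsof dump."""
    # errors="replace" keeps the count going even if the dump has odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_lines_containing(handle, needle)
```

For example, `count_in_file("app/lsof/com.docker.driver.amd64-linux", ".docker/run/docker.sock")` would be the rough equivalent of the grep pipeline above. Tracking that number over the course of a test run could show when the leak starts growing.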

@tonistiigi
Member

tonistiigi commented May 5, 2023

@rfay If you can confirm the high lsof count for com.docker.build, is there anything specific you do that would make this grow faster? If not, do you see any errors when you run docker buildx ls?

@rfay
Contributor Author

rfay commented May 5, 2023

Can you give more detail of what you're asking? These test runners run a couple of hours of tests, mostly starting and stopping DDEV and checking behavior. There is a build stage during start. If there's some diagnostic I can add to tests that will help you, I'll be happy to add it.

@tonistiigi
Member

In the command above @djs55 showed that com.docker.driver.amd64-linux and com.docker.build had an unusually high number of open connections (logged by running lsof), probably pointing to a leak. I'm trying to understand what conditions would make this leak happen. For example, this could be either sending some specific API requests or getting the daemon into some error state where it would repeatedly try to reconnect. As they are coming from com.docker.build it should be somehow related to builders.

@tonistiigi
Member

For clarity, note that this is a new issue. com.docker.build is present only in the very latest version. Any discussion before that is for a different case.

@rfay
Contributor Author

rfay commented May 5, 2023

Thanks. If you have something specific I can add to the test runner that will give you more info I'll be happy to do it.

I understand that the macOS issue is new in 4.19.0; the Windows behavior has been happening a really long time. But I also understand that this is a failure of underlying services which the desktop doesn't actually monitor. I'm sure you'll want to improve the Desktop on both platforms so it actually knows when things are working and when they're not.

@rfay
Contributor Author

rfay commented May 8, 2023

When I came to fix one of these today (macOS arm64) I saw this modal, surely this doesn't have anything to do with the problem...

(screenshot of the modal: Cursor_and_macstadium-m1-1)

Diagnostic: 358979A2-0D9F-4631-894B-EA8807C2A1EE/20230508125650

@rfay
Contributor Author

rfay commented May 8, 2023

Also @djs55 I saw this in a test failure on macOS arm64, "Error response from daemon: dial unix docker.raw.sock: socket: too many open files'\x1b[0m\n err: exit status 1" - is that what's happening on the new macOS failures of this class?

@djs55

djs55 commented May 9, 2023

Also @djs55 I saw this in a test failure on macOS arm64, "Error response from daemon: dial unix docker.raw.sock: socket: too many open files'\x1b[0m\n err: exit status 1" - is that what's happening on the new macOS failures of this class?

I think so. The macOS file descriptor limits are low by default (1024 for a process, 10240 for the whole system IIRC?) so I think the file descriptor leak hits the limits sooner than on Windows.

@rfay
Contributor Author

rfay commented May 9, 2023

It's interesting that this new leak appeared in 4.19.0; now I'm restarting test runners a few times daily, and now both macOS and Windows. I'm sure the cause is different, happy to open a separate issue in docker/for-mac if it's helpful @djs55 . And of course I hope that the desktop on both can be improved to actually monitor the status of the daemon.

@rfay
Contributor Author

rfay commented May 9, 2023

Yeah, latest macOS arm64 failure is again "Error response from daemon: dial unix docker.raw.sock: socket: too many open files"

@rfay
Contributor Author

rfay commented May 9, 2023

The macOS default fd limit seems to be 256 for the soft limit, unlimited for hard.

As a workaround for this fd leak I'm changing the limits on the macOS test runners higher, using info from article and corrected file contents
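
To see which limits a process on a runner actually inherits, a small Python sketch like this can help. It uses the standard resource module (Unix only, so macOS and Linux but not native Windows); the function name `nofile_limits` is made up for illustration, and it only reads the limits of the current process, not those of Docker Desktop itself.

```python
import resource


def nofile_limits():
    """Return (soft, hard) open-file limits for this process.

    A value of None means the limit is reported as unlimited.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def normalize(value):
        # RLIM_INFINITY is a platform-specific sentinel for "unlimited".
        return None if value == resource.RLIM_INFINITY else value

    return normalize(soft), normalize(hard)
```

Printing `nofile_limits()` from inside a test job before and after changing the system settings is one way to confirm the new limits are really in effect for the processes that matter.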

@tonistiigi
Member

@rfay We have found the suspected source for the fd leak involving com.docker.build process.

@rfay
Contributor Author

rfay commented May 13, 2023

One note about this problem, on both Windows and Mac; I know the cause may be different, but in both cases you usually can't successfully do a "restart" on the desktop, it just hangs forever trying to stop. I always end up rebooting the test runners.

@rfay
Contributor Author

rfay commented Jun 5, 2023

Was either the macOS file descriptor leak or the Windows perpetual failure addressed in Docker Desktop 4.20?

@rfay
Contributor Author

rfay commented Jun 22, 2023

I'm pretty sure the behavior of both bugs has changed in 4.20.1, which is good. On Windows we now have a complete crash with panic eventually, on macOS I increased the FD limits but I don't think we're seeing this at the same level.

I'll eventually capture the Windows panic message. As you know, diagnostics aren't very useful in that case because things are dead, but maybe I can get something.

This problem may be

@rfay
Contributor Author

rfay commented Jun 29, 2023

I'm going to close this one and will reopen others as needed.

The late-reported macOS failure seems to have been replaced by

The Windows failure of the same type seems to have been replaced by a simple crash of Docker Desktop. I'll try to capture information and create a new issue.

@rfay closed this as completed Jun 29, 2023

@grreyes2

grreyes2 commented Oct 9, 2023

@rfay were you able to get this issue resolved on the Windows side?

@rfay
Contributor Author

rfay commented Oct 9, 2023

This is still happening regularly on both Windows and macOS. There are various behaviors, including a simple hang on a docker ps with unresponsive docker daemon, etc.

@grreyes2

Curious whether you have upgraded to the latest version of Docker/Docker Desktop. If not, what version are you running, if I may ask? @rfay

@rfay
Contributor Author

rfay commented Oct 11, 2023

Yes, these DDEV test runners have been following latest Docker Desktop for more than a year now since this was opened.

@blindgeekzone

Hi. I'm learning databases with LinkedIn Learning, using screen readers: JAWS for Windows, NonVisual Desktop Access (NVDA), and Windows Narrator. I installed Docker Desktop. When I try to run the Windows Server container with JAWS 2024 on Windows 11 Pro and then type or copy and paste, JAWS, NVDA, and Windows Narrator don't give me any verbal feedback. So I'm wondering whether this is a Docker issue or a screen reader issue. Any ideas? Frustrated.
