macos/9pfs: funky recursive readdir() results on mounted volumes #21097

Closed
ghost opened this issue Dec 28, 2023 · 6 comments
Labels
kind/bug, locked - please file new issue/PR, machine, macos, remote, stale-issue

Comments

ghost commented Dec 28, 2023

Issue Description

Several commands hang when recursing into directory structures mounted from macOS into podman containers. The problem does not occur in the container filesystem; only within mounted volumes.

It appears to be related to opendir()/readdir() returning strange results. It can be triggered with busybox find(1) on Alpine (musl), for example; see below.

I'm not at all sure if the problem is in podman, the CoreOS kernel, musl in alpine, or what, but thus far I've only been able to repro it with podman on macos, so I'm reporting it here.

Steps to reproduce the issue

  1. podman machine start && cd $HOME && mkdir foo foo/bar foo/bar/baz
  2. podman run --rm -it -v $HOME/foo:/tmp alpine:latest find /tmp

The problem does not occur if foo/bar/baz does not exist, or is not a directory; it appears two levels of directories are required to trigger it.

Describe the results you received

/tmp
/tmp/bar
/tmp/bar/baz
/tmp/bar
/tmp/bar/baz
/tmp/bar
/tmp/bar/baz
/tmp/bar
/tmp/bar/baz
/tmp/bar
[... continues until killed]

Describe the results you expected

/tmp
/tmp/bar
/tmp/bar/baz

podman info output

host:
  arch: arm64
  buildahVersion: 1.32.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-2.fc39.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: '
  cpuUtilization:
    idlePercent: 93.69
    systemPercent: 2.48
    userPercent: 3.83
  cpus: 4
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: coreos
    version: "39"
  eventLogger: journald
  freeLocks: 2048
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
    uidmap:
    - container_id: 0
      host_id: 501
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
  kernel: 6.6.3-200.fc39.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 541609984
  memTotal: 1979428864
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.8.0-1.fc39.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.8.0
    package: netavark-1.8.0-2.fc39.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.8.0
  ociRuntime:
    name: crun
    package: crun-1.12-1.fc39.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.12
      commit: ce429cb2e277d001c2179df1ac66a470f00802ae
      rundir: /run/user/501/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231119.g4f1709d-1.fc39.aarch64
    version: |
      pasta 0^20231119.g4f1709d-1.fc39.aarch64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/501/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc39.aarch64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 0h 4m 38.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 106769133568
  graphRootUsed: 2947833856
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/user/501/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 4.7.2
  Built: 1698762633
  BuiltTime: Tue Oct 31 23:30:33 2023
  GitCommit: ""
  GoVersion: go1.21.1
  Os: linux
  OsArch: linux/arm64
  Version: 4.7.2

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

I'm not sure how to upgrade podman from 4.7.2 in the coreos image; 4.8.2 does not appear to be available.

Additional information

The following program demonstrates the strange results from readdir(), modelled after the busybox find(1) implementation:

/* Needed for asprintf() on glibc. */
#define _GNU_SOURCE
#include <sys/stat.h>
#include <err.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *root;

/*
 * Print fn and, if it is a directory, recurse into each of its entries
 * while the parent DIR stream stays open.
 */
void
recurse(char *fn)
{
	struct stat sb;
	if (lstat(fn, &sb) == -1)
		err(1, "lstat %s", fn);
	if (!S_ISDIR(sb.st_mode)) {
		printf("%s\n", fn);
		return;
	}
	printf("%s/\n", fn);
	DIR *d = opendir(fn);
	if (!d)
		err(1, "opendir %s", fn);
	struct dirent *next;
	while ((next = readdir(d)) != NULL) {
		char *nextfn;
		/* Trace every entry returned for the top-level directory. */
		if (strcmp(fn, root) == 0) {
			printf("readdir(%s) next: %s\n", root, next->d_name);
		}
		if (strcmp(next->d_name, ".") == 0)
			continue;
		if (strcmp(next->d_name, "..") == 0)
			continue;
		if (asprintf(&nextfn, "%s/%s", fn, next->d_name) == -1)
			err(1, "asprintf %s/%s", fn, next->d_name);
		recurse(nextfn);
		free(nextfn);
	}
	closedir(d);
}

int
main(int argc, char **argv)
{
	if (argc != 2)
		errx(1, "usage");
	root = argv[1];
	recurse(root);
	return 0;
}

When run in the alpine:latest container with the mounts from the repro steps, it outputs:

/ # ./a.out /tmp
/tmp/
readdir(/tmp) next: .
readdir(/tmp) next: ..
readdir(/tmp) next: bar
/tmp/bar/
/tmp/bar/baz/
readdir(/tmp) next: .
readdir(/tmp) next: ..
readdir(/tmp) next: bar
/tmp/bar/
/tmp/bar/baz/
readdir(/tmp) next: .
readdir(/tmp) next: ..
readdir(/tmp) next: bar
[... ad infinitum]

That is, after recursing into /tmp/bar and returning from that recursion, readdir() on /tmp appears to return its results from the start all over again, even though opendir() was called on /tmp only once, at the beginning.
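The reproducer can be reduced to a bounded check, which is easier to run on a mount that may never terminate. The sketch below is not from the original report: `bounded_walk` and its `budget` parameter are hypothetical names. It reads a directory with readdir() while opening, reading, and closing nested directories between calls, and it gives up after a fixed number of readdir() results so that a restarting stream cannot loop forever. Whether this reduced form also triggers the restart on the 9p mount is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the original reproducer).
 * Reads `path` with readdir(), descending up to `depth` levels into
 * child directories while the parent stream stays open. Every readdir()
 * result, at any level, costs one unit of *budget; the walk stops when
 * the budget reaches zero.
 * Returns the number of entries returned for `path` itself (including
 * "." and ".."), or -1 if `path` cannot be opened as a directory.
 */
long
bounded_walk(const char *path, int depth, long *budget)
{
	assert(path != NULL && budget != NULL);
	DIR *d = opendir(path);
	if (!d)
		return -1;	/* not a directory, or unreadable */
	long seen = 0;
	struct dirent *e;
	while (*budget > 0 && (e = readdir(d)) != NULL) {
		(*budget)--;
		seen++;
		if (depth <= 0)
			continue;
		if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
			continue;
		char child[4096];
		int n = snprintf(child, sizeof child, "%s/%s", path, e->d_name);
		if (n < 0 || (size_t)n >= sizeof child)
			continue;	/* path too long: skip this entry */
		/* Nested open/read/close while `d` is still open. */
		bounded_walk(child, depth - 1, budget);
	}
	closedir(d);
	return seen;
}
```

With the tree from the repro steps, calling `bounded_walk("/tmp", 2, &budget)` with `long budget = 1000` should return 3 on a filesystem that behaves normally (".", "..", and "bar") and leave most of the budget unspent. A budget that ends at 0 would suggest that some directory stream kept returning entries it had already returned.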

ghost added the kind/bug label Dec 28, 2023
github-actions bot added the remote label Dec 28, 2023
Member

rhatdan commented Dec 28, 2023

The default file system for the QEMU version of Podman machine is plan9. When we switch to using applehv, it will be virtiofsd. It would be interesting to see if the problem disappears with the switch.

Contributor

afbjorklund commented Dec 28, 2023

I'm not at all sure if the problem is in podman, the CoreOS kernel, musl in alpine, or what,

Seems to be specific to virtfs-on-darwin, does not reproduce on Linux.

https://github.com/containers/podman/releases/download/v4.7.2/podman-remote-static-linux_amd64.tar.gz

$ podman-remote-static --connection podman-machine-default run --rm -it -v $HOME/foo:/tmp alpine:latest find /tmp
/tmp
/tmp/bar
/tmp/bar/baz

So probably: none of the above, but most likely in the 9p server (or corner case in 9p client)

Luap99 added the macos and machine labels Jan 3, 2024
Author

ghost commented Jan 5, 2024

The problem doesn't occur when running my reproducer directly on CoreOS via podman machine ssh, though, so maybe it's specific to both virtfs-on-darwin and musl, or maybe the container mounts change something.
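One way to separate libc behavior from the kernel and 9p layers is to list the directory with the raw getdents64(2) system call, which readdir() implementations on Linux are typically built on. The following is a minimal sketch, assuming a Linux guest: `raw_list` and its `bufsize` parameter are hypothetical names, and the record struct is declared by hand following the layout in the getdents64(2) manual page. Varying the buffer size and comparing the printed d_off values between the container and the CoreOS host might show whether the two environments see different directory offsets, though that is a guess rather than a confirmed diagnosis.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout the Linux kernel writes for getdents64(2). */
struct linux_dirent64 {
	uint64_t	d_ino;
	int64_t		d_off;		/* offset cookie for the next entry */
	unsigned short	d_reclen;	/* length of this record in bytes */
	unsigned char	d_type;
	char		d_name[];
};

/*
 * Hypothetical helper: list `path` through the raw system call,
 * bypassing libc's readdir() buffering. `bufsize` is the buffer length
 * passed to the kernel (0 or anything above 8192 means 8192).
 * Prints one line per entry and returns the entry count, or -1 on error.
 */
long
raw_list(const char *path, size_t bufsize)
{
	assert(path != NULL);
	_Alignas(8) char buf[8192];
	if (bufsize == 0 || bufsize > sizeof buf)
		bufsize = sizeof buf;
	int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (fd == -1)
		return -1;
	long count = 0;
	for (;;) {
		long n = syscall(SYS_getdents64, fd, buf, bufsize);
		if (n == -1) {
			close(fd);
			return -1;
		}
		if (n == 0)
			break;	/* end of directory */
		for (long off = 0; off < n; ) {
			struct linux_dirent64 *de =
			    (struct linux_dirent64 *)(buf + off);
			printf("%s d_off=%lld d_type=%u\n", de->d_name,
			    (long long)de->d_off, (unsigned)de->d_type);
			count++;
			off += de->d_reclen;
		}
	}
	close(fd);
	return count;
}
```

For the repro tree, `raw_list("/tmp", 0)` and `raw_list("/tmp", 512)` would be expected to report the same three entries on a filesystem that behaves normally.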

github-actions bot commented Feb 5, 2024

A friendly reminder that this issue had no activity for 30 days.

Member

Luap99 commented Apr 4, 2024

podman 5.0 uses virtiofs with the Apple hypervisor, so I suggest you retry with that. Either way, this does not seem to be a bug in podman.

Luap99 closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 4, 2024
Author

ghost commented Apr 5, 2024

podman 5.0 uses virtiofs with apple hypervisor so I suggest you retry with that, either way this does not seem to be a bug in podman

thanks. I can't reproduce the issue with 5.0, so whichever component it was in, this issue can be closed.

stale-locking-app bot added the locked - please file new issue/PR label Jul 5, 2024
stale-locking-app bot locked as resolved and limited conversation to collaborators Jul 5, 2024

3 participants