stat call for dir1/dir2/dir3/file triggers many List and Head requests #770

Closed
mhnap opened this issue Feb 21, 2024 · 1 comment
Labels
question Further information is requested

Comments


mhnap commented Feb 21, 2024

Mountpoint for Amazon S3 version

mount-s3 1.4.0

AWS Region

us-east-2

Describe the running environment

Running in local Ubuntu 22.04.

Mountpoint options

mount-s3 mhnap-bucket /media/mhnap/mnt --read-only --log-directory=/tmp/mount-s3/logs --debug --debug-crt --log-metrics

What happened?

I created a dir1/dir2/dir3/ directory hierarchy containing a single file on my local system and uploaded dir1 to my newly created mhnap-bucket.

mhnap@hp:~/projects/playground$ aws s3 ls --recursive s3://mhnap-bucket
2024-02-21 22:59:13         10 dir1/dir2/dir3/file

Afterwards, I mounted the bucket with the mount-s3 command and called stat (using the example code from the stat documentation) on the path /media/mhnap/mnt/dir1/dir2/dir3/file 10 times.
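For readers who want to reproduce the test, here is a minimal Python sketch of the same idea. It is not the program used in this report (that was the sample code from the stat documentation); the function name `stat_repeatedly` is made up for illustration.

```python
import os


def stat_repeatedly(path: str, count: int = 10) -> list[os.stat_result]:
    """Call stat on the same path `count` times and return every result."""
    return [os.stat(path) for _ in range(count)]
```

Calling `stat_repeatedly("/media/mhnap/mnt/dir1/dir2/dir3/file")` against the mount should issue the same kind of repeated stat calls as described above.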

My concern is that I also see List and Head requests for dir1/, dir1/dir2/, and dir1/dir2/dir3/, not just one Head request for dir1/dir2/dir3/file as I would expect. In CloudWatch, I see a total of 72 List and 45 Head requests over the whole test duration.

I may be missing something, and this may well be the correct behavior. If so, I would be grateful for an explanation of why it works this way.

I cannot paste the logs here, so I uploaded them: mountpoint-s3-2024-02-21T21-07-50Z.log

Relevant log output

No response

@mhnap added the bug (Something isn't working) label Feb 21, 2024
@jamesbornholt
Member

Yeah, this is unfortunate but it's the expected behavior. It's common to all Linux file systems, which never get to see the entire path in one shot. Instead, they're always accessed one directory at a time: to access dir1/dir2/dir3/file, Linux first has to check that dir1 exists and is a directory, then dir2 exists and is a directory inside dir1, etc.
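To make the one-directory-at-a-time walk concrete, here is a small Python sketch that lists the lookups needed to resolve a path. It only models the component-by-component resolution described above. It does not model which S3 requests Mountpoint sends for each lookup, and `lookup_sequence` is a hypothetical helper, not part of Mountpoint or the kernel.

```python
from pathlib import PurePosixPath


def lookup_sequence(relative_path: str) -> list[tuple[str, str]]:
    """Return the (parent directory, name) pairs looked up, in order."""
    lookups = []
    parent = ""  # empty string stands for the root of the mount
    for name in PurePosixPath(relative_path).parts:
        lookups.append((parent, name))
        # The name just resolved becomes the parent of the next component.
        parent = f"{parent}{name}/"
    return lookups


for parent, name in lookup_sequence("dir1/dir2/dir3/file"):
    location = parent if parent else "<mount root>"
    print(f"look up {name} in {location}")
```

Every stat of the full path repeats this whole sequence unless an earlier result is still cached, which is why requests for the intermediate directories show up alongside the one for the file.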

You can use metadata caching to work around the cost of these repeated lookups, although with the caveats you've mentioned in #768 and #759.
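As a rough illustration of why caching helps here, the sketch below wraps a lookup function in a time-to-live cache and counts how often the backend is actually consulted. This is a conceptual model only, not Mountpoint's implementation; `TtlLookupCache`, `fake_backend`, and the 60-second TTL are all assumptions chosen for the example.

```python
import time
from typing import Callable


class TtlLookupCache:
    """Cache lookup results for a fixed number of seconds."""

    def __init__(self, backend: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._backend = backend
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}
        self.backend_calls = 0

    def lookup(self, key: str) -> dict:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now < cached[0]:
            return cached[1]  # entry is still fresh, no backend request
        self.backend_calls += 1
        value = self._backend(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def fake_backend(key: str) -> dict:
    """Stand-in for a remote metadata request (hypothetical)."""
    return {"key": key}


cache = TtlLookupCache(fake_backend, ttl_seconds=60.0)
components = ["dir1/", "dir1/dir2/", "dir1/dir2/dir3/", "dir1/dir2/dir3/file"]
for _ in range(10):  # ten stat calls, each resolving four components
    for key in components:
        cache.lookup(key)
print(cache.backend_calls)  # 4 backend lookups instead of 40
```

The trade-off is staleness: while an entry is cached, changes made to the bucket by other clients are not visible through the cache.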

If you know you'll only be accessing a subdirectory of your bucket rather than the whole thing, you can also use the --prefix argument to mount just that subdirectory, which will remove the need to do some of these recursive lookups.
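The effect of mounting a prefix can be sketched the same way: only the part of the key below the mounted prefix has to be resolved component by component. `components_to_resolve` is a hypothetical helper, and the requirement that the prefix end with a slash is an assumption made for this example.

```python
def components_to_resolve(object_key: str, mount_prefix: str = "") -> list[str]:
    """Return the path components left to look up below the mounted prefix."""
    if mount_prefix and not mount_prefix.endswith("/"):
        raise ValueError("mount_prefix is expected to end with '/'")
    if not object_key.startswith(mount_prefix):
        raise ValueError("object_key is not under mount_prefix")
    relative = object_key[len(mount_prefix):]
    return [part for part in relative.split("/") if part]


key = "dir1/dir2/dir3/file"
print(components_to_resolve(key))                      # whole bucket mounted
print(components_to_resolve(key, "dir1/dir2/dir3/"))   # only the subdirectory
```

With the whole bucket mounted, four components have to be resolved. With dir1/dir2/dir3/ as the mounted prefix, only `file` remains.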
