Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write test to explain behaviour of .info #185

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

s3bw
Copy link

@s3bw s3bw commented Nov 11, 2019

I am wondering if this behaviour is intended:

The following describe the tests included in this PR. See for contents of filesystem used in tests

  1. Calling: AbstractFileSystem.exists("top_level/second") will result in True even though the directory only exists as second_level.
  2. Calling AbstractFileSystem.exists("top_level/second_level/date/") will also result in True, even though that does not exist as a directory, but "top_level/second_level/date" prefixes other directory names.
  3. Calling AbstractFileSystem.exists("top_level/second_level/date=2019-10-01/a.parq") will result in True, even though it's doesn't exist as a file, only as a prefix again.

And lastly this results in a strange behaviour when doing the following:

    # This does not exist as a directory, and only prefixes the name of a file
    info = test_fs.info("top_level/second_level/date=2019-10-01/a.parq")
    # Yet returns `type` as a directory.
    assert info["type"] == "directory"

My assumption is that this has to do mostly with this line: https://github.com/foxyblue/filesystem_spec/blob/master/fsspec/spec.py#L502 checking for unempty out variable.

I found this bug by using doing:

if not FS.exists("directory"):
    FS.mkdir("directory")
if not FS.exists("dir"):
    FS.mkdir("dir")

Resulting in only "directory" to be created, if creating "dir" was executed first, both folders would be created.

@martindurant
Copy link
Member

You are I think right in your expectations, but this has more to do with the implementation of DummyTestFS rather than the spec:

    def ls(self, path, detail=True, **kwargs):
        files = (file for file in self._fs_contents if path in file["name"])

The test here is a simple in, but should be more specific, that the file-name starts with the given path, split on "/" delimiters.

Many filesystems have their own implementations of path/directory handling, so the exact behaviour may depend on which one you are considering.

@martindurant
Copy link
Member

cc @jonathan-hourany

@s3bw
Copy link
Author

s3bw commented Nov 11, 2019

The one I was using when finding this behaviour was gcsfs, perhaps it's caused by returning /{bucket}/{prefix} as shown in their doc-strings:

https://github.com/dask/gcsfs/blob/master/gcsfs/core.py#L794

@martindurant
Copy link
Member

Specifically in gcsfs, this has been noted before, but I thought was fixed. The meaning of "prefix" is a little ambiguous, but I would understand the same as you, that you need a path separator next in order to match further names.

@TomAugspurger
Copy link
Collaborator

Just to make sure I understand:

  1. The task within fsspec is to update DummyTestFS's implementation
  2. There may be an issue with gcsfs's implementation. Do we have an example failure?

@martindurant
Copy link
Member

Correct on your understanding

@jonathan-hourany
Copy link
Contributor

Hello! So here Martin is almost certainly correct

You are I think right in your expectations, but this has more to do with the implementation of DummyTestFS rather than the spec:

The background story is that since ls in AbstractFileSystem is expected to be overridden there wasn't really a clear template on exactly how it should work so I simply looked at what find was expecting since I was creating the test to test glob. That the ls implementation for DummyTestFS is insufficient for other tests would surprise me a lot less than the thought that exist may be the culprit.

I'd be happy to work on the DummyTestFS object over this weekend if no one else has already started on it

@s3bw
Copy link
Author

s3bw commented Nov 13, 2019

Thanks,

This weekend I can look into the gcsfs's implementation to see if I can pin point the exact problem I was having.

if not FS.exists("directory"):
    FS.mkdir("directory")

if not FS.exists("dir"):
    FS.mkdir("dir")

Creating only directory

if not FS.exists("dir"):
    FS.mkdir("dir")

if not FS.exists("directory"):
    FS.mkdir("directory")

Creating both directory and dir.

@martindurant
Copy link
Member

Any progress here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants