
Commit

[feature] Add tests and documentation for IPFS, SMB, DropBox, WebDAV
mxmlnkn committed Oct 13, 2024
1 parent c96e2aa commit 3875bea
Showing 10 changed files with 378 additions and 41 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/tests.yml
@@ -285,13 +285,17 @@ jobs:
- name: Regression Tests (FUSE 3)
if: ${{ !startsWith( matrix.os, 'macos' ) }}
env:
DROPBOX_TOKEN: ${{ secrets.DROPBOX_TOKEN }}
run: |
export FUSE_LIBRARY_PATH=$( dpkg -L libfuse3-3 | 'grep' -F .so | head -1 )
ratarmount --version
bash tests/runtests.sh
- name: Regression Tests (FUSE 2)
if: ${{ !startsWith( matrix.os, 'macos' ) }}
env:
DROPBOX_TOKEN: ${{ secrets.DROPBOX_TOKEN }}
run: |
bash tests/runtests.sh
python3 -c 'import pygit2'
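GitHub Actions does not pass repository secrets (other than `GITHUB_TOKEN`) to workflows triggered from forked repositories, so `DROPBOX_TOKEN` can be empty in those runs. The tests themselves are driven by `tests/runtests.sh`; the following is only a hypothetical Python sketch (`dropbox_token_available` and `DropboxMountTests` are made-up names) of how a Dropbox test could be skipped instead of failed when the token is missing:

```python
import os
import unittest
from typing import Mapping


def dropbox_token_available(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only if DROPBOX_TOKEN is set to a non-blank value."""
    # A missing secret is expanded to an empty string by the workflow, so an
    # existing but empty variable must be treated the same as an unset one.
    return bool(environ.get("DROPBOX_TOKEN", "").strip())


class DropboxMountTests(unittest.TestCase):
    # Skipped rather than failed when the secret was not passed to the runner.
    @unittest.skipUnless(dropbox_token_available(), "DROPBOX_TOKEN is not set")
    def test_token_is_usable(self):
        self.assertTrue(os.environ["DROPBOX_TOKEN"].strip())
```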
60 changes: 44 additions & 16 deletions README.md
@@ -524,27 +524,55 @@ lbzip2 -cd well-compressed-file.bz2 | createMultiFrameZstd $(( 4*1024*1024 )) >

# Remote Files

The [fsspec](https://github.com/fsspec/filesystem_spec) API backend adds support for mounting many remote archive or folders:
The [fsspec](https://github.com/fsspec/filesystem_spec) API backend adds support for mounting many remote archives or folders.
Please refer to the respective linked backend documentation for the full configuration options, especially for specifying credentials.
Some often-used configuration environment variables are copied here for easier viewing.

- `git://[path-to-repo:][ref@]path/to/file`
| Symbol | Description |
| ------------- | ------------------------- |
| `[something]` | Optional "something" |
| `(one\|two)` | Either "one" or "two" |

- `git://[path-to-repo:][ref@]path/to/file`</br>
Uses the current path if no repository path is specified.
- `github://org:repo@[sha]/path-to/file-or-folder`
E.g. github://mxmlnkn:ratarmount@v0.15.2/tests/single-file.tar
- `http[s]://hostname[:port]/path-to/archive.rar`
- `s3://[endpoint-hostname[:port]]/bucket[/single-file.tar[?versionId=some_version_id]]`
Will default to AWS according to the Boto3 library defaults when no endpoint is specified.
Backend: [ratarmountcore](https://github.com/mxmlnkn/ratarmount/blob/master/core/ratarmountcore/GitMountSource.py)
via [pygit2](https://github.com/libgit2/pygit2)
- `github://org:repo@[sha]/path-to/file-or-folder`</br>
Example: `github://mxmlnkn:ratarmount@v0.15.2/tests/single-file.tar`</br>
Backend: [fsspec](https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/github.py)
- `http[s]://hostname[:port]/path-to/archive.rar`</br>
Backend: [fsspec](https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/http.py)
via [aiohttp](https://github.com/aio-libs/aiohttp)
- `(ipfs|ipns)://content-identifier`</br>
Example: `ipfs daemon & sleep 2 && ratarmount -f ipfs://QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG mounted`</br>
Backend: [fsspec/ipfsspec](https://github.com/fsspec/ipfsspec)</br>
By default, tries to connect to a running local [`ipfs daemon`](https://github.com/ipfs/kubo) instance, which needs to be started beforehand.
~~Alternatively, a (public) gateway can be specified with the environment variable `IPFS_GATEWAY`, e.g., `https://127.0.0.1:8080`.~~
Specifying a public gateway does not (yet) work because of [this](https://github.com/fsspec/ipfsspec/issues/39) issue.
- `s3://[endpoint-hostname[:port]]/bucket[/single-file.tar[?versionId=some_version_id]]`</br>
Backend: [fsspec/s3fs](https://github.com/fsspec/s3fs) via [boto3](https://github.com/boto/boto3)</br>
The URL will default to AWS according to the Boto3 library defaults when no endpoint is specified.
Boto3 checks, among others, [these environment variables](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html) for credentials:
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_SESSION_TOKEN`
- `AWS_DEFAULT_REGION`, e.g., `us-west-1`
fsspec/s3fs furthermore supports these environment variables:
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, `AWS_DEFAULT_REGION`

[fsspec/s3fs](https://github.com/fsspec/s3fs) furthermore supports this environment variable:
- [`FSSPEC_S3_ENDPOINT_URL`](https://github.com/fsspec/s3fs/pull/704), e.g., `http://127.0.0.1:8053`
- `[s]ftp://[user[:password]@]hostname[:port]/path-to/archive.rar`
- `ssh://[user[:password]@]hostname[:port]/path-to/archive.rar`
- `ftp://[user[:password]@]hostname[:port]/path-to/archive.rar`</br>
Backend: [fsspec](https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/ftp.py)
via [ftplib](https://docs.python.org/3/library/ftplib.html)
- `(ssh|sftp)://[user[:password]@]hostname[:port]/path-to/archive.rar`</br>
Backend: [fsspec/sshfs](https://github.com/fsspec/sshfs)
via [asyncssh](https://github.com/ronf/asyncssh)</br>
The usual configuration via [`~/.ssh/config`](https://linux.die.net/man/5/ssh_config) is supported.
- `smb://[workgroup;][user:password@]server[:port]/share/folder/file.tar`

Many others fsspec-based projects may also work when installed.
- `webdav://[user:password@]host[:port][/path]`</br>
Backend: [webdav4](https://github.com/skshetry/webdav4) via [httpx](https://github.com/encode/httpx)</br>
Environment variables: `WEBDAV_USER`, `WEBDAV_PASSWORD`
- `dropbox://path`</br>
Backend: [fsspec/dropboxdrivefs](https://github.com/fsspec/dropboxdrivefs) via [dropbox-sdk-python](https://github.com/dropbox/dropbox-sdk-python)</br>
Follow [these instructions](https://dropbox.tech/developers/generate-an-access-token-for-your-own-account) to create an [app](https://www.dropbox.com/developers/apps). Check the `files.metadata.read` and `files.content.read` permissions and press "submit". Only **after** that, create the (long) OAuth 2 token and store it in the environment variable `DROPBOX_TOKEN`. The (short) app key and secret are not needed. Creating the app also creates a corresponding app folder that can be filled with data.

[Many other](https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations) fsspec-based projects may also work when installed.


# Writable Mounting
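The URL placeholders above follow the usual `scheme://user:password@host:port/path` layout. As an illustration only (this is not how ratarmount or fsspec parse URLs, and `describe_url` is a hypothetical helper), Python's standard `urllib.parse.urlsplit` decomposes such a URL as follows:

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Map a remote URL onto the components named in the README placeholders."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "user": parts.username,      # None if absent
        "password": parts.password,  # None if absent
        "hostname": parts.hostname,  # lower-cased by urlsplit
        "port": parts.port,          # int or None; raises ValueError if invalid
        "path": parts.path,
    }


print(describe_url("ssh://user:secret@example.com:2222/path-to/archive.rar"))
```

For the `github://org:repo@[sha]/...` form, the same split puts the organization and repository into the user and password slots and the ref into the hostname slot: `describe_url("github://mxmlnkn:ratarmount@v0.15.2/tests/single-file.tar")` yields user `mxmlnkn`, password `ratarmount` and hostname `v0.15.2`.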
26 changes: 21 additions & 5 deletions core/pyproject.toml
@@ -76,6 +76,7 @@ full = [
# Pin to < 3.12 because of https://github.com/nathanhi/pyfatfs/issues/41
'pyfatfs ~= 1.0; python_version < "3.12.0"',
"fast_zip_decryption",
"pygit2",
# fsspec:
"requests",
"aiohttp",
@@ -84,30 +85,34 @@ full = [
# https://github.com/ronf/asyncssh/issues/690
"pyopenssl>=23",
"smbprotocol",
"pygit2",
"dropboxdrivefs",
"fsspec",
"ipfsspec",
"s3fs",
"webdav4",
#"gcsfs", # untested
#"adlfs", # untested. build error in Python 3.13
#"dropboxdrivefs", # untested
]
bzip2 = ["rapidgzip >= 0.13.1"]
git = ["pygit2"]
gzip = ["indexed_gzip >= 1.6.3, < 2.0"]
fsspec = [
fsspec = ["fsspec"]
fsspec-backends = [
# Copy-pasted from the fsspec[full] list. Some were excluded because they are disproportionately large.
"requests",
"aiohttp",
"sshfs", # For performance, asyncssh > 2.17 would be recommended: https://github.com/ronf/asyncssh/issues/691
# Need newer pyopenssl than comes with Ubuntu 22.04.
# https://github.com/ronf/asyncssh/issues/690
"pyopenssl>=23",
"smbprotocol", # build error in Python 3.13
"smbprotocol",
"dropboxdrivefs",
"fsspec",
"ipfsspec",
"s3fs",
"webdav4",
#"gcsfs", # untested
#"adlfs", # untested. build error in Python 3.13
#"dropboxdrivefs", # untested
# "dask", "distributed" : ~34 MB, ~10 MB gzip-compressed
# "pyarrow >= 1" : ~196 MB, ~60 MB gzip-compressed, build error in Python 3.13
# "ocifs" : ~350 MB
@@ -147,6 +152,17 @@ fat = [
'pyfatfs ~= 1.0; python_version < "3.12.0"',
'pyfatfs@git+https://github.com/mxmlnkn/[email protected] ; python_version >= "3.12.0"',
]
# All optional dependencies of asyncssh via sshfs. I have not yet needed any of these.
# Half of these are installed anyway via other dependencies.
full-ssh = [
"sshfs[bcrypt]",
"sshfs[fido2]",
"sshfs[gssapi]",
"sshfs[libnacl]",
"sshfs[python-pkcs11]",
"sshfs[pyOpenSSL]",
#"sshfs[pywin32]", # Only Windows? asyncssh has no platform specifier though...
]

[tool.setuptools]
license-files = [
45 changes: 39 additions & 6 deletions core/ratarmountcore/FSSpecMountSource.py
@@ -17,6 +17,16 @@
except ImportError:
fsspec = None # type: ignore

try:
from webdav4.fsspec import WebdavFileSystem
except ImportError:
WebdavFileSystem = None # type: ignore

try:
from dropboxdrivefs import DropboxDriveFileSystem
except ImportError:
DropboxDriveFileSystem = None # type: ignore


class FSSpecMountSource(MountSource):
"""
Expand Down Expand Up @@ -71,12 +81,16 @@ def __init__(self, urlOrFS, prefix: Optional[str] = None, **options) -> None:

# The fsspec filesystems are not uniform! http:// expects the arguments to isdir with prefixed
# protocol while other filesystem implementations are fine with only the path.
# https://github.com/ray-project/ray/issues/26423#issuecomment-1179561181
self._isHTTP = isinstance(self.fileSystem, fsspec.implementations.http.HTTPFileSystem)
# - https://github.com/ray-project/ray/issues/26423#issuecomment-1179561181
# - https://github.com/fsspec/filesystem_spec/issues/1713
# - https://github.com/skshetry/webdav4/issues/198
self._pathsRequireQuoting = isinstance(self.fileSystem, fsspec.implementations.http.HTTPFileSystem)
if WebdavFileSystem:
self._pathsRequireQuoting = self._pathsRequireQuoting or isinstance(self.fileSystem, WebdavFileSystem)
self.prefix = prefix.rstrip("/") if prefix and prefix.strip("/") and self.fileSystem.isdir(prefix) else ""

def _getPath(self, path: str) -> str:
if self._isHTTP:
if self._pathsRequireQuoting:
path = urllib.parse.quote(path)
if self.prefix:
if not path or path == "/":
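The `_pathsRequireQuoting` flag exists because the HTTP and WebDAV backends treat the path as part of a URL, where an unescaped `#` would start a fragment and `?` a query string. The following sketch (the function name `to_remote_path` is hypothetical) shows the effect of the `urllib.parse.quote` call used in `_getPath`:

```python
import urllib.parse


def to_remote_path(path: str, requires_quoting: bool) -> str:
    """Percent-encode a path for backends that interpret paths as URLs."""
    if requires_quoting:
        # quote() keeps '/' as the separator by default and escapes the rest:
        # ' ' -> %20, '#' -> %23, '?' -> %3F, non-ASCII -> UTF-8 percent-escapes.
        return urllib.parse.quote(path)
    return path


print(to_remote_path("dir/a b#1?.tar", requires_quoting=True))
```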
Expand Down Expand Up @@ -110,9 +124,10 @@ def _convertToFileInfo(entry, path) -> FileInfo:
# They kinda work only like hardlinks.
# https://github.com/fsspec/filesystem_spec/issues/1679
# https://github.com/fsspec/filesystem_spec/issues/1680
size = entry.get('size', 0)
return FileInfo(
# fmt: off
size = entry.get('size', 0),
size = size if size else 0,
mtime = FSSpecMountSource._getModificationTime(entry),
mode = FSSpecMountSource._getMode(entry),
linkname = "",
@@ -133,6 +148,14 @@ def exists(self, path: str) -> bool:
def _listDir(self, path: str, onlyMode: bool) -> Optional[Union[Iterable[str], Dict[str, FileInfo]]]:
path = self._getPath(path)

if path == '/' and DropboxDriveFileSystem and isinstance(self.fileSystem, DropboxDriveFileSystem):
# We need to work around this obnoxious error:
# dropbox.exceptions.BadInputError: BadInputError(
# '12345', 'Error in call to API function "files/list_folder":
# request body: path: Specify the root folder as an empty string rather than as "/".')
# On the other hand, all paths must start with / or else they will not be found...
path = ""

result = self.fileSystem.listdir(path, detail=True)
if not result:
return []
Expand Down Expand Up @@ -181,10 +204,14 @@ def _listDir(self, path: str, onlyMode: bool) -> Optional[Union[Iterable[str], D
)
for entry in result
}
if self._isHTTP:

# For HTTPFileSystem, we need to filter out the sorting links (e.g., '?C=N;O=D') and anchors ('#') of the index page.
# For WebDAV we do not even need to unquote! We get unquoted file names with ls!
if isinstance(self.fileSystem, fsspec.implementations.http.HTTPFileSystem):
return {
urllib.parse.unquote(name): info for name, info in result.items() if not name.startswith(('?', '#'))
}

return result

@overrides(MountSource)
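fsspec's `HTTPFileSystem` lists a directory by extracting the links from the server's HTML index page, so the raw listing can contain entries that are not files, such as the column-sorting links `?C=N;O=D` of an Apache-style index, as well as percent-encoded names. A minimal sketch of the filtering done in `_listDir`, with a hypothetical function name and plain values standing in for `FileInfo` objects:

```python
import urllib.parse
from typing import Dict, TypeVar

T = TypeVar("T")


def clean_http_listing(entries: Dict[str, T]) -> Dict[str, T]:
    """Drop query and anchor links from an index listing and unquote file names."""
    return {
        urllib.parse.unquote(name): info
        for name, info in entries.items()
        # Links such as '?C=N;O=D' (column sorting) or '#top' are not files.
        if not name.startswith(("?", "#"))
    }


print(clean_http_listing({"a%20b.tar": 1, "?C=N;O=D": 2, "#top": 3, "sub/": 4}))
```

WebDAV needs only the quoting on the way in, not this unquoting step, because `ls` already returns unquoted names there.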
Expand Down Expand Up @@ -214,7 +241,7 @@ def _getFileInfoHTTP(self, path: str) -> Optional[FileInfo]:

@overrides(MountSource)
def getFileInfo(self, path: str, fileVersion: int = 0) -> Optional[FileInfo]:
if self._isHTTP:
if isinstance(self.fileSystem, fsspec.implementations.http.HTTPFileSystem):
return self._getFileInfoHTTP(path)

path = self._getPath(path)
@@ -236,6 +263,12 @@ def getFileInfo(self, path: str, fileVersion: int = 0) -> Optional[FileInfo]:
# asyncssh/sftp.py", line 2484, in _process_status
# raise exc
# asyncssh.sftp.SFTPNoSuchFile: No such file
#
# Dropbox also does not like this:
#
# dropbox.exceptions.BadInputError: BadInputError('12345',
# 'Error in call to API function "files/get_metadata":
# request body: path: The root folder is unsupported.')
return self.rootFileInfo.clone()

if not self.fileSystem.lexists(path):
75 changes: 75 additions & 0 deletions core/ratarmountcore/factory.py
@@ -4,7 +4,9 @@
# pylint: disable=no-member,abstract-method
# Disable pylint errors. See https://github.com/fsspec/filesystem_spec/issues/1678

import http.client
import os
import re
import stat
import sys
import traceback
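The new `http` import serves the WebDAV HTTPS probe in `tryOpenURL`. Note that `http.client` is a submodule, so importing the `http` package alone does not guarantee that `http.client` is available; the sketch below imports it explicitly. It is a minimal, hypothetical version of that probe (`probe_https` and `webdav_base_url` are made-up names) that also closes the connection and catches only network and protocol errors:

```python
import http.client


def probe_https(host: str, timeout: float = 2.0) -> bool:
    """Return True if a HEAD request over HTTPS to 'host[:port]' gets any response."""
    connection = None
    try:
        # HTTPSConnection accepts 'host:port' and connects lazily on request().
        connection = http.client.HTTPSConnection(host, timeout=timeout)
        connection.request("HEAD", "/")
        connection.getresponse()
        return True
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and TLS errors (ssl.SSLError is an OSError).
        return False
    finally:
        if connection is not None:
            connection.close()


def webdav_base_url(host: str) -> str:
    """Prefer HTTPS and fall back to plain HTTP, as the WebDAV branch does."""
    return f"{'https' if probe_https(host) else 'http'}://{host}"
```

Like the original, any TLS failure, including an untrusted self-signed certificate, falls back to plain HTTP. That is convenient for local test servers but silently downgrades the connection.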
Expand Down Expand Up @@ -60,6 +62,23 @@ def open(self, *args, **kwargs):
except ImportError:
FixedSSHFileSystem = None # type: ignore

try:
from webdav4.fsspec import WebdavFileSystem
except ImportError:
WebdavFileSystem = None # type: ignore

try:
from dropboxdrivefs import DropboxDriveFileSystem

class FixedDropboxDriveFileSystem(DropboxDriveFileSystem):
def info(self, url, **kwargs):
if url == '/' or url == '':
return {'size': 0, 'name': '/', 'type': 'directory'}
return super().info(url, **kwargs)

except ImportError:
FixedDropboxDriveFileSystem = None # type: ignore


def _openRarMountSource(fileOrPath: Union[str, IO[bytes]], **options) -> Optional[MountSource]:
try:
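`FixedDropboxDriveFileSystem` answers `info()` for the root folder locally because, according to the comments in this commit, the Dropbox `files/get_metadata` call rejects the root folder. The pattern can be tried without a Dropbox account using a hypothetical stand-in base class (`FakeRemoteFS` and `FixedFakeRemoteFS` are made up and only mimic that rejection):

```python
class FakeRemoteFS:
    """Stand-in for a remote filesystem whose info() rejects the root folder."""

    def info(self, url, **kwargs):
        if url in ("", "/"):
            raise ValueError("The root folder is unsupported.")
        return {"name": url, "size": 42, "type": "file"}


class FixedFakeRemoteFS(FakeRemoteFS):
    """Answer root queries locally and defer everything else to the base class."""

    def info(self, url, **kwargs):
        if url in ("", "/"):
            return {"size": 0, "name": "/", "type": "directory"}
        return super().info(url, **kwargs)


print(FixedFakeRemoteFS().info("/"))
```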
Expand Down Expand Up @@ -255,6 +274,62 @@ def tryOpenURL(url, printDebug: int) -> Union[MountSource, IO[bytes], str]:
with warnings.catch_warnings():
warnings.simplefilter("ignore")
fileSystem, path = fsspec.url_to_fs(url)
elif protocol == 'webdav':
# WebDAV needs special handling because we need to decide between HTTP and HTTPS and because of:
# https://github.com/skshetry/webdav4/issues/197
if not WebdavFileSystem:
raise RatarmountError(f"Install the webdav4 Python package to mount {protocol}://.")

matchedURI = re.match("(?:([^:/]*):([^@/]*)@)?([^/]*)(.*)", splitURI[1])
if not matchedURI:
raise RatarmountError(
"Failed to match WebDAV URI of the format webdav://[user:password@]host[:port][/path]\n"
"If your user name or password contains special characters such as ':/@', then use the environment "
"variables WEBDAV_USER and WEBDAV_PASSWORD to specify them."
)
username, password, baseURL, path = matchedURI.groups()
if path is None:
path = ""
if username is None and 'WEBDAV_USER' in os.environ:
username = os.environ.get('WEBDAV_USER')
if password is None and 'WEBDAV_PASSWORD' in os.environ:
password = os.environ.get('WEBDAV_PASSWORD')
auth = None if username is None or password is None else (username, password)

def checkForHTTPS(url):
try:
connection = http.client.HTTPSConnection(url, timeout=2)
connection.request("HEAD", "/")
return bool(connection.getresponse())
except Exception as exception:
if printDebug >= 3:
print("[Info] Determined that the WebDAV URL uses HTTP instead of HTTPS because of:", exception)
return False

transportProtocol = "https" if checkForHTTPS(baseURL) else "http"
fileSystem = WebdavFileSystem(f"{transportProtocol}://{baseURL}", auth=auth)
elif protocol == 'dropbox':
# Dropbox needs special handling because there is no way to specify the token in the URL and because
# there are some obnoxious intricacies regarding ls and stat of the root folder.
if FixedDropboxDriveFileSystem is None:
raise RatarmountError(f"Install the dropboxdrivefs Python package to mount {protocol}://.")

dropboxToken = os.environ.get('DROPBOX_TOKEN', None)
if not dropboxToken:
raise RatarmountError(
"Please set the DROPBOX_TOKEN environment variable to mount dropbox:// URLs. "
"Please refer to the ratarmount online ReadMe or to the Dropbox documentation for creating a token."
)

fileSystem = FixedDropboxDriveFileSystem(token=dropboxToken)
path = splitURI[1]
# Dropbox requires all paths to start with /, so simply add it
# instead of making each user run into this problem.
if path and not path.startswith('/'):
path = '/' + path
# Dropbox also does not like trailing / -.-. God is it super finicky.
# dropbox.exceptions.ApiError: ApiError('12345', GetMetadataError('path', LookupError('malformed_path', None)))
path = path.rstrip('/')
else:
fileSystem, path = fsspec.url_to_fs(url)
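The WebDAV branch above splits `webdav://[user:password@]host[:port][/path]` with a single regular expression and falls back to `WEBDAV_USER` and `WEBDAV_PASSWORD` for missing credentials. A self-contained sketch of that logic, reusing the same pattern (`parse_webdav_uri` is a hypothetical name, and the environment is passed in explicitly so that it can be tested):

```python
import os
import re
from typing import Mapping, Optional, Tuple

# Same pattern as in tryOpenURL: optional 'user:password@', then host[:port], then the path.
WEBDAV_PATTERN = re.compile(r"(?:([^:/]*):([^@/]*)@)?([^/]*)(.*)")


def parse_webdav_uri(
    rest: str, environ: Mapping[str, str] = os.environ
) -> Tuple[Optional[Tuple[str, str]], str, str]:
    """Split the part after 'webdav://' into (auth, host[:port], path)."""
    # Every part of the pattern is optional, so match() always succeeds.
    username, password, host, path = WEBDAV_PATTERN.match(rest).groups()
    # Credentials in the URI win; the environment variables only fill gaps.
    if username is None:
        username = environ.get("WEBDAV_USER")
    if password is None:
        password = environ.get("WEBDAV_PASSWORD")
    auth = None if username is None or password is None else (username, password)
    return auth, host, path


print(parse_webdav_uri("user:pw@example.com:8080/dir/a.tar", environ={}))
```

Because the user name may not contain `:` or `/` and the password may not contain `@` or `/`, credentials with such characters have to go through the environment variables, as the error message in the diff suggests.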

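The Dropbox branch normalizes the path in two steps: it adds a leading `/`, which Dropbox requires, and strips trailing slashes, which Dropbox rejects as malformed. A minimal sketch of those two rules as a standalone function (`normalize_dropbox_path` is a hypothetical name):

```python
def normalize_dropbox_path(path: str) -> str:
    """Apply the leading-slash and no-trailing-slash rules from the Dropbox branch."""
    # Dropbox requires all paths to start with '/'.
    if path and not path.startswith("/"):
        path = "/" + path
    # Dropbox rejects trailing slashes with a 'malformed_path' error.
    return path.rstrip("/")


for raw in ["folder/a.tar", "/folder/", "/", ""]:
    print(repr(raw), "->", repr(normalize_dropbox_path(raw)))
```

Note that the root folder ends up as `""`, which is what `files/list_folder` expects. Since `files/get_metadata` does not accept the root folder at all, the commit still needs the separate `info()` override and the root check in `_listDir`.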
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -55,7 +55,7 @@ xz = ["ratarmountcore[xz]"]
zip = ["ratarmountcore[zip]"]
zstd = ["ratarmountcore[zstd]"]
squashfs = ["ratarmountcore[squashfs]"]
fsspec = ["ratarmountcore[fsspec]"]
fsspec = ["ratarmountcore[fsspec-backends]"]

[project.scripts]
ratarmount = "ratarmount:cli"