Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start migrating from dogpile.cache to FSSPEC #431

Merged
merged 35 commits into from
Aug 30, 2021
Merged

Start migrating from dogpile.cache to FSSPEC #431

merged 35 commits into from
Aug 30, 2021

Conversation

amotl
Copy link
Member

@amotl amotl commented Apr 25, 2021

Hi there,

in order to approach #243 and #253, this patch makes some efforts of replacing dogpile.cache with FSSPEC. It will be based on fsspec/filesystem_spec#560 and fsspec/filesystem_spec#561.

At [1], I am summarizing the current progress and also compiled some insights into the outcome so far.

With kind regards,
Andreas.

[1] https://gist.github.com/amotl/287f6f0665083a58b5dc60ff823fd1dd

@codecov
Copy link

codecov bot commented Apr 25, 2021

Codecov Report

Merging #431 (24ed27f) into main (39781ae) will decrease coverage by 0.35%.
The diff coverage is 93.91%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #431      +/-   ##
==========================================
- Coverage   89.77%   89.41%   -0.36%     
==========================================
  Files          85       85              
  Lines        4744     4792      +48     
  Branches      439      450      +11     
==========================================
+ Hits         4259     4285      +26     
- Misses        377      395      +18     
- Partials      108      112       +4     
Flag Coverage Δ
unittests 89.41% <93.91%> (-0.36%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
wetterdienst/util/cache.py 85.00% <80.76%> (-3.00%) ⬇️
wetterdienst/util/network.py 74.28% <96.00%> (-25.72%) ⬇️
wetterdienst/provider/dwd/forecast/access.py 95.89% <100.00%> (+0.05%) ⬆️
wetterdienst/provider/dwd/forecast/api.py 77.14% <100.00%> (ø)
wetterdienst/provider/dwd/index.py 96.00% <100.00%> (+4.69%) ⬆️
wetterdienst/provider/dwd/observation/download.py 75.75% <100.00%> (-0.72%) ⬇️
wetterdienst/provider/dwd/observation/metaindex.py 92.85% <100.00%> (-0.08%) ⬇️
wetterdienst/provider/dwd/radar/access.py 85.00% <100.00%> (-1.78%) ⬇️
wetterdienst/provider/dwd/radar/index.py 100.00% <100.00%> (ø)
wetterdienst/util/io.py 100.00% <100.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 39781ae...24ed27f. Read the comment docs.

@amotl amotl force-pushed the collab/fsspec branch 2 times, most recently from e70ca89 to e477c28 Compare April 25, 2021 22:44
@amotl amotl changed the title Start migrating from dogpile.cache to FSSPEC Migrate from dogpile.cache to FSSPEC Apr 26, 2021
@amotl
Copy link
Member Author

amotl commented Aug 8, 2021

Hi again,

in the meanwhile, two more issues have been reported, see #474 and #488, where the situation might be improved through this patch. So, @gutzbenj bumped me to work on integrating it. Thanks again.

I will rebase this on top of current main and then, want to humbly ask you for giving it some rounds of testing on your workstations before putting it into the next release.

Please note, as outlined at [1], that, after integrating this patch, Wetterdienst still uses a hybrid of dogpile and FSSPEC:

The metaindex and fileindex caches are still supported by dogpile.cache. The reason is that FSSPEC does not have a persistent directory cache yet.

With kind regards,
Andreas.

[1] https://gist.github.com/amotl/287f6f0665083a58b5dc60ff823fd1dd#report

/cc @HendrikHuel, @AlexDo1

@amotl amotl changed the title Migrate from dogpile.cache to FSSPEC Start migrating from dogpile.cache to FSSPEC Aug 8, 2021
@amotl
Copy link
Member Author

amotl commented Aug 8, 2021

Problem

Just for the records: When using fsspec>=2021.6.0, Wetterdienst apparently observes a regression. More details can be found at #243 (comment).

Solution

We fixed it by downgrading to fsspec==2021.5.0, see 4f6ea1c.

@amotl
Copy link
Member Author

amotl commented Aug 10, 2021

Hi again,

I would like to add that this patch makes me extraordinarily happy when running the test suite on my workstation.

$ time poetry run poe test-parallel

Poe => pytest -vvv --numprocesses=4 -m not (explorer or cflake) tests
======================== test session starts ========================
platform darwin -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /Users/amo/Library/Caches/pypoetry/virtualenvs/wetterdienst-EkOFQaO8-py3.9/bin/python
cachedir: .pytest_cache
rootdir: /Users/amo/dev/earthobservations/wetterdienst, configfile: pyproject.toml
plugins: cov-2.12.1, dictsdiff-0.5.8, xdist-2.3.0, dash-1.19.0, notebook-0.6.1, anyio-3.2.1, forked-1.3.0
[gw0] darwin Python 3.9.6 cwd: /Users/amo/dev/earthobservations/wetterdienst
[gw1] darwin Python 3.9.6 cwd: /Users/amo/dev/earthobservations/wetterdienst
[gw2] darwin Python 3.9.6 cwd: /Users/amo/dev/earthobservations/wetterdienst
[gw3] darwin Python 3.9.6 cwd: /Users/amo/dev/earthobservations/wetterdienst
[gw0] Python 3.9.6 (default, Jun 29 2021, 06:20:32)  -- [Clang 12.0.0 (clang-1200.0.32.29)]
[gw1] Python 3.9.6 (default, Jun 29 2021, 06:20:32)  -- [Clang 12.0.0 (clang-1200.0.32.29)]
[gw2] Python 3.9.6 (default, Jun 29 2021, 06:20:32)  -- [Clang 12.0.0 (clang-1200.0.32.29)]
[gw3] Python 3.9.6 (default, Jun 29 2021, 06:20:32)  -- [Clang 12.0.0 (clang-1200.0.32.29)]
gw0 [232] / gw1 [232] / gw2 [232] / gw3 [232]

[...]

======================== 219 passed, 6 skipped, 7 xfailed, 13125 warnings in 26.99s ========================

Before, when running the tests sequentially, the whole process took ~124s (2m4s). Now, it is down to ~27s.

@kmuehlbauer: Thanks a stack for suggesting to look at pytest-xdist the other day.
@martindurant: Thank you so much for your excellent work on FSSPEC.

With kind regards,
Andreas.

@martindurant
Copy link

Is there a linked fsspec issue?

@amotl amotl marked this pull request as draft August 14, 2021 11:23
@amotl amotl marked this pull request as ready for review August 15, 2021 19:44
@amotl amotl requested a review from gutzbenj August 15, 2021 19:56
Comment on lines +134 to +137
if not recursive:
remote_urls = filesystem.find(url)
else:
remote_urls = filesystem.expand_path(url, recursive=recursive)
Copy link
Member

@gutzbenj gutzbenj Aug 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think .find() is totally sufficient and will already scan files in subdirectories. This way recursive can be omitted totally from this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Maybe that's the reason I used .glob() before because I intentionally wanted to prevent using recursive operations in all occasions in order to reduce overhead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, let's keep the implementation like it is now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added 467d0db. Thanks for this suggestion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Knowing that you have some additional arguments how to improve this spot, I will still be merging it and look forward to any improvements from your pen through a subsequent patch.

amotl added 22 commits August 30, 2021 20:44
This reverts commit 044a8d2.

It is too early. Some components are apparently not thread-safe yet.
This reverts commit 9cb29e7.

It is still too early. The error raised is:

_gdbm.error: [Errno 11] Resource temporarily unavailable: '/home/runner/.cache/wetterdienst/dogpile/metaindex.dbm'

So, some components of `dogpile.cache` with the dbm backend are
apparently not thread- or multiprocess-safe yet. This has to be
improved.
This reverts commit 5ede1ef.
@amotl amotl requested a review from gutzbenj August 30, 2021 18:47
@amotl amotl merged commit 36704ff into main Aug 30, 2021
@amotl amotl deleted the collab/fsspec branch August 30, 2021 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants