Start migrating from dogpile.cache to FSSPEC #431
Codecov Report
@@            Coverage Diff             @@
##             main     #431      +/-   ##
==========================================
- Coverage   89.77%   89.41%   -0.36%
==========================================
  Files          85       85
  Lines        4744     4792      +48
  Branches      439      450      +11
==========================================
+ Hits         4259     4285      +26
- Misses        377      395      +18
- Partials      108      112       +4
e70ca89 to e477c28
Hi again,

in the meantime, two more issues have been reported, see #474 and #488, where this patch might improve the situation. So, @gutzbenj nudged me to work on integrating it. Thanks again.

I will rebase this on top of current main and would then like to humbly ask you to give it a few rounds of testing on your workstations before it goes into the next release.

Please note, as outlined at [1], that Wetterdienst will still use a hybrid of dogpile and FSSPEC after this patch is integrated.

With kind regards,

[1] https://gist.github.com/amotl/287f6f0665083a58b5dc60ff823fd1dd#report

/cc @HendrikHuel, @AlexDo1
Problem
Just for the record: When using

Solution
We fixed it by downgrading to
Hi again,

I would like to add that this patch makes me extraordinarily happy when running the test suite on my workstation. Before, running the tests sequentially took ~124s (2m4s). Now, it is down to ~27s.

@kmuehlbauer: Thanks a stack for suggesting to look at

With kind regards,
Is there a linked fsspec issue?
    if not recursive:
        remote_urls = filesystem.find(url)
    else:
        remote_urls = filesystem.expand_path(url, recursive=recursive)
I think `.find()` is entirely sufficient and will already scan files in subdirectories. This way, `recursive` can be dropped from this function altogether.
Ah. Maybe that's the reason I used `.glob()` before: I intentionally wanted to avoid recursive operations wherever possible in order to reduce overhead?
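For reference, the difference between the calls discussed here can be tried out against fsspec's in-memory filesystem. This is only a minimal sketch: the directory name and file names are made up for illustration, and it does not reflect how Wetterdienst itself builds its remote URLs.

```python
import fsspec

# In-memory filesystem, so nothing touches the network or the local disk.
fs = fsspec.filesystem("memory")

# Hypothetical layout: one file at the top level, one in a subdirectory.
base = "/demo-find-vs-glob"
fs.pipe(f"{base}/a.txt", b"a")
fs.pipe(f"{base}/sub/b.txt", b"b")

# find() walks the whole tree and returns files only.
found = fs.find(base)

# glob() with a single "*" stays on one directory level.
globbed = fs.glob(f"{base}/*")

# expand_path(recursive=True) also descends, and includes directories.
expanded = fs.expand_path(base, recursive=True)

print("find:       ", found)
print("glob:       ", globbed)
print("expand_path:", expanded)
```

On a large remote directory tree, the recursive variants have to list every subdirectory, which is where the overhead mentioned above would come from.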
So, let's keep the implementation like it is now?
I've just added 467d0db. Thanks for this suggestion.
Knowing that you have some additional ideas on how to improve this spot, I will still merge it and look forward to any improvements from your pen in a subsequent patch.
This reverts commit 044a8d2. It is too early. Some components are apparently not thread-safe yet.
This reverts commit 9cb29e7. It is still too early. The error raised is:

_gdbm.error: [Errno 11] Resource temporarily unavailable: '/home/runner/.cache/wetterdienst/dogpile/metaindex.dbm'

So, some components of `dogpile.cache` with the dbm backend are apparently not thread- or multiprocess-safe yet. This has to be improved.
This reverts commit 5ede1ef.
Use "recursive" directory scanning only for 1-minute resolution on historical data
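The rule named in this commit message could be sketched roughly as follows. The enums and the helper `needs_recursive_scan` are hypothetical stand-ins for illustration, not the actual Wetterdienst code.

```python
from enum import Enum


class Resolution(Enum):
    # Hypothetical subset of resolutions, for illustration only.
    MINUTE_1 = "1_minute"
    MINUTE_10 = "10_minutes"
    DAILY = "daily"


class Period(Enum):
    # Hypothetical subset of periods, for illustration only.
    HISTORICAL = "historical"
    RECENT = "recent"
    NOW = "now"


def needs_recursive_scan(resolution: Resolution, period: Period) -> bool:
    """Return True only for 1-minute resolution on historical data."""
    return resolution is Resolution.MINUTE_1 and period is Period.HISTORICAL


# Every other combination would use the cheaper, non-recursive listing.
print(needs_recursive_scan(Resolution.MINUTE_1, Period.HISTORICAL))
print(needs_recursive_scan(Resolution.DAILY, Period.HISTORICAL))
```

Keeping the decision in one small predicate means the expensive recursive listing stays confined to the single case that needs it.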
Hi there,
to address #243 and #253, this patch starts replacing dogpile.cache with FSSPEC. It will be based on fsspec/filesystem_spec#560 and fsspec/filesystem_spec#561.
At [1], I am summarizing the current progress and have also compiled some insights into the outcome so far.
With kind regards,
Andreas.
[1] https://gist.github.com/amotl/287f6f0665083a58b5dc60ff823fd1dd