-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async/Batching of coroutines #2855
Conversation
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2855 +/- ##
==========================================
- Coverage 45.53% 43.74% -1.80%
==========================================
Files 196 196
Lines 20418 20436 +18
Branches 2647 2649 +2
==========================================
- Hits 9298 8940 -358
- Misses 10658 11271 +613
+ Partials 462 225 -237 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Yee Hing Tong <[email protected]>
@@ -22,6 +22,7 @@ | |||
import msgpack | |||
from dataclasses_json import DataClassJsonMixin, dataclass_json | |||
from flyteidl.core import literals_pb2 | |||
from fsspec.asyn import _run_coros_in_chunks # pylint: disable=W0212 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we bring in this code from fsspec? I rather not depend on private API from another library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fsspec is not going anywhere. I don't understand why that particular method is private, but that function existed in its current form since 2021. IMO it's safe to assume that's going to live there for a long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asked them to make it public: fsspec/filesystem_spec#1740. i think we should just use the private function until they make it public. if they refuse, we can always copy later. If we copy, i'll also be copying in the config code cuz I think it is nice to have some user facing controls around that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasjpfan? is this satisfactory?
Signed-off-by: Yee Hing Tong <[email protected]>
Why are the changes needed?
Because of the unbounded nature of two of the transformers, we noticed higher levels of memory usage when running tests. This runs the coroutines in batches.
What changes were proposed in this pull request?
For the list and dict transformers only, instead of doing
asyncio.gather
on everything at once, run them via fsspec's helper function that batches coroutines. This has the added benefit that it makes the batch size controllable through fsspec's config mechanism.How was this patch tested?
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link