-
Notifications
You must be signed in to change notification settings - Fork 674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC into RFC #5653
RFC into RFC #5653
Conversation
* Remove shard key in admin-launcher Signed-off-by: Thomas Newton <[email protected]> * Don't mutate existing state Signed-off-by: Thomas Newton <[email protected]> * Don't mutate state Signed-off-by: Thomas Newton <[email protected]> * Add a test Signed-off-by: Thomas Newton <[email protected]> --------- Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: YiShengOng <[email protected]>
…4986) * Add tracking for active node and task execution counts in propeller Signed-off-by: Shardool <[email protected]> * Update unit tests for task and node execution counts Signed-off-by: Shardool <[email protected]> * Fix linter errors Signed-off-by: Shardool <[email protected]> * fix linter errors Signed-off-by: Paul Dittamo <[email protected]> --------- Signed-off-by: Shardool <[email protected]> Signed-off-by: Paul Dittamo <[email protected]> Co-authored-by: Paul Dittamo <[email protected]>
…ors (#5161) * include container statuses for all container exit errors Signed-off-by: Paul Dittamo <[email protected]> * add unit test Signed-off-by: Paul Dittamo <[email protected]> --------- Signed-off-by: Paul Dittamo <[email protected]>
* docs(flyte-core): add missing key `adminServer` in authentication guide Signed-off-by: Julian Einhaus <[email protected]> * docs(flyte-core): fix indentation for flyte-core with AzureAD authorization guide Signed-off-by: Julian Einhaus <[email protected]> --------- Signed-off-by: Julian Einhaus <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
* update arraynode proto parallelism field to varint compatible int64 Signed-off-by: Paul Dittamo <[email protected]> * have array nodes utilize workflow parallelism Signed-off-by: Paul Dittamo <[email protected]> * return if available parallelism is 0 Signed-off-by: Paul Dittamo <[email protected]> * unit test Signed-off-by: Paul Dittamo <[email protected]> --------- Signed-off-by: Paul Dittamo <[email protected]>
* Don't use `defer` for streak length reporting Signed-off-by: Thomas Newton <[email protected]> * Make it work with defer Signed-off-by: Thomas Newton <[email protected]> * Fix lint Signed-off-by: Thomas Newton <[email protected]> --------- Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
* Testing agents in the development environment Signed-off-by: Future-Outlier <[email protected]> * nit Signed-off-by: Future-Outlier <[email protected]> * nit Signed-off-by: Future-Outlier <[email protected]> * update Signed-off-by: Future-Outlier <[email protected]> * rename Signed-off-by: Future-Outlier <[email protected]> * blank Signed-off-by: Future-Outlier <[email protected]> * rerun build docs ci Signed-off-by: Future-Outlier <[email protected]> * update pingsu's advice Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: Kevin Su <[email protected]> * Update pingsu's advice Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: Kevin Su <[email protected]> * deploying agents in the sandbox Signed-off-by: Future-Outlier <[email protected]> * rename Signed-off-by: Future-Outlier <[email protected]> * nit Signed-off-by: Future-Outlier <[email protected]> * Implementing Agent Metadata Service Signed-off-by: Future-Outlier <[email protected]> * reorganize and copyedit new content Signed-off-by: nikki everett <[email protected]> --------- Signed-off-by: Future-Outlier <[email protected]> Signed-off-by: nikki everett <[email protected]> Co-authored-by: Kevin Su <[email protected]> Co-authored-by: nikki everett <[email protected]>
Signed-off-by: Ketan Umare <[email protected]> Co-authored-by: Ketan Umare <[email protected]>
Signed-off-by: Chi-Sheng Liu <[email protected]>
Signed-off-by: Fabio M. Graetz, Ph.D. <[email protected]> Signed-off-by: Fabio Grätz <[email protected]>
Signed-off-by: Chi-Sheng Liu <[email protected]> Co-authored-by: Kevin Su <[email protected]>
* add cache client read and write otel tracing Signed-off-by: Paul Dittamo <[email protected]> * lint Signed-off-by: Paul Dittamo <[email protected]> --------- Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: nikki everett <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
…core` deployment (#5208) Signed-off-by: Future-Outlier <[email protected]>
* Add reference to prom operator install guide Signed-off-by: davidmirror-ops <[email protected]> * Adds info about the three base dashboards Signed-off-by: davidmirror-ops <[email protected]> * Adds instructions to enable SMs Signed-off-by: davidmirror-ops <[email protected]> * Incorporate reviews Signed-off-by: davidmirror-ops <[email protected]> * Minor fixes Signed-off-by: davidmirror-ops <[email protected]> * Improve format for steps Signed-off-by: davidmirror-ops <[email protected]> --------- Signed-off-by: davidmirror-ops <[email protected]>
* chore: remove obsolete flyte config files * nit Signed-off-by: Kevin Su <[email protected]> --------- Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]>
* enable parallelism to be set to nil for array node Signed-off-by: Paul Dittamo <[email protected]> * unit test Signed-off-by: Paul Dittamo <[email protected]> --------- Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: [email protected] <[email protected]>
… template (#5215) * update with link to dockerfile template Signed-off-by: nikki everett <[email protected]> * fix 404 error Signed-off-by: nikki everett <[email protected]> --------- Signed-off-by: nikki everett <[email protected]>
* copy changes over from flytesnacks#1553 Signed-off-by: nikki everett <[email protected]> * fix formatting Signed-off-by: nikki everett <[email protected]> * fix 404 error Signed-off-by: nikki everett <[email protected]> --------- Signed-off-by: nikki everett <[email protected]>
Signed-off-by: Chi-Sheng Liu <[email protected]> Signed-off-by: Kevin Su <[email protected]> Co-authored-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
* Refactor panic handling to middleware Signed-off-by: Jason Parraga <[email protected]> * Remove registration of old panicCounter Signed-off-by: Jason Parraga <[email protected]> * Add test coverage Signed-off-by: Jason Parraga <[email protected]> --------- Signed-off-by: Jason Parraga <[email protected]>
* TEST build Signed-off-by: Future-Outlier <[email protected]> * remove emphasize-lines Signed-off-by: Future-Outlier <[email protected]> * test build Signed-off-by: Future-Outlier <[email protected]> * revert Signed-off-by: Future-Outlier <[email protected]> --------- Signed-off-by: Future-Outlier <[email protected]>
* FlytePropeller Compiler Avoid Crash when Type not found Signed-off-by: Future-Outlier <[email protected]> * Update pingsu's error message advices Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: pingsutw <[email protected]> * fix lint Signed-off-by: Future-Outlier <[email protected]> * Trigger CI Signed-off-by: Future-Outlier <[email protected]> * Trigger CI Signed-off-by: Future-Outlier <[email protected]> --------- Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: pingsutw <[email protected]>
* first version Signed-off-by: Future-Outlier <[email protected]> * update Signed-off-by: Future-Outlier <[email protected]> --------- Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: ddl-rliu <[email protected]>
* add send arg Signed-off-by: Yee Hing Tong <[email protected]> * Add acction to remove cache in gh runner Signed-off-by: Eduardo Apolinario <[email protected]> * Use correct checked out path Signed-off-by: Eduardo Apolinario <[email protected]> * Path in strings Signed-off-by: Eduardo Apolinario <[email protected]> * Checkout repo in root Signed-off-by: Eduardo Apolinario <[email protected]> * Use the correct path to new action Signed-off-by: Eduardo Apolinario <[email protected]> * Do not use gh var in path to clear-action-cache Signed-off-by: Eduardo Apolinario <[email protected]> * Remove wrong invocation of clear-action-cache Signed-off-by: Eduardo Apolinario <[email protected]> * GITHUB_WORKSPACE is implicit in the checkout action Signed-off-by: Eduardo Apolinario <[email protected]> * Refer to local `flyte` directory Signed-off-by: Eduardo Apolinario <[email protected]> --------- Signed-off-by: Yee Hing Tong <[email protected]> Signed-off-by: Eduardo Apolinario <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Andrew Dye <[email protected]>
* Make flyteidl releases go through a manual gh workflow Signed-off-by: Eduardo Apolinario <[email protected]> * Make flytectl releases go through a manual gh workflow Signed-off-by: Eduardo Apolinario <[email protected]> * Rewrite the documentation for `version` and clarify wording in RELEASE.md Signed-off-by: Eduardo Apolinario <[email protected]> --------- Signed-off-by: Eduardo Apolinario <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]>
* fix CHANGELOG-v0.2.0.md Signed-off-by: Christina <[email protected]> * fix CHANGELOG-v1.0.2-b1.md Signed-off-by: Christina <[email protected]> * fix CHANGELOG-v1.1.0.md Signed-off-by: Christina <[email protected]> * fix CHANGELOG-v1.3.0.md Signed-off-by: Christina <[email protected]> --------- Signed-off-by: Christina <[email protected]>
* Fetch all tags in flyteidl-release.yml Signed-off-by: Eduardo Apolinario <[email protected]> * Fix sed expression for npm job Signed-off-by: Eduardo Apolinario <[email protected]> --------- Signed-off-by: Eduardo Apolinario <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]>
* update Signed-off-by: Desi Hsu <[email protected]> * dco Signed-off-by: Desi Hsu <[email protected]> * dco Signed-off-by: Desi Hsu <[email protected]> * typo Signed-off-by: Desi Hsu <[email protected]> --------- Signed-off-by: Desi Hsu <[email protected]>
Signed-off-by: Katrina Rogan <[email protected]>
#5648) Signed-off-by: Katrina Rogan <[email protected]>
#5649) * Don't error when attempting to trigger schedules for inactive projects Signed-off-by: Katrina Rogan <[email protected]> * regen Signed-off-by: Katrina Rogan <[email protected]> --------- Signed-off-by: Katrina Rogan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the updates!
// If this literal is offloaded, this field will contain metadata including the offload location. | ||
string uri = 6; | ||
// Includes information about the size of the literal. | ||
uint64 size_bytes = 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is size important again? I mean there's other metadata as well (etag information). The assumption here is that size is super important so we want to be able to show that without making a head call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
general consensus seemed to be this is useful, I guess it's nice for clients who want to decide whether to pull massive datasets?
|
||
When writing outputs in the [remote_file_output_writer](https://github.com/flyteorg/flyte/blob/2ca31119d6b9258661a71f38e450f93b6692402c/flyteplugins/go/tasks/pluginmachinery/ioutils/remote_file_output_writer.go#L56-L84) the source code should detect whether the literal size exceeds the configured minimum and | ||
- if the task is using a newer SDK version that supports reading offloaded literals, offload the literal to the configured storage backend and update the literal with the offload URI and size. | ||
- if the task is using an older SDK version that doesn't support offloaded literals, fail the task with an error message indicating that the task output is too large and the user should update their SDK version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add a line to say downstream tasks also have to upgraded? that is, if you have a reference/remote task downstream that consumes the map task output, but it hasn't been updated, then it'll fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks
|
||
For large outputs (like large maps of large dataclasses), Flytekit should also know how to offload the data. This should be done transparently to the user. How will propeller know to fail though if propeller hasn't been updated? | ||
As a follow-up, we can also implement literal offloading in the SDK for conventional python tasks. Flytekit should also know how to offload the data. This should be done transparently to the user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we cover that here? or is this only for map tasks for now? for the general case, we were going to go with the solution of propeller setting an environment variable that turns on offloading on the flytekit side?
for erroring if things get too big, i don't know that there's a solution. We should just add a size limit asap in flytekit right @eapolinario? Some env var based setting with a 10MB default. If a literal is more than 10MBs then error. considering we don't know when we'll get to the general case, by the time we do, most users might've already upgraded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're leaning towards not tackling the implementation bits for this proposal but I think it's okay to cover future work here?
Updated to include the bits about failing fast here for too-large literals, thank you!
thanks for the review @wild-endeavor, updated - mind taking another look? |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## rfc/offloaded-literal #5653 +/- ##
==========================================================
- Coverage 58.99% 36.17% -22.83%
==========================================================
Files 645 1302 +657
Lines 55670 109484 +53814
==========================================================
+ Hits 32844 39606 +6762
- Misses 20230 65740 +45510
- Partials 2596 4138 +1542
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
What changes were proposed in this pull request?
Updates draft RFC here: #5103
How was this patch tested?
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link