Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(udf): store WASM UDF in meta store #15269

Merged
merged 3 commits into from
Feb 27, 2024
Merged

Conversation

wangrunji0408
Copy link
Contributor

@wangrunji0408 wangrunji0408 commented Feb 26, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

resolve #14568.

This PR moves WASM UDF binaries from object store to meta store. It removes system parameter wasm_storage_url and adds a field compressed_binary to the function catalog. WASM binaries will be compressed by zstd and then stored in the catalog.

Existing two ways to create WASM functions are still supported:

create function ... language wasm using base64 '<encoded-binary>';
create function ... language wasm using link 'fs://path/to/binary.wasm';

But the behavior of the frontend handling it changed.

syntax old behavior new behavior
using base64 upload binary to object store, store the link to catalog store binary to catalog
using link store the link to catalog download binary from the link, store binary to catalog

Given that Rust & WASM UDF are introduced in v1.7, if this PR can catch up with this version release, it can be considered a non-breaking change.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wangrunji0408 wangrunji0408 added this pull request to the merge queue Feb 27, 2024
Cache::builder()
.time_to_idle(Duration::from_secs(60))
.build()
});

if let Some(runtime) = RUNTIMES.get(link).await {
let md5 = md5::compute(binary);
if let Some(runtime) = RUNTIMES.get(&md5) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using get_with here? BTW, is it possible that there's an md5 collision? 🤣

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know get_with before you mentioned it. Looks good to use.
I think the possibility of md5 collision is negligible as this is not a security critical case. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have the udf's identifier? Why bother to use md5? 🤔

Copy link
Contributor Author

@wangrunji0408 wangrunji0408 Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are different. The identifier is used for finding functions within a WASM binary, while multiple functions may share the same binary. Here we cache the WASM runtime for each binary, but we don't want to store full binaries in memory, so their md5 is used as the cache key.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. But my idea is that the case where multiple functions share the same binary is also rare. Anyway, both OK to me.

@@ -44,6 +44,7 @@ pub struct Model {
pub link: Option<String>,
pub identifier: Option<String>,
pub body: Option<String>,
pub compressed_binary: Option<Vec<u8>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an experience value that how large it can be for a WASM binary? I'm concerned about the possibility of running out of memory as all catalogs are cached in memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is about 300KB after compression for our wasm binary in e2e test.

@wangrunji0408 wangrunji0408 removed this pull request from the merge queue due to a manual request Feb 27, 2024
Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others generally LGTM

Copy link
Member

@yezizp2012 yezizp2012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM.

@wangrunji0408 wangrunji0408 added this pull request to the merge queue Feb 27, 2024
Merged via the queue into main with commit e0f9c68 Feb 27, 2024
29 of 30 checks passed
@wangrunji0408 wangrunji0408 deleted the wrj/store-wasm-in-meta branch February 27, 2024 11:09
wangrunji0408 added a commit that referenced this pull request Feb 28, 2024
@TennyZhuang
Copy link
Contributor

What will happen if the user uploads a 100GB wasm file?

@wangrunji0408
Copy link
Contributor Author

What will happen if the user uploads a 100GB wasm file?

OOM 😅. Should set a limit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store WASM UDF in meta store
6 participants