Move many udf implementations from invoke to invoke_batch #13491

Merged
merged 34 commits into from
Nov 26, 2024

Conversation

joseph-isaacs
Contributor

@joseph-isaacs joseph-isaacs commented Nov 19, 2024

Which issue does this PR close?

As part of #13238 invoke was replaced with invoke_batch.

Rationale for this change

This will allow the removal of invoke. And maybe if we are quick we can rename invoke_with_args back to invoke.

What changes are included in this PR?

This PR moves function definitions over to use invoke_batch, so invoke can be removed.
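To make the migration concrete, here is a minimal, self-contained sketch of the pattern. It is not DataFusion's actual code: `ToyUdf`, `AddOne`, and the simplified `ColumnarValue` enum are hypothetical stand-ins, and the method signatures are assumptions modeled loosely on `invoke` and `invoke_batch`. The idea is that a function overrides `invoke_batch`, which also receives the row count, and leaves the deprecated `invoke` untouched.

```rust
// Simplified stand-in for DataFusion's ColumnarValue (assumption, not the real type).
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnarValue {
    Array(Vec<i64>),
    Scalar(i64),
}

// Hypothetical trait modeled loosely on a scalar UDF implementation.
pub trait ToyUdf {
    /// Old entry point: no row count is passed.
    #[deprecated(note = "implement invoke_batch instead")]
    fn invoke(&self, _args: &[ColumnarValue]) -> Result<ColumnarValue, String> {
        Err("invoke is not implemented".to_string())
    }

    /// New entry point: also receives the number of rows in the batch.
    /// The default falls back to `invoke`, so unmigrated functions keep working.
    #[allow(deprecated)]
    fn invoke_batch(
        &self,
        args: &[ColumnarValue],
        _number_rows: usize,
    ) -> Result<ColumnarValue, String> {
        self.invoke(args)
    }
}

/// A migrated function: it overrides `invoke_batch` and never touches `invoke`.
pub struct AddOne;

impl ToyUdf for AddOne {
    fn invoke_batch(
        &self,
        args: &[ColumnarValue],
        _number_rows: usize,
    ) -> Result<ColumnarValue, String> {
        match args {
            [ColumnarValue::Array(values)] => {
                Ok(ColumnarValue::Array(values.iter().map(|v| v + 1).collect()))
            }
            [ColumnarValue::Scalar(v)] => Ok(ColumnarValue::Scalar(v + 1)),
            _ => Err("add_one expects exactly one argument".to_string()),
        }
    }
}
```

In this sketch, `AddOne.invoke_batch(&[ColumnarValue::Scalar(41)], 1)` returns `Ok(ColumnarValue::Scalar(42))`. Once every implementation overrides `invoke_batch` like this, nothing depends on the fallback and the old method can be dropped.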

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added documentation Improvements or additions to documentation sql SQL Planner logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate functions labels Nov 19, 2024
@github-actions github-actions bot added the proto Related to proto crate label Nov 19, 2024
@joseph-isaacs joseph-isaacs marked this pull request as ready for review November 21, 2024 13:57
@joseph-isaacs joseph-isaacs marked this pull request as draft November 21, 2024 13:57
Do not yet deprecate `invoke_batch`, add docs to invoke_with_args
# Conflicts:
#	datafusion/expr/src/udf.rs
#	datafusion/functions/src/datetime/to_local_time.rs
#	datafusion/functions/src/utils.rs
#	datafusion/physical-expr/src/scalar_function.rs
@@ -1954,32 +1954,6 @@ The following intervals are supported:
- years
- century

#### Example
Contributor Author

./dev/update_function_docs.sh did this?

Contributor

I think it is because you removed the docs (perhaps accidentally) above -- I left a comment

Contributor

@alamb alamb left a comment

Thanks @joseph-isaacs -- this looks like a great step forward

I had some suggestions / additional cleanups

The only thing I think needs to be fixed before merging this is reverting the loss of the documentation for date_bin

@@ -546,7 +546,7 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
/// to arrays, which will likely be simpler code, but be slower.
fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
#[allow(deprecated)]
Contributor

we can probably remove this #allow as well

@@ -141,8 +141,8 @@ fn run_with_string_type<M: Measurement>(
),
|b| {
b.iter(|| {
-#[allow(deprecated)] // TODO use invoke_batch
-black_box(ltrim.invoke(&args))
+#[allow(deprecated)] // TODO use invoke_with_args
Contributor

likewise here I think we can remove the #allow

Contributor Author

Comment below

datafusion/functions/benches/make_date.rs (outdated, resolved)
datafusion/functions/src/datetime/date_bin.rs (outdated, resolved)
@@ -149,7 +149,8 @@ pub mod test {
let return_type = return_type.unwrap();
assert_eq!(return_type, $EXPECTED_DATA_TYPE);

let result = func.invoke_with_args(datafusion_expr::ScalarFunctionArgs{args: $ARGS, number_rows: cardinality, return_type: &return_type});
#[allow(deprecated)]
Contributor

It seems like it would be good to keep calling using invoke_with_args

@@ -170,7 +171,8 @@ pub mod test {
}
else {
// invoke is expected error - cannot use .expect_err() due to Debug not being implemented
match func.invoke_with_args(datafusion_expr::ScalarFunctionArgs{args: $ARGS, number_rows: cardinality, return_type: &return_type.unwrap()}) {
#[allow(deprecated)]
Contributor

likewise here -- invoke_with_args

Contributor Author

A lot of tests use this macro and pass in &[], so it is quite a big change.

Contributor Author

This makes sense in a follow up

Contributor

@alamb alamb left a comment

Thanks @joseph-isaacs -- this is looking close to me, though it looks like a few bugs got introduced. I flagged some of them.

I can try to help tomorrow, but I am out of time today

@@ -330,7 +328,7 @@ where

pub struct ScalarFunctionArgs<'a> {
// The evaluated arguments to the function
-pub args: &'a [ColumnarValue],
+pub args: Vec<ColumnarValue>,
Contributor

Contributor Author

Sure

@@ -544,7 +548,7 @@ mod tests {
ColumnarValue::Array(timestamps),
ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(Some(1), None)),
],
-batch_size,
+1,
Contributor

This seems incorrect -- the batch size was 6 values, not 1

Contributor Author

I had a look for all of these, but it's easy to miss one. All the others that pass a literal value take only scalars.
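The concern in this thread can be illustrated with a toy model, assuming the row count is what a function uses to expand scalar arguments into arrays. The types and the `add_batch` helper below are hypothetical and much simpler than DataFusion's; they only show why passing `1` is harmless when every argument is a scalar but wrong when an array argument has six rows.

```rust
// Simplified stand-in for DataFusion's ColumnarValue (assumption, not the real type).
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnarValue {
    Array(Vec<i64>),
    Scalar(i64),
}

impl ColumnarValue {
    /// Arrays are returned as-is; scalars are repeated `num_rows` times.
    pub fn into_array(self, num_rows: usize) -> Vec<i64> {
        match self {
            ColumnarValue::Array(values) => values,
            ColumnarValue::Scalar(v) => vec![v; num_rows],
        }
    }
}

/// Hypothetical element-wise add that trusts the caller's `number_rows`.
pub fn add_batch(
    left: ColumnarValue,
    right: ColumnarValue,
    number_rows: usize,
) -> Result<Vec<i64>, String> {
    let left = left.into_array(number_rows);
    let right = right.into_array(number_rows);
    // A wrong row count shows up here as mismatched lengths.
    if left.len() != right.len() {
        return Err(format!(
            "length mismatch: {} vs {}",
            left.len(),
            right.len()
        ));
    }
    Ok(left.iter().zip(right.iter()).map(|(l, r)| l + r).collect())
}
```

With a six-element array and a scalar, `number_rows = 6` succeeds while `number_rows = 1` fails in this sketch, because the scalar expands to a single value. With two scalars, `number_rows = 1` is fine.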

@alamb
Contributor

alamb commented Nov 24, 2024

Here is what I suggest we do with this PR:

  1. Remove the change for ScalarUDFImpl invoke improvements #13507 (let's do that in a separate PR)
  2. Fix up the regressions introduced (where batch_size --> 1)

Then merge this one in.

If you have more time, @joseph-isaacs, it would be even better to change this PR so that all the functions use ScalarFunctionArgs, but I realize that is asking quite a bit
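A rough sketch of what moving a function to `ScalarFunctionArgs` could look like, under the assumption that the struct simply bundles the arguments, the row count, and the return type (the three fields visible in the test macro hunk earlier in this review). `ToyUdf`, `Negate`, and the simplified `ColumnarValue` and `DataType` enums are hypothetical stand-ins, not DataFusion's definitions.

```rust
// Simplified stand-ins for the real types (assumptions for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnarValue {
    Array(Vec<i64>),
    Scalar(i64),
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
}

/// Bundles everything a call needs, so new fields can be added later
/// without changing the method signature.
pub struct ScalarFunctionArgs<'a> {
    pub args: &'a [ColumnarValue],
    pub number_rows: usize,
    pub return_type: &'a DataType,
}

// Hypothetical trait: functions implement `invoke_batch`, while callers
// go through `invoke_with_args`.
pub trait ToyUdf {
    fn invoke_batch(
        &self,
        args: &[ColumnarValue],
        number_rows: usize,
    ) -> Result<ColumnarValue, String>;

    /// Default implementation unpacks the struct and delegates.
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue, String> {
        self.invoke_batch(args.args, args.number_rows)
    }
}

pub struct Negate;

impl ToyUdf for Negate {
    fn invoke_batch(
        &self,
        args: &[ColumnarValue],
        _number_rows: usize,
    ) -> Result<ColumnarValue, String> {
        match args {
            [ColumnarValue::Array(values)] => {
                Ok(ColumnarValue::Array(values.iter().map(|v| -v).collect()))
            }
            [ColumnarValue::Scalar(v)] => Ok(ColumnarValue::Scalar(-v)),
            _ => Err("negate expects exactly one argument".to_string()),
        }
    }
}
```

Callers build a `ScalarFunctionArgs` and call `invoke_with_args`; in this sketch the function body itself does not need to change, which is roughly why the two migrations can be done in separate steps.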

@joseph-isaacs
Contributor Author

I would have liked to move over to invoke_with_args; however, it's quite a bit more of an undertaking. Possibly I can try in the future.

@alamb
Contributor

alamb commented Nov 25, 2024

I would have liked to move over to invoke_with_args; however, it's quite a bit more of an undertaking. Possibly I can try in the future.

Makes sense

@joseph-isaacs
Contributor Author

I think there was a transient failure

@alamb
Contributor

alamb commented Nov 25, 2024

I think there was a transient failure

I retriggered the tests

Contributor

@alamb alamb left a comment

Looks good to me -- thank you @joseph-isaacs

@jayzhan211 jayzhan211 merged commit 25cb812 into apache:main Nov 26, 2024
27 checks passed
@jayzhan211
Contributor

Thanks @joseph-isaacs @alamb

@alamb
Contributor

alamb commented Nov 26, 2024

🙏 thank you -- we are making progress

Labels
core Core DataFusion crate documentation Improvements or additions to documentation functions logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions proto Related to proto crate sql SQL Planner

3 participants