[Feature] Scalar UDF support #222
I like the idea; however, I think it might be difficult to handle composite types. Handling these would require a parser to go from …
On a separate note, I like how few methods you need to implement. I think the table functions are very complex to implement. This is somewhat inherent to the type of function, but I still dislike it.
Yes, your PR was a great help in writing this! I noticed the same: there is significantly more logic around the table UDFs.
The parser is a fair argument against this. We currently have
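To illustrate how few methods a scalar UDF needs compared to a table UDF, here is a minimal, self-contained sketch. The ScalarUDF interface, its Config and ExecuteRow methods, and the addInts example are hypothetical names chosen for illustration; they do not reproduce the API in this PR or in go-duckdb.

```go
package main

import (
	"errors"
	"fmt"
)

// ScalarUDF is a hypothetical interface showing how small a scalar UDF
// contract can be: one method describing the signature and one method
// computing a single output value from one row of inputs.
type ScalarUDF interface {
	// Config returns the SQL type names of the inputs and of the result.
	Config() (inputTypes []string, resultType string)
	// ExecuteRow computes the result for one row of arguments.
	ExecuteRow(args []any) (any, error)
}

// addInts is a hypothetical example UDF that adds two INTEGER arguments.
type addInts struct{}

func (addInts) Config() ([]string, string) {
	return []string{"INTEGER", "INTEGER"}, "INTEGER"
}

func (addInts) ExecuteRow(args []any) (any, error) {
	a, okA := args[0].(int32)
	b, okB := args[1].(int32)
	if !okA || !okB {
		return nil, errors.New("addInts: expected two int32 arguments")
	}
	return a + b, nil
}

func main() {
	var f ScalarUDF = addInts{}
	in, out := f.Config()
	res, err := f.ExecuteRow([]any{int32(2), int32(3)})
	fmt.Println(in, out, res, err) // [INTEGER INTEGER] INTEGER 5 <nil>
}
```

There is no state, no bind step, and no thread configuration to implement, which is where most of the table UDF complexity comes from.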
This is independent of which strategy we decide on for the types, no? We can return an error when registering the UDF if we detect invalid types.
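A sketch of registration-time validation, assuming types are passed as SQL type names: the hypothetical registerUDF checks every name before anything is registered, so a typo fails immediately instead of surfacing later as a generic SQL error. The parser only understands a small, assumed set of primitive names plus "[]" list suffixes. Composite types such as STRUCT are rejected, which is exactly the gap raised above about needing a real parser.

```go
package main

import (
	"fmt"
	"strings"
)

// knownTypes is a hypothetical, deliberately incomplete set of primitive
// SQL type names accepted by this sketch.
var knownTypes = map[string]bool{
	"BOOLEAN": true, "INTEGER": true, "BIGINT": true,
	"DOUBLE": true, "VARCHAR": true,
}

// validateTypeName accepts a primitive type name, optionally followed by one
// or more "[]" suffixes denoting (nested) LIST types, e.g. "INTEGER[][]".
func validateTypeName(name string) error {
	base := strings.ToUpper(strings.TrimSpace(name))
	for strings.HasSuffix(base, "[]") {
		base = strings.TrimSuffix(base, "[]")
	}
	if !knownTypes[base] {
		return fmt.Errorf("unsupported type %q", name)
	}
	return nil
}

// registerUDF is a hypothetical registration hook: it rejects invalid type
// names up front instead of letting them surface later as SQL errors.
func registerUDF(name string, inputs []string, result string) error {
	for _, t := range append(append([]string{}, inputs...), result) {
		if err := validateTypeName(t); err != nil {
			return fmt.Errorf("registering %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	fmt.Println(registerUDF("my_len", []string{"VARCHAR[]"}, "BIGINT"))
	fmt.Println(registerUDF("bad", []string{"STRUCT(a INT)"}, "BIGINT"))
}
```

As noted in the reply below, this only helps where the types are known at registration time; table UDFs that resolve their return types at bind time would still report errors later.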
True, but for table UDFs, for example, the returned types are only determined at bind time, so an invalid type would turn into a generic SQL error. But this is definitely debatable; it's just my opinion. A separate PR could be appropriate; I don't have much time due to exams next week. Feel free to copy my implementation if you want. Also keep in mind that we can add new "constructors" whenever we want: if some DuckDB API comes out to parse a string into a type, we can use it. (We can even make the type an interface.)
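Making the type an interface, with new constructors added over time, could look roughly like this. TypeInfo, NewPrimitive, NewList, and NewStruct are hypothetical names for this sketch, not an existing go-duckdb or DuckDB API. Each constructor builds a value that can render itself as a SQL type string, so a later constructor backed by a DuckDB type-parsing function could be added without changing callers.

```go
package main

import "fmt"

// TypeInfo is a hypothetical interface: every constructor returns a value
// implementing it, so new constructors can be added without breaking callers.
type TypeInfo interface {
	SQL() string
}

type primitiveType struct{ name string }

func (p primitiveType) SQL() string { return p.name }

type listType struct{ child TypeInfo }

func (l listType) SQL() string { return l.child.SQL() + "[]" }

type structField struct {
	name string
	typ  TypeInfo
}

type structType struct{ fields []structField }

func (s structType) SQL() string {
	out := "STRUCT("
	for i, f := range s.fields {
		if i > 0 {
			out += ", "
		}
		out += f.name + " " + f.typ.SQL()
	}
	return out + ")"
}

// NewPrimitive, NewList, and NewStruct are hypothetical constructors.
func NewPrimitive(name string) TypeInfo { return primitiveType{name} }

func NewList(child TypeInfo) TypeInfo { return listType{child} }

// NewStruct validates its input, so composite types can be rejected early.
func NewStruct(names []string, types []TypeInfo) (TypeInfo, error) {
	if len(names) != len(types) {
		return nil, fmt.Errorf("got %d names but %d types", len(names), len(types))
	}
	fields := make([]structField, len(names))
	for i := range names {
		fields[i] = structField{names[i], types[i]}
	}
	return structType{fields}, nil
}

func main() {
	s, err := NewStruct([]string{"id", "tags"},
		[]TypeInfo{NewPrimitive("INTEGER"), NewList(NewPrimitive("VARCHAR"))})
	if err != nil {
		panic(err)
	}
	fmt.Println(s.SQL()) // STRUCT(id INTEGER, tags VARCHAR[])
}
```

Building composite types through constructors sidesteps the need for a string parser entirely, at the cost of a more verbose call site.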
@JAicewizard, somehow this PR broke along the way. I also tried pinning (with …
Never mind, it is most likely a bug introduced somewhere in the C API, as reverting to an older DuckDB build seems to fix it. I.e., the same code works with …
Haha, yeah, debugging these kinds of issues is not fun. Nice that it is fixed. I will look for similar changes in DuckDB for the table UDFs when rebasing my branch.
This looks very nice. It looks a bit messy with all the type_info changes in here as well, but I think this is good to merge once the comments are addressed.
Optionally, you could add parallel and chunk APIs as well, but I don't see any unfixable implications beyond what I already mentioned.
Thanks for your review!
I am unsure what you mean by the parallel API; I need to check your table UDF PR again.
I also merged the type interface changes in …
I cannot find the scalar function API documentation online at all, so I can't check at the moment. However, table functions can specify the maximum number of threads they can execute on, and executing on more than one thread requires being aware of this and handling local versus global state (which is why I implemented them as different types). I don't know if something similar exists for scalar functions.
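To make the global versus local state distinction concrete, here is a simplified sketch using plain goroutines rather than DuckDB's scheduler: a shared globalState hands out disjoint row ranges under a mutex, while each worker accumulates into its own localState without locking. The maxThreads parameter stands in for the thread limit a table function can report; all names here are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// globalState is shared by every worker of a hypothetical parallel table
// function; it hands out disjoint row ranges under a mutex.
type globalState struct {
	mu   sync.Mutex
	next int
	end  int
}

// claim returns the next [start, stop) batch, or ok=false once all rows are taken.
func (g *globalState) claim(batch int) (start, stop int, ok bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.next >= g.end {
		return 0, 0, false
	}
	start = g.next
	stop = start + batch
	if stop > g.end {
		stop = g.end
	}
	g.next = stop
	return start, stop, true
}

// localState is owned by exactly one worker, so it needs no locking.
type localState struct {
	rowsEmitted int
}

// run starts maxThreads workers that share one globalState and each keep
// their own localState, then sums the per-worker counts.
func run(maxThreads, rows, batch int) int {
	if batch < 1 {
		batch = 1 // avoid claiming empty ranges forever
	}
	g := &globalState{end: rows}
	locals := make([]*localState, maxThreads)
	var wg sync.WaitGroup
	for i := range locals {
		locals[i] = &localState{}
		wg.Add(1)
		go func(l *localState) {
			defer wg.Done()
			for {
				start, stop, ok := g.claim(batch)
				if !ok {
					return
				}
				l.rowsEmitted += stop - start
			}
		}(locals[i])
	}
	wg.Wait()
	total := 0
	for _, l := range locals {
		total += l.rowsEmitted
	}
	return total
}

func main() {
	fmt.Println(run(4, 1000, 64)) // 1000
}
```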
Ah yeah, I see the problem. I currently use …
Yes, I achieved speed-ups on this with #254.
Ah yes, I've been working with …
The table function documentation is also broken at the moment; I will open an issue about that too. But I just realised that scalar functions probably don't need any state at all, so there is probably no max-threads setting for scalar functions.
DuckDB internally parallelizes scalar function execution, and each chunk is executed independently (on different threads, if available). So, indeed, we do not need state here. I went over your feedback and pushed some changes.
Could you give it another review (after these remaining steps)?
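Because a scalar function's output depends only on its inputs, chunks can be processed in parallel without any shared state. The sketch below imitates that execution model with goroutines; executeChunks and upper are hypothetical helpers, and in practice the chunking and scheduling happen inside DuckDB, not in Go.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// upper is a stateless, row-wise scalar function: its output depends only
// on its input, so concurrent calls need no synchronization.
func upper(s string) string { return strings.ToUpper(s) }

// executeChunks mimics, in simplified form, an engine that splits the input
// into chunks and runs a scalar function on each chunk in its own goroutine.
// Each goroutine writes only to its own range of the output slice.
func executeChunks(input []string, chunkSize int, f func(string) string) []string {
	if chunkSize <= 0 {
		chunkSize = len(input) // treat the whole input as one chunk
	}
	out := make([]string, len(input))
	var wg sync.WaitGroup
	for start := 0; start < len(input); start += chunkSize {
		stop := start + chunkSize
		if stop > len(input) {
			stop = len(input)
		}
		wg.Add(1)
		go func(start, stop int) {
			defer wg.Done()
			for i := start; i < stop; i++ {
				out[i] = f(input[i])
			}
		}(start, stop)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(executeChunks([]string{"a", "b", "c"}, 2, upper)) // [A B C]
}
```

A function that kept mutable state between calls would need locking or per-thread copies here; a pure function needs neither, which is why no state type is required for scalar UDFs.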
Alright, I've implemented the review feedback (thanks!), and from my side, this should be ready to go in.
This is a draft PR. I built it against DuckDB's feature branch, as scalar UDFs are not yet part of the released C API. The test failures are related to that.
Because of that, and because this PR depends on #219, there are a lot of file changes in this PR, as well as separate libraries. They can all be ignored. The relevant files are listed below.
scalar_udf.go
scalar_udf_test.go
@JAicewizard, I solved passing types differently than you did in #201. What do you think? Is passing SQL type names a sensible idea? I also tried to use only the DataChunk API draft's functions. In later PRs, we can extend this with ExecuteColumn. Here is the example included in this PR.
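The example itself is not preserved here. As a stand-in, the following self-contained sketch shows a row-wise execution model: a function is invoked once per row of a chunk. The chunk type, executeRows, and multiply are hypothetical simplifications, not the PR's DataChunk API; a column-wise variant along the lines of ExecuteColumn would instead receive whole columns and avoid the per-row call.

```go
package main

import "fmt"

// chunk is a hypothetical, simplified stand-in for a DataChunk: a set of
// equally long columns, each stored as []any.
type chunk struct {
	columns [][]any
}

func (c chunk) size() int {
	if len(c.columns) == 0 {
		return 0
	}
	return len(c.columns[0])
}

// executeRows calls rowFn once per row, gathering one value from each
// column into an argument slice. args is reused across rows, so rowFn
// must not retain it.
func executeRows(in chunk, rowFn func(args []any) (any, error)) ([]any, error) {
	out := make([]any, in.size())
	args := make([]any, len(in.columns))
	for r := range out {
		for c, col := range in.columns {
			args[c] = col[r]
		}
		v, err := rowFn(args)
		if err != nil {
			return nil, fmt.Errorf("row %d: %w", r, err)
		}
		out[r] = v
	}
	return out, nil
}

// multiply is a hypothetical scalar UDF over two BIGINT (int64) columns.
func multiply(args []any) (any, error) {
	a, okA := args[0].(int64)
	b, okB := args[1].(int64)
	if !okA || !okB {
		return nil, fmt.Errorf("expected int64 arguments, got %T and %T", args[0], args[1])
	}
	return a * b, nil
}

func main() {
	in := chunk{columns: [][]any{
		{int64(1), int64(2), int64(3)},
		{int64(10), int64(20), int64(30)},
	}}
	rows, err := executeRows(in, multiply)
	fmt.Println(rows, err) // [10 40 90] <nil>
}
```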