API draft to create and scan data chunks #219
Conversation
One more thought: we could keep the current, slower row-at-a-time chunk creation and only consider performance when adding the columnar chunk creation. For the columnar chunk creation, we'll only have to infer the type once and can provide dedicated functions for specific (common) types. Even the appender could be extended with such a function.
Thanks. This is definitely the right direction. Adding some notes from my perspective: I have a rough extension of this, and found myself implementing a few customised functions. Features I'd like to see supported later:
I think this looks good, although I would prefer a different initialisation part, as it is currently appender-centric. Also, there is no public API yet, right? I think that is the most important part. But this is going in the right direction for sure; I have no real comments at the moment, looks good! There are still some open problems/questions, I think mostly the public API. I would have something like the following as the API:
and other generic functions that sadly can't be methods.
Do you have any issues/additions regarding this API? I think it covers most use cases. I am a little sad that methods can't have type parameters at the moment, but it is what it is. Another question is whether or not we want to support projected columns at the data chunk level. I think there is a benefit to doing this. For table functions this would be very useful to implement transparently.
Yes, I am conflicted between the special-casing for each type vs. the performance gain. I am not fully sure how Go performs if type casting happens on a columnar level. The current per-row casts could definitely be sped up by moving to more columnar approaches.
Great suggestion!
Yes, I was curious which functions to expose. And you gave the answer haha, I've added some of them, and will add the remaining ones in different commits.
No, looks great. I am very sad about that, hahah. But yes, we work with what we have. :)
Good point. We can leave out the generic functions for vector operations until someone looks into this further. I think the bottleneck would still be getting the data pointer from C, but this can be optimized away, since it doesn't (really) change. Once this optimization is done, we can always revisit and potentially add generic functions.
(generic and/or specialized, applies to both)
# Conflicts:
#	rows.go
#	types.go
…s into the DataChunk API - will probably have to disentangle this into multiple smaller PRs
# Conflicts:
#	appender_vector.go
#	rows.go
What is the progress on this? The merges from main make it difficult to see what changed. I feel like the UDF additions have stalled a bit and are starting to drag (might be partially my fault).
Hi @JAicewizard, no worries. I can update you on my/our current plan/roadmap:
So, I expect it will still take a few weeks until the (table) UDFs are merged, but ideally, we'll then have a unified way of handling data chunks in place, and ways to improve performance.
After thinking about some of the discussions in #201, I've decided to draft a PR for creating and scanning data chunks in the go-duckdb driver. Maybe most importantly, we should decide on how the `dataChunk` and `vector` look, and what functions we want for them. Ideally, all data chunk and vector operations should go through functions in the new files `data_chunk.go` and `vector.go`. Maybe I went in a completely wrong direction here, what do you think @ajzo90 @JAicewizard?
Questions
The main performance bottlenecks of the `Appender` are the function calls via the callbacks and the type casting. @JAicewizard made changes to the vector functions in `appender_vector.go` to avoid type casting overhead. How could this be ported to this PR to improve efficiency without breaking the `Appender`? See comment here.
Another open question is the use of `any` in the `Appender`.

FIXMEs and future work
Add `AppendColumn` or similar to append values in a columnar approach. See @ajzo90's usage here. Here is also a comment w.r.t. adding this.
Other comments
This sounds very similar to some of the type inference in the `Appender`. I only found this comment. Maybe you could point me to the scanner type?
More future TODOs