
Add Data Chunks to the Appender #139

Merged: 7 commits, Dec 23, 2023


maiadegraaf
Contributor

This PR changes the appender to build DuckDB data chunks instead of appending values row by row through the DuckDB appender.

AppendRow() is called the same way as before. Instead of being inserted one at a time, however, the rows are written into data chunks, so that whole chunks can be appended at once. When a chunk is full, a new one is allocated automatically. When Flush() is called, all data chunks are appended and destroyed in one go.
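The buffering described above can be modelled in plain Go. This is a minimal, hypothetical sketch, not go-duckdb's actual implementation: dataChunk, chunkAppender and chunkCapacity are invented names, a chunk is modelled as per-column slices, and Flush only collects the rows instead of calling into DuckDB. The capacity is set to 3 so the example allocates several chunks; a real chunk is bounded by DuckDB's vector size, which is 2048 rows by default.

```go
package main

import "fmt"

// chunkCapacity is a hypothetical stand-in for DuckDB's vector size
// (2048 rows by default); a tiny value makes several chunks appear.
const chunkCapacity = 3

// dataChunk is a hypothetical model of a DuckDB data chunk:
// a fixed-capacity, column-oriented buffer of rows.
type dataChunk struct {
	columns [][]any // one slice per column
	size    int     // number of rows written so far
}

func newDataChunk(numCols int) *dataChunk {
	cols := make([][]any, numCols)
	for i := range cols {
		cols[i] = make([]any, 0, chunkCapacity)
	}
	return &dataChunk{columns: cols}
}

// chunkAppender sketches the chunk-buffering appender (hypothetical).
type chunkAppender struct {
	numCols  int
	chunks   []*dataChunk
	appended [][]any // rows "sent to the database" by Flush, for illustration only
}

// AppendRow writes one row into the current chunk, allocating a new
// chunk first if there is none yet or the current one is full.
func (a *chunkAppender) AppendRow(args ...any) error {
	if len(args) != a.numCols {
		return fmt.Errorf("expected %d columns, got %d", a.numCols, len(args))
	}
	if len(a.chunks) == 0 || a.chunks[len(a.chunks)-1].size == chunkCapacity {
		a.chunks = append(a.chunks, newDataChunk(a.numCols))
	}
	c := a.chunks[len(a.chunks)-1]
	for i, v := range args {
		c.columns[i] = append(c.columns[i], v)
	}
	c.size++
	return nil
}

// Flush hands over every buffered chunk in one go, then drops them.
// It returns the number of chunks flushed.
func (a *chunkAppender) Flush() int {
	n := len(a.chunks)
	for _, c := range a.chunks {
		// A real appender would pass the whole chunk to DuckDB here;
		// this sketch just copies the rows out.
		for r := 0; r < c.size; r++ {
			row := make([]any, a.numCols)
			for col := range row {
				row[col] = c.columns[col][r]
			}
			a.appended = append(a.appended, row)
		}
	}
	a.chunks = nil // "destroy" the chunks
	return n
}

func main() {
	a := &chunkAppender{numCols: 2}
	for i := 0; i < 7; i++ {
		if err := a.AppendRow(i, fmt.Sprintf("row-%d", i)); err != nil {
			panic(err)
		}
	}
	fmt.Println("chunks before flush:", len(a.chunks))
	n := a.Flush()
	fmt.Println("chunks flushed:", n, "rows appended:", len(a.appended))
}
```

With seven rows and a capacity of three, AppendRow allocates three chunks (3 + 3 + 1 rows), and a single Flush call hands all three over before discarding them.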

Currently, appender_test.go runs in the same time as before (0.12s). However, I plan to tweak and optimize the chunk handling further to make it more efficient.

Finally, a big benefit of moving to data chunks is that it makes it easy to add nested types, which will come in a future PR.

Relevant issue: #135
CC: @taniabogatsch

@marcboeker
Owner

@maiadegraaf Thanks for migrating the appender to data chunks.

marcboeker merged commit 1095769 into marcboeker:master on Dec 23, 2023
2 checks passed
@killzoner

killzoner commented Dec 27, 2023

Hey @maiadegraaf, it turns out this PR probably introduces a breaking change for nested types (which were already partially supported).

Until v1.5.5, I was able to insert TEXT[] data by passing something like [value1, value2] as a string, but now the insertion fails with "Type mismatch in Append DataChunk and the types required for appender".

I guess the implicit conversion no longer happens, whereas it used to work under the less restrictive appendRowArray.

@marcboeker
Owner

@killzoner Could you please post the code you are using to insert TEXT[]. Thanks!

levakin pushed a commit to levakin/go-duckdb that referenced this pull request Jan 9, 2024
* works for single column

* works for chunk

* add row at a time to chunk, only works up to one chunk

* Store chunks in appender struct, and append in flush.

* Chunks Work!

* Add destroy data_chunk

* Some minor refactorings

---------

Co-authored-by: Marc Boeker <[email protected]>