Description
We're experiencing unexpectedly high memory usage when reading a parquet file using go-duckdb. The memory usage is orders of magnitude larger than the file being read. The issue arises during the final step, where we read a parquet file that was compacted by DuckDB from multiple smaller files. We raised a parallel issue on the main repository because we were able to reproduce this with other clients.
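For context, the compaction step boils down to a DuckDB COPY statement; the sketch below shows one way to run it through go-duckdb. The file names and glob pattern are placeholders, not the actual files from the repro repository.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	// In-memory DuckDB database via the go-duckdb driver.
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Merge the small parquet files into a single compacted file.
	// Placeholder paths; the real files live in the repro repository.
	_, err = db.Exec(`
		COPY (SELECT * FROM read_parquet('part-*.parquet'))
		TO 'compacted.parquet' (FORMAT PARQUET)
	`)
	if err != nil {
		log.Fatal(err)
	}
}
```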
Steps to Reproduce
Please refer to the provided repository, which includes a main.go file and the parquet files necessary to reproduce this issue. Clone the repository and follow the README instructions to set up and trigger the problem. We see the high memory utilization on the final step, where we read the parquet file.
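This is not the repro's exact main.go, but a minimal approximation of the read path where memory blows up; the file name and query are placeholders.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Read the compacted parquet file (placeholder name) through the driver.
	rows, err := db.Query(`SELECT * FROM read_parquet('compacted.parquet')`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		log.Fatal(err)
	}

	// Scan generically; memory climbs while iterating here, far beyond
	// the size of the parquet file on disk.
	vals := make([]any, len(cols))
	ptrs := make([]any, len(cols))
	for i := range vals {
		ptrs[i] = &vals[i]
	}
	for rows.Next() {
		if err := rows.Scan(ptrs...); err != nil {
			log.Fatal(err)
		}
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```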
Expected Behavior
Memory usage should be proportional to the size of the parquet file being read, similar to executing the SQL commands directly without involving the DuckDB Golang driver.
Actual Behavior
Memory consumption spikes significantly on both our production Kubernetes cluster and local machine setups, going well beyond the actual size of the parquet file. The high memory usage is specific to the go-duckdb driver; executing the same SQL directly does not reproduce the issue. You can use the pure.sql script and the instructions in the README to run a version of this without the Go driver.
Production Kubernetes Memory Monitoring
Memory Usage of Script on OSX
Environment
Go version: 1.21.7
DuckDB version: 1.0.0 and 0.10.0
go-duckdb version: 1.7.0
Operating System: Debian Buster and OSX Sonoma 14.5
Additional Information
The issue persists regardless of the number of threads configured (1 or 2).
We have set several DuckDB configuration options and pragmas as part of our initialization process (memory limit, thread count, and so on), roughly as sketched below.
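For reference, a minimal sketch of that initialization using the standard database/sql flow; the path and the specific limits below are placeholders, not our production values.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

// openDuckDB opens a DuckDB database and applies our memory/thread settings.
// The values here are placeholders, not the ones used in production.
func openDuckDB(path string) (*sql.DB, error) {
	db, err := sql.Open("duckdb", path)
	if err != nil {
		return nil, err
	}
	for _, stmt := range []string{
		"PRAGMA memory_limit='2GB'", // placeholder limit
		"PRAGMA threads=1",          // we also tried threads=2
	} {
		if _, err := db.Exec(stmt); err != nil {
			db.Close()
			return nil, err
		}
	}
	return db, nil
}

func main() {
	db, err := openDuckDB("") // empty DSN = in-memory database
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```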
Impact
This issue is causing significant resource allocation challenges in our production environment, leading to potential service disruptions.