Description
We're experiencing unexpectedly high memory usage when reading a parquet file using go-duckdb. The memory usage is orders of magnitude larger than the file being read. The issue arises during the final step, where we read a parquet file that was compacted by DuckDB from multiple smaller files. We raised a parallel issue on the main repository because we were able to reproduce this with other clients.
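For context, the compaction step boils down to a DuckDB COPY statement; the sketch below shows one way to run it through go-duckdb. The file names and glob pattern are placeholders, not the actual files from the repro repository.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	// In-memory DuckDB database via the go-duckdb driver.
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Merge the small parquet files into a single compacted file.
	// Placeholder paths; the real files live in the repro repository.
	_, err = db.Exec(`
		COPY (SELECT * FROM read_parquet('part-*.parquet'))
		TO 'compacted.parquet' (FORMAT PARQUET)
	`)
	if err != nil {
		log.Fatal(err)
	}
}
```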
Steps to Reproduce
Please refer to the provided repository, which includes a main.go file and the parquet files necessary to reproduce this issue. Clone the repository and follow the README instructions to set up and trigger the problem. We see the high memory utilization on the final step, where we read the parquet file.
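This is not the repro's exact main.go, but a minimal approximation of the read path where memory blows up; the file name and query are placeholders.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Read the compacted parquet file (placeholder name) through the driver.
	rows, err := db.Query(`SELECT * FROM read_parquet('compacted.parquet')`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		log.Fatal(err)
	}

	// Scan generically; memory climbs while iterating here, far beyond
	// the size of the parquet file on disk.
	vals := make([]any, len(cols))
	ptrs := make([]any, len(cols))
	for i := range vals {
		ptrs[i] = &vals[i]
	}
	for rows.Next() {
		if err := rows.Scan(ptrs...); err != nil {
			log.Fatal(err)
		}
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```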
Expected Behavior
Memory usage should be proportional to the size of the parquet file being read, similar to executing the SQL commands directly without involving the DuckDB Golang driver.
Actual Behavior
Memory consumption spikes significantly on both our production Kubernetes cluster and local machine setups, going well beyond the actual size of the parquet file. The high memory usage is specific to the go-duckdb driver; executing the same SQL directly does not reproduce the issue. You can use the pure.sql script and the instructions in the README to run a version of this without the Go driver.
Production Kubernetes Memory Monitoring
Memory Usage of Script on OSX
Environment
Go version: 1.21.7
DuckDB version: 1.0.0 and 0.10.0
go-duckdb version: 1.7.0
Operating System: Debian Buster and OSX Sonoma 14.5
Additional Information
The issue persists regardless of the number of threads configured (1 or 2).
We have set several DuckDB configuration options and pragmas as part of our initialization process (memory limit, thread count, and so on), roughly as sketched below.
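For reference, a minimal sketch of that initialization using the standard database/sql flow; the path and the specific limits below are placeholders, not our production values.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

// openDuckDB opens a DuckDB database and applies our memory/thread settings.
// The values here are placeholders, not the ones used in production.
func openDuckDB(path string) (*sql.DB, error) {
	db, err := sql.Open("duckdb", path)
	if err != nil {
		return nil, err
	}
	for _, stmt := range []string{
		"PRAGMA memory_limit='2GB'", // placeholder limit
		"PRAGMA threads=1",          // we also tried threads=2
	} {
		if _, err := db.Exec(stmt); err != nil {
			db.Close()
			return nil, err
		}
	}
	return db, nil
}

func main() {
	db, err := openDuckDB("") // empty DSN = in-memory database
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```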
Impact
This issue is causing significant resource allocation challenges in our production environment, leading to potential service disruptions.