Querying Hive partitioning parquets is slow #173
Labels: bug, good first issue, help wanted, priority-medium, user-request
What happens?
Recently, we tried this extension instead of using a standalone DuckDB instance. When we run a simple SELECT query on Parquet files, it is 2-20 times slower than DuckDB.
Profiling method
Run the following, then watch the logs:
SELECT duckdb_execute($$SET enable_profiling='query_tree'$$);
To Reproduce
Query one field:
SELECT name FROM public.table1 where code1 = 3261 and code2 = '001' and code3 = '5204' and code4 = '1'
code1 and code2 are partition fields.
DuckDB on the CLI: 0.0190s
DuckDB inside this extension: 0.0291s
Total time using this extension: 0.513s
Query multiple fields:
SELECT name, level, detxlen, detylen, downid FROM public.table1 where code1 = 3261 and code2 = '001' and code3 = '5204' and code4 = '1'
code1 and code2 are partition fields.
DuckDB on the CLI: 0.0413s
DuckDB inside this extension: 0.037s
Total time using this extension: 0.552s
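As a rough breakdown of the numbers above, the sketch below subtracts the DuckDB execution time reported inside the extension from the total time. It assumes that the total is the end-to-end time seen by the client and that it includes the in-extension DuckDB time; the dictionary keys and the `breakdown` helper are illustrative names, not part of the extension.

```python
# Timings in seconds, copied from the two measurements reported above.
measurements = {
    "one field": {"cli": 0.0190, "in_extension": 0.0291, "total": 0.513},
    "multi fields": {"cli": 0.0413, "in_extension": 0.037, "total": 0.552},
}


def breakdown(m):
    """Split the total time into DuckDB execution and everything else.

    Assumption: "total" already includes the "in_extension" DuckDB time.
    """
    overhead = m["total"] - m["in_extension"]
    return {
        "overhead_s": overhead,                   # time spent outside DuckDB
        "overhead_share": overhead / m["total"],  # fraction of the total
        "slowdown_vs_cli": m["total"] / m["cli"], # total vs. standalone CLI
    }


results = {name: breakdown(m) for name, m in measurements.items()}

for name, r in results.items():
    print(
        f"{name}: overhead {r['overhead_s']:.3f}s "
        f"({r['overhead_share']:.0%} of total), "
        f"{r['slowdown_vs_cli']:.1f}x slower than the CLI"
    )
```

Under that assumption, DuckDB execution itself is close to the CLI in both cases, and more than 90% of the total time (about 0.48s and 0.52s) is spent outside DuckDB execution.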
OS:
Ubuntu Server 22.04.3
ParadeDB Version:
paradedb/paradedb:16-v0.11.1
Are you using ParadeDB Docker, Helm, or the extension(s) standalone?
ParadeDB Helm Chart
Full Name:
Liu Qijie
Affiliation:
Dongguan University of Technology
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include the code required to reproduce the issue?
Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?