Querying Hive partitioning parquets is slow #173

Open
2 tasks done
xqe2011 opened this issue Nov 7, 2024 · 1 comment
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed priority-medium Medium priority issue user-request This issue was directly requested by a user

Comments

xqe2011 commented Nov 7, 2024

What happens?

Recently, we tried this extension instead of a standalone DuckDB instance. A simple SELECT query on Parquet files runs 2-20 times slower through the extension than in standalone DuckDB.

Profiling method
Run SELECT duckdb_execute($$SET enable_profiling='query_tree'$$); and watch the logs.

To Reproduce

Query one field: SELECT name FROM public.table1 where code1 = 3261 and code2 = '001' and code3 = '5204' and code4 = '1'
code1 and code2 are partition fields.

  • DuckDB on the CLI: 0.0190s
  • DuckDB inside this extension: 0.0291s
  • Total time using this extension: 0.513s

Query multiple fields: SELECT name, level, detxlen, detylen, downid FROM public.table1 where code1 = 3261 and code2 = '001' and code3 = '5204' and code4 = '1'
code1 and code2 are partition fields.

  • DuckDB on the CLI: 0.0413s
  • DuckDB inside this extension: 0.037s
  • Total time using this extension: 0.552s
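
To make the gap in the timings above easier to see, here is a small sketch that splits each total into DuckDB execution time and everything else. It assumes (this is an interpretation, not something the report states) that "Total time" is the wall-clock time of the whole statement and that the in-extension DuckDB figure is the profiler's execution time, so their difference is time spent outside DuckDB execution. The names `runs` and `breakdown` are made up for illustration.

```python
# Timings copied from the report above, in seconds.
runs = {
    "one field": {"cli": 0.0190, "in_extension": 0.0291, "total": 0.513},
    "multi fields": {"cli": 0.0413, "in_extension": 0.037, "total": 0.552},
}


def breakdown(cli, in_extension, total):
    """Split the total time into DuckDB execution and everything else.

    Assumption: total - in_extension approximates the time spent outside
    DuckDB execution (planning, setup, result conversion, and so on).
    """
    outside = total - in_extension
    return {
        "outside_duckdb_s": round(outside, 4),       # seconds not in DuckDB execution
        "outside_share": round(outside / total, 3),  # fraction of the total
        "total_vs_cli": round(total / cli, 1),       # slowdown relative to the CLI
    }


results = {name: breakdown(**timing) for name, timing in runs.items()}

for name, result in results.items():
    print(name, result)
```

Under that assumption, roughly 0.48s and 0.52s of the two totals fall outside DuckDB execution, which suggests the slowdown may come from the path around the query rather than from the Parquet scan itself.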

OS:

Ubuntu Server 22.04.3

ParadeDB Version:

paradedb/paradedb:16-v0.11.1

Are you using ParadeDB Docker, Helm, or the extension(s) standalone?

ParadeDB Helm Chart

Full Name:

Liu Qijie

Affiliation:

Dongguan University of Technology

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include the code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?

  • Yes, I have
@xqe2011 xqe2011 added the bug Something isn't working label Nov 7, 2024
@philippemnoel philippemnoel added good first issue Good for newcomers help wanted Extra attention is needed priority-medium Medium priority issue user-request This issue was directly requested by a user labels Nov 7, 2024
philippemnoel (Collaborator) commented

Thanks for opening! We would love your help debugging this, or help from anyone else willing to assist :)
