Unnest + Group by + Count #1
Comments
Hi, thanks for your note! I ran your code locally and get even worse performance than you show above:
Just to make sure I understand, you want to see how many news items in category 5/7/9 have each tag? For instance:
If that's right, I'm not sure how much a C function is going to help here. I'll think about it a bit, but at first glance I think your best bet is a different table structure. (Do you really have only 10 categories?) If you come up with anything in the meantime, please let me know!
Hello!
Yes, exactly. Of course I have more categories (about 2000), plus full-text search and other conditions in this query. It resembles faceted search (http://akorotkov.github.io/blog/2016/06/17/faceted-search/); what I posted was only an example :) My first idea was to reduce the data while scanning rows instead of unnest + grouping:
But I can't find a way to implement it as an aggregate function with the PostgreSQL API. Does structure S have to be serialized to a single value after every row and deserialized for the next row?
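The "reduce while scanning" idea can be sketched in plain C, outside the PostgreSQL API. This is only an illustration under stated assumptions: `TagCounts`, `tc_add`, `tc_accumulate_row`, and the other names are hypothetical and are not taken from the extension. The point it shows is that the state is one in-memory structure updated in place for each row and passed by pointer, so nothing is serialized between rows. As far as I know, a PostgreSQL aggregate whose state type is `internal` works the same way, and serialization only comes into play for parallel aggregation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical transition state: open-addressing hash table, tag -> count. */
typedef struct {
    int32_t *keys;
    int64_t *counts;  /* a count of 0 marks an empty slot */
    size_t   cap;     /* always a power of two */
    size_t   used;    /* number of distinct tags stored */
} TagCounts;

void tc_init(TagCounts *t, size_t cap) {
    assert(cap > 0 && (cap & (cap - 1)) == 0);
    t->keys = calloc(cap, sizeof *t->keys);
    t->counts = calloc(cap, sizeof *t->counts);
    assert(t->keys != NULL && t->counts != NULL);
    t->cap = cap;
    t->used = 0;
}

void tc_free(TagCounts *t) {
    free(t->keys);
    free(t->counts);
}

/* Multiplicative hash reduced to a slot index. */
size_t tc_slot(const TagCounts *t, int32_t key) {
    return ((uint32_t)key * 2654435761u) & (t->cap - 1);
}

/* Add n (> 0) occurrences of key, doubling the table when half full. */
void tc_add(TagCounts *t, int32_t key, int64_t n) {
    assert(n > 0);
    if ((t->used + 1) * 2 > t->cap) {
        TagCounts bigger;
        tc_init(&bigger, t->cap * 2);
        for (size_t i = 0; i < t->cap; i++)
            if (t->counts[i] != 0)
                tc_add(&bigger, t->keys[i], t->counts[i]);
        tc_free(t);
        *t = bigger;
    }
    size_t i = tc_slot(t, key);
    while (t->counts[i] != 0 && t->keys[i] != key)
        i = (i + 1) & (t->cap - 1);  /* linear probing */
    if (t->counts[i] == 0) {
        t->keys[i] = key;
        t->used++;
    }
    t->counts[i] += n;
}

/* Transition step: fold one row's tags array into the running state. */
void tc_accumulate_row(TagCounts *t, const int32_t *tags, size_t ntags) {
    for (size_t i = 0; i < ntags; i++)
        tc_add(t, tags[i], 1);
}

/* Look up the current count for a tag (0 if never seen). */
int64_t tc_get(const TagCounts *t, int32_t key) {
    size_t i = tc_slot(t, key);
    while (t->counts[i] != 0) {
        if (t->keys[i] == key)
            return t->counts[i];
        i = (i + 1) & (t->cap - 1);
    }
    return 0;
}
```

In a real aggregate the table would be allocated in the aggregate's memory context rather than with `calloc`, and a final function would turn it into the returned array.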
Hello! Example usage:
explain analyze
select key,value
from (
SELECT
faceted_count(tags)
FROM public.news
WHERE
category_id in (5,7,9)
) as xx(y), unnest(y[1:1], y[2:2] )
as pair(key,value);
The result is more than 50% faster than unnest + group by + count and should use less memory. Could you check it for errors? Do you have any suggestions?
I've made a parallel-safe version of faceted_count. It's even faster.
set max_parallel_workers_per_gather to 2;
set force_parallel_mode to on;
explain analyze
select key,value
from (
SELECT
faceted_count(tags)
FROM public.news
WHERE
category_id in (5,7,9)
) as xx(y), unnest(y[1:1], y[2:2] )
as pair(key,value);
https://github.com/ArturFormella/aggs_for_arrays/blob/master/faceted_count.c
The next step is to tune the hashmap and the serialization.
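For readers unfamiliar with parallel aggregation: each worker builds a partial state over its share of the rows, and a combine step merges those partial states before the final result is produced (in PostgreSQL these roles are declared with the `COMBINEFUNC`, `SERIALFUNC`, and `DESERIALFUNC` options of `CREATE AGGREGATE`). The sketch below shows one possible combine step in plain C. It assumes each partial state has been flattened into an array of (tag, count) pairs sorted by tag. That layout and the names `TagCount`, `PartialCounts`, and `pc_combine` are hypothetical and do not describe how faceted_count.c is actually written.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One (tag, count) pair. */
typedef struct {
    int32_t key;
    int64_t count;
} TagCount;

/* Hypothetical flattened partial state: pairs sorted by key, no duplicates. */
typedef struct {
    TagCount *items;
    size_t    len;
} PartialCounts;

/* Combine step: merge two sorted partial states, summing counts of equal tags.
 * The caller owns the returned items array and must free() it. */
PartialCounts pc_combine(const PartialCounts *a, const PartialCounts *b) {
    PartialCounts out;
    /* +1 keeps the allocation non-empty when both inputs are empty */
    out.items = malloc((a->len + b->len + 1) * sizeof *out.items);
    assert(out.items != NULL);
    out.len = 0;

    size_t i = 0, j = 0;
    while (i < a->len || j < b->len) {
        if (j == b->len || (i < a->len && a->items[i].key < b->items[j].key)) {
            out.items[out.len++] = a->items[i++];      /* tag only in a so far */
        } else if (i == a->len || b->items[j].key < a->items[i].key) {
            out.items[out.len++] = b->items[j++];      /* tag only in b so far */
        } else {
            out.items[out.len].key = a->items[i].key;  /* same tag in both */
            out.items[out.len].count = a->items[i].count + b->items[j].count;
            out.len++;
            i++;
            j++;
        }
    }
    return out;
}
```

A sorted flat array doubles as a serialization format, since it can be copied between processes as a single block; the trade-off is that each worker has to sort its hash table once before handing it over.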
Thank you for your work on this function! I've only skimmed the code but I think it's a worthwhile addition. I'm trying to decide whether it belongs here or in aggs_for_vecs. It doesn't quite fit in either place, so I think I'll probably keep it here, even though the other functions aren't really aggregates. If I come up with a better idea I'll let you know. :-) Anyway, just wanted to let you know that I'm following your progress with interest! If you'd like any help with a test file or documentation, I'm happy to add those.
Hello!
I love this extension!
Maybe you can help me with another issue involving arrays. This is the heaviest part of my queries right now: unnest + group by + count. Let's name it:
count_tags( integer[] ) RETURNS KeyValue[]
There is also one big problem with any optimisation: PostgreSQL prohibits functions which return a set of records.
Example:
Counting the same data in Node.js took 1/100 of this time. Intuition tells me there must be another way.
What do you think about that? Is it possible in C?
Thank you