
Create desc index #98

Open
peterdudfield opened this issue Jun 16, 2023 · 15 comments

Comments

@peterdudfield
Contributor

peterdudfield commented Jun 16, 2023

It might be useful to add an index which is descending in time, because we (often) load the most recent data.

I've tried adding

create index ix_generation_start_utc_desc
on generation (start_utc desc);

It took 24 seconds to make
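A quick way to check whether the planner actually picks the new index is EXPLAIN; a sketch (the column list and LIMIT are just placeholders):

-- does the plan show an index scan on ix_generation_start_utc_desc?
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM generation
ORDER BY start_utc DESC
LIMIT 100;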

@peterdudfield
Contributor Author

The above didn't really help.

I also tried
create index ix_forecasts_timestamp_utc_desc on forecasts (timestamp_utc desc);

but it didn't really help either.

@peterdudfield
Contributor Author

Trying: create index ix_forecast_values_start_utc_horizon_minutes on forecast_values (start_utc, horizon_minutes);

It took 1.26 minutes to make

@peterdudfield
Contributor Author

Want to try clustering on horizon_minutes:

create index ix_forecast_values_horizon_minutes
on forecast_values (horizon_minutes);

ALTER TABLE forecast_values
CLUSTER ON ix_forecast_values_horizon_minutes;

This took 1.11 seconds to make
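(Note: ALTER TABLE ... CLUSTER ON only records which index to cluster on, which is probably why it returned in about a second; the table isn't physically reordered until CLUSTER is actually run:)

-- the actual rewrite; takes an exclusive lock and can be slow on a big table
CLUSTER forecast_values;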

@simlmx
Collaborator

simlmx commented Jun 16, 2023

Can you share the query you are trying to optimize?

@peterdudfield
Contributor Author

The past forecasts one

@peterdudfield
Contributor Author

I thought we could partition the table into horizon_minutes=0 and horizon_minutes>0

@peterdudfield
Contributor Author

peterdudfield commented Jun 16, 2023

SELECT DISTINCT ON (forecasts.site_uuid, forecasts.timestamp_utc) *
FROM forecasts JOIN forecast_values ON forecasts.forecast_uuid = forecast_values.forecast_uuid
WHERE forecasts.site_uuid IN ('04d67e7f-7bcf-4b55-8ced-14d311e0a5f5', 
							  '0bd3dc11-8160-46a0-9ff7-36e125750aeb') 
AND forecasts.timestamp_utc >= '2023-06-15'
AND forecasts.timestamp_utc < '2023-06-16 15:45'
AND forecast_values.horizon_minutes = 0
AND forecast_values.start_utc >= '2023-06-15'
AND forecast_values.start_utc < '2023-06-16 15:45'
ORDER BY forecasts.site_uuid, forecasts.timestamp_utc
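Since this query filters horizon_minutes with an equality and start_utc with a range, a composite index with the equality column first would usually suit it better than (start_utc, horizon_minutes); a sketch (index name assumed):

-- equality column first, range column second
create index ix_forecast_values_horizon_minutes_start_utc
on forecast_values (horizon_minutes, start_utc);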

@simlmx
Collaborator

simlmx commented Jun 16, 2023

You mean the 2 queries called when this is called?

def get_forecasts_by_sites(

If so, I guess the query on past forecasts is the slow one? It's worth noting that the data returned by this query never changes (because it's in the past). As time goes by, the only thing that changes is the latest forecast (which is faster to query; that's the second of our two queries).

My point is that instead of calling both queries systematically, we could call the first query once (slow, but it can be done async, and that's typically not what the user is looking for) and then update it with the latest forecast (which is fast) as it becomes available. We could even keep the data in the browser cache so that if they refresh it's already there. And if they close their window and come back 30 minutes later, we could just query the missing bit, which would be faster.

I'm not sure it's an easy problem to fix otherwise.
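A rough sketch of the incremental fetch described above (:site_uuids and :last_seen are placeholder parameters, not existing code):

-- only pull forecasts made since the client's last poll
SELECT *
FROM forecasts
JOIN forecast_values ON forecasts.forecast_uuid = forecast_values.forecast_uuid
WHERE forecasts.site_uuid = ANY(:site_uuids)
  AND forecasts.timestamp_utc > :last_seen
ORDER BY forecasts.site_uuid, forecasts.timestamp_utc;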

@peterdudfield
Contributor Author

I agree, they might not want that immediately.

I'm going to try some partitioning, and see if it helps

@simlmx
Collaborator

simlmx commented Jun 16, 2023

Cool! It could be a good idea to make a separate toy database filled with a lot of (random) data and do tests there, to remove possible external factors, and probably iterate faster.
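For the toy database, generate_series can fill a table quickly. A minimal sketch (the power_kw column, row count, and horizon range are made up; gen_random_uuid() assumes Postgres 13+ or the pgcrypto extension):

CREATE TABLE forecast_values_toy (
    forecast_value_uuid uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    start_utc timestamp NOT NULL,
    horizon_minutes int NOT NULL,
    power_kw float
);

-- 10 million rows, one per minute, with random horizons and values
INSERT INTO forecast_values_toy (start_utc, horizon_minutes, power_kw)
SELECT timestamp '2023-01-01' + i * interval '1 minute',
       (random() * 2880)::int,
       random() * 100
FROM generate_series(1, 10000000) AS i;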

@peterdudfield
Contributor Author

peterdudfield commented Jun 16, 2023

Partitioning the table could look like this:

-- drop table forecast_values_new2;

CREATE TABLE forecast_values_temp
(like forecast_values including all);

alter table forecast_values_temp
    drop constraint forecast_values_temp_pkey,
    add primary key (forecast_value_uuid, horizon_minutes);
	
create table forecast_values_new 
(like forecast_values_temp including all)
partition by range (horizon_minutes);

CREATE TABLE forecast_values_0 PARTITION OF forecast_values_new
    FOR VALUES FROM (0) TO (1);
	
CREATE TABLE forecast_values_g0 PARTITION OF forecast_values_new
    FOR VALUES FROM (1) TO (10000000);

-- This can take some time > 10
INSERT INTO forecast_values_new
SELECT * FROM forecast_values;

ALTER TABLE forecast_values RENAME TO forecast_values_old;
ALTER TABLE forecast_values_new RENAME TO forecast_values;

And then make the same change in the datamodel.

I want to try

alter table forecast_values
    drop constraint forecast_values_pkey,
    add primary key (forecast_value_uuid, horizon_minutes);

first
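Once the partitioned table is in place, it's worth confirming that partition pruning actually kicks in; the plan should only touch forecast_values_0 for a horizon_minutes = 0 query:

EXPLAIN
SELECT *
FROM forecast_values
WHERE horizon_minutes = 0
  AND start_utc >= '2023-06-15';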

@simlmx
Collaborator

simlmx commented Jun 16, 2023

It sounds dangerous to try those in the Dev/Prod databases. You could end up in a state that Alembic doesn't recognize by mistake, no?

@peterdudfield
Contributor Author

alter table forecast_values
    drop constraint forecast_values_pkey,
    add primary key (forecast_value_uuid, horizon_minutes);

This didn't work; I'll roll it back.

I'm doing it in 'dev', and the above means I can always roll back, never deleting anything.

@peterdudfield
Contributor Author

I made a partition table and did the renaming; I'll have to leave it over the weekend to see if it speeds things up or not.

@peterdudfield
Contributor Author

I rolled the partition back
