
Database size #14

Open
jemrobinson opened this issue Feb 7, 2022 · 3 comments


@jemrobinson
Collaborator

jemrobinson commented Feb 7, 2022

Records

In the forecast tables we expect a single record to take:

  • forecast_id (serial4) => 4 bytes
  • data_forecast_generated (date) => 4 bytes
  • data_forecast_for (date) => 4 bytes
  • cell_id (int4) => 4 bytes
  • sea_ice_concentration_mean (float4) => 4 bytes
  • sea_ice_concentration_stddev (float4) => 4 bytes
Total: 24 bytes of column data per record.
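
A quick sketch of the arithmetic above, assuming the fixed widths PostgreSQL documents for these types (serial4 is stored as a 4-byte integer). The names PG_TYPE_BYTES, FORECAST_COLUMNS and data_bytes_per_record are illustrative only. This counts column data alone: PostgreSQL also stores a per-row header (23 bytes on most platforms, padded for alignment) and a 4-byte line pointer for every row, plus any index entries, which is likely part of why the measured size below is larger.

```python
# Fixed storage widths (bytes) of the PostgreSQL types used by the forecast table.
PG_TYPE_BYTES = {"serial4": 4, "int4": 4, "date": 4, "float4": 4}

# Column name -> type, copied from the list above (hypothetical helper structure).
FORECAST_COLUMNS = {
    "forecast_id": "serial4",
    "data_forecast_generated": "date",
    "data_forecast_for": "date",
    "cell_id": "int4",
    "sea_ice_concentration_mean": "float4",
    "sea_ice_concentration_stddev": "float4",
}


def data_bytes_per_record(columns: dict) -> int:
    """Sum the column widths; ignores row headers, alignment padding and indexes."""
    return sum(PG_TYPE_BYTES[pg_type] for pg_type in columns.values())


print(data_bytes_per_record(FORECAST_COLUMNS))  # 24
```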

Disk size

However, from recent measurements:

Number of records    Size on disk (bytes)
        9,100,000           1,032,159,232
        9,200,000           1,040,728,064

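so 100,000 additional records take 8,568,832 bytes, i.e. each record takes ~85.68 bytes. Subtracting 9,100,000 × 85.68 bytes from the first measurement leaves a fixed overhead of roughly 240 MB:

  • per record: 85.68 bytes
  • overhead: 240 MB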
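
The per-record and overhead figures amount to fitting a straight line through the two measurements. A minimal sketch, assuming table size grows linearly with record count (two points alone cannot confirm that); fit_size_model is a hypothetical helper name:

```python
# Two (record count, table size in bytes) measurements quoted above.
MEASUREMENTS = [(9_100_000, 1_032_159_232), (9_200_000, 1_040_728_064)]


def fit_size_model(points):
    """Fit size = overhead + per_record * n through exactly two points."""
    (n1, s1), (n2, s2) = points
    per_record = (s2 - s1) / (n2 - n1)  # slope: bytes per additional record
    overhead = s1 - per_record * n1     # intercept: size at zero records
    return per_record, overhead


per_record, overhead = fit_size_model(MEASUREMENTS)
print(f"per record: {per_record:.3f} bytes")      # per record: 85.688 bytes
print(f"overhead: {overhead / 1024**2:.1f} MiB")  # overhead: 240.7 MiB
```

The overhead is reported in units of 1024² bytes, which is how the 240 MB figure above works out.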

Summary

There are around 23M records each day for the northern and southern hemispheres combined.
Ignoring the fixed overhead, this means 23,000,000 × 85.68 / 1024 / 1024 / 1024 ≈ 1.84 GB per day:

  • each day requires ~1.84 GB of space
  • each month requires ~56 GB of space
  • each year requires ~670 GB of space
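
The projection can be reproduced in a few lines. This sketch assumes an average month of 365/12 days and, like the estimate above, leaves out the fixed ~240 MB overhead; projected_gib is a hypothetical helper name. Note the division by 1024³ to get from bytes to gigabytes.

```python
BYTES_PER_RECORD = 85.68       # per-record size from the linear fit
RECORDS_PER_DAY = 23_000_000   # both hemispheres combined
GIB = 1024 ** 3


def projected_gib(days: float,
                  records_per_day: int = RECORDS_PER_DAY,
                  bytes_per_record: float = BYTES_PER_RECORD) -> float:
    """Storage for `days` of forecasts in GiB, excluding the fixed overhead."""
    return days * records_per_day * bytes_per_record / GIB


for label, days in [("day", 1), ("month", 365 / 12), ("year", 365)]:
    print(f"per {label}: {projected_gib(days):.2f} GiB")
```
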
@jemrobinson
Collaborator Author

jemrobinson commented Feb 10, 2022

The following (from Slack) might explain why the current number of records per day is so high.

James Robinson 2022-02-09 21:36

I’ve noticed that the latest predictions show non-zero sea ice (sic_mean > 0) in every cell of the 432 x 432 grid for every date and leadtime. This feels incorrect, as I’m pretty sure that some of that space is land. Can you confirm, @James Byrne?

James Byrne 2022-02-09 22:21

I've not yet been applying the land mask to the outputs, which I do need to do. The predictions in the south are vaguely sensible, the north might be very ropey! Good spot though, I'll sort that out tomorrow! 😉

@jemrobinson
Collaborator Author

Recent files show 8286021 records per day for the northern hemisphere and 3094203 for the southern. It looks like there's still an issue with the masking for the northern hemisphere though (see below) so these numbers may come down further.

[Screenshot, 2022-02-14 00:03:45: northern-hemisphere masking issue]

@jemrobinson
Collaborator Author

Since 2022-02-16, the sizes are:

hemisphere     n_records    est. size (MB)
north          9,070,011             741.1
south         14,261,829            1165.5
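
The est. size column looks consistent with multiplying n_records by the ~85.68 bytes-per-record figure and dividing by 1024². This sketch assumes that is how it was derived (its outputs match the table to within a few tenths of a MB) and treats the counts as per-day figures, which is also an assumption. It also shows that the two hemispheres together come to about 23.3M records, in line with the ~23M used in the first comment. estimated_mib is a hypothetical helper name.

```python
BYTES_PER_RECORD = 85.68  # per-record size from the first comment
MIB = 1024 ** 2

# Record counts per hemisphere since 2022-02-16 (assumed to be per day).
RECORDS = {"north": 9_070_011, "south": 14_261_829}


def estimated_mib(n_records: int, bytes_per_record: float = BYTES_PER_RECORD) -> float:
    """Estimated table size in units of 1024**2 bytes, excluding overhead."""
    return n_records * bytes_per_record / MIB


for hemisphere, n in RECORDS.items():
    print(f"{hemisphere}: {estimated_mib(n):.1f} MiB")

total = sum(RECORDS.values())
print(f"total: {total:,} records, {estimated_mib(total) / 1024:.2f} GiB")
```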
