[BUG] The green_urls datasets are empty since 2024-04-16 #574

Closed
BR0kEN- opened this issue Apr 19, 2024 · 5 comments
Labels
bug, Severity: high, Severity: low, Severity: medium

Comments

@BR0kEN-
Contributor

BR0kEN- commented Apr 19, 2024

The green_urls datasets are empty starting from 2024-04-16.

Steps to reproduce the behavior:

  1. Go to https://admin.thegreenwebfoundation.org/admin/green-urls
  2. Download green_urls_2024-04-16.db.gz, green_urls_2024-04-17.db.gz, or green_urls_2024-04-18.db.gz.
  3. Check the file size and attempt to unpack the archive.
  4. The result is the empty SQLite database.

Expected behavior
The database should not be empty.

Actual behavior
The database is empty.
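The reproduction steps above can be sketched as a small script. This is only an illustration: the function name `inspect_snapshot` is made up here, and it assumes the snapshot has already been downloaded by hand from the admin page. It reports the compressed file size and the row count of every table found after unpacking.

```python
import gzip
import os
import shutil
import sqlite3
import tempfile


def inspect_snapshot(gz_path):
    """Return (compressed_size_bytes, {table_name: row_count}) for a gzipped SQLite snapshot.

    `inspect_snapshot` is a hypothetical helper, not part of the admin-portal code.
    """
    compressed_size = os.path.getsize(gz_path)

    # Unpack the archive to a temporary file so sqlite3 can open it.
    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
        with gzip.open(gz_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
        db_path = tmp.name

    try:
        conn = sqlite3.connect(db_path)
        try:
            # List the user tables, then count the rows in each one.
            tables = [
                row[0]
                for row in conn.execute(
                    "SELECT name FROM sqlite_master WHERE type = 'table'"
                )
            ]
            counts = {
                name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
                for name in tables
            }
        finally:
            conn.close()
    finally:
        os.remove(db_path)

    return compressed_size, counts


if __name__ == "__main__":
    # Assumes the file was downloaded manually into the working directory.
    size, counts = inspect_snapshot("green_urls_2024-04-16.db.gz")
    print(f"compressed size: {size} bytes")
    print(f"tables and row counts: {counts}")
```

An empty `counts` dictionary means the unpacked database contains no tables at all, which matches the behavior described in this report.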

BR0kEN- added the bug, Severity: high, Severity: low, and Severity: medium labels on Apr 19, 2024
@mrchrisadams
Member

hi @br0ken - thanks for the heads up. I'll look into this and update when I have more.

@br0ken-the-streamer

? I have absolutely no idea what this is about. I do not know anything about what you sent. This is not me. You got the wrong Br0ken. This is the correct person https://github.com/BR0kEN-
I am br0ken-the-streamer

@BR0kEN-
Contributor Author

BR0kEN- commented Apr 22, 2024

Thanks @mrchrisadams. Just FYI, this is still the case for the snapshots that appear after green_urls_2024-04-18.db.gz - they're all empty (45 bytes in size).

@BR0kEN-
Contributor Author

BR0kEN- commented May 27, 2024

@mrchrisadams would you please have a look?

@mrchrisadams
Member

hi @BR0kEN- sorry about this, I've looked into it, and I think I see the issue now.

I've pushed a change to the cronjob that was running each morning, and have now run the job to generate the snapshots again as intended. There should be a snapshot for green_urls_2024-05-27.db.gz accessible now, from me running the export code.

If it helps clarify things, this was the playbook run, to set up the corrected cronjobs on the relevant machine:
https://github.com/thegreenwebfoundation/admin-portal/blob/master/ansible/setup_cronjobs.yml

There should be another one tomorrow, of the expected size, and so on.

However, it's a bit of a faff to backfill the other daily snapshots from mid-April, and there is some other work on the project that needs to take precedence, so I won't be able to backfill these for a while.

We do store all the day's greenchecks in parquet files, which are optimised for aggregate queries - these lend themselves well to recreating all the green domains for a given day, so it's doable, just not in the next week or so.
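As a rough illustration of that kind of backfill (not the project's actual export code), the sketch below collapses one day's check records into one row per green domain and writes them to a SQLite file. Everything named here is an assumption: the record keys `domain`, `green` and `checked_at`, the `green_domains` table, and the function name are hypothetical, and reading the parquet files themselves is left out.

```python
import sqlite3


def build_green_domains_db(checks, db_path):
    """Write one row per green domain found in `checks` to a SQLite file.

    `checks` is any iterable of dicts with the hypothetical keys
    "domain" (str), "green" (bool) and "checked_at" (ISO 8601 string).
    Returns the number of distinct green domains written.
    """
    latest = {}
    for check in checks:
        if not check["green"]:
            continue
        seen = latest.get(check["domain"])
        # Keep only the most recent green check per domain.
        # ISO 8601 strings in the same format sort chronologically.
        if seen is None or check["checked_at"] > seen:
            latest[check["domain"]] = check["checked_at"]

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS green_domains "
                "(domain TEXT PRIMARY KEY, checked_at TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO green_domains VALUES (?, ?)",
                sorted(latest.items()),
            )
    finally:
        conn.close()
    return len(latest)
```

Running this once per missing day, with that day's records as input, would give one database per date that could then be gzipped like the regular snapshots.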

I've created #592 to track it, and I'm closing this issue.
