Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out a sensible way to mark entries as entered based on carbon.txt #198

Open
mrchrisadams opened this issue Sep 17, 2021 · 0 comments

Comments

@mrchrisadams
Copy link
Member

mrchrisadams commented Sep 17, 2021

Right now we this ENUM type to efficiently store information about how we arrived at a greencheck in our variable greencheck tables, and the DDL for it looks like so:

enum('url','whois','ip','none','as')

Ideally, we'd have a way to represent checks made using the new carbon.txt approach, and also represent that by adding a new value in our ENUM table, so we would have something like this:

enum('url','whois','ip','none','as')

We can make a migration to add this, but before we do, it's important to understand the implications of doing so, and pitfalls

Order matters

First of all, the order, really, really matters. If we didn't follow the exact same ordering, and did this

# BAD! DO NOT DO THIS!
enum('url','whois','as','none','ip', 'carbontxt')

Then it would have the unfortunate effect of making all checks using an AS show as an IP check, and all IP checks show as an AS. This is bad.

Sadly, the SQL generated by default from our django migrations looks like this:

ALTER TABLE `greencheck_2021` MODIFY `type` enum('as','ip','none','url','whois','carbontxt') NOT NULL;

It follows the source order of the code, so there's potential for data loss if we blindly run a migration

Updates for ENUMs are blocking (?)

From what I can see, making an update to a table is a blocking operation. In a test on a smaller table, we just updating one with a few million rows took about a minute, so I'm dreading how long it would take to update a table with a few hundred million rows now by comparison.

https://www.tutorialspoint.com/add-a-new-value-to-a-column-of-data-type-enum-in-mysql
https://dba.stackexchange.com/questions/11484/is-it-possible-to-change-enum-lists
https://mariadb.com/kb/en/enum/
https://mariadb.com/docs/reference/mdb/data-types/ENUM/

This ticket here suggests that adding it might not trigger a rebuild, as of Maria DB 10.0.32:
https://jira.mariadb.org/browse/MDEV-13175

Update

I've run a test locally, and it doesn't look like this triggers a rebuild for a relatively large table in the dev environment.

carbon.txt is based around using DNS, and I'm still a bit wary of bundling this change with the other work, and after a some more checks it looks like we stopped using the 'whois' value for checks aaaaaages ago - there are checks in the big tables of type 'whois' since 2020.

So, I'm now wondering if a way to decouple this work here from the big table migration is to use what's already there, and make any checks based around carbon.txt use the existing whois ENUM label, until we can run this migration separately.

In that migration we can update all the checks marked as whois after 2020 to use the new carbontxt value, which would get us back on steady footing again.

@mrchrisadams mrchrisadams changed the title Figure out a sensible way to mark entires as entered based on carbon.txt Figure out a sensible way to mark entries as entered based on carbon.txt Sep 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant