You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Right now we this ENUM type to efficiently store information about how we arrived at a greencheck in our variable greencheck tables, and the DDL for it looks like so:
enum('url','whois','ip','none','as')
Ideally, we'd have a way to represent checks made using the new carbon.txt approach, and also represent that by adding a new value in our ENUM table, so we would have something like this:
enum('url','whois','ip','none','as')
We can make a migration to add this, but before we do, it's important to understand the implications of doing so, and pitfalls
Order matters
First of all, the order, really, really matters. If we didn't follow the exact same ordering, and did this
# BAD! DO NOT DO THIS!
enum('url','whois','as','none','ip', 'carbontxt')
Then it would have the unfortunate effect of making all checks using an AS show as an IP check, and all IP checks show as an AS. This is bad.
Sadly, the SQL generated by default from our django migrations looks like this:
ALTERTABLE`greencheck_2021` MODIFY `type` enum('as','ip','none','url','whois','carbontxt') NOT NULL;
It follows the source order of the code, so there's potential for data loss if we blindly run a migration
Updates for ENUMs are blocking (?)
From what I can see, making an update to a table is a blocking operation. In a test on a smaller table, we just updating one with a few million rows took about a minute, so I'm dreading how long it would take to update a table with a few hundred million rows now by comparison.
I've run a test locally, and it doesn't look like this triggers a rebuild for a relatively large table in the dev environment.
carbon.txt is based around using DNS, and I'm still a bit wary of bundling this change with the other work, and after a some more checks it looks like we stopped using the 'whois' value for checks aaaaaages ago - there are checks in the big tables of type 'whois' since 2020.
So, I'm now wondering if a way to decouple this work here from the big table migration is to use what's already there, and make any checks based around carbon.txt use the existing whois ENUM label, until we can run this migration separately.
In that migration we can update all the checks marked as whois after 2020 to use the new carbontxt value, which would get us back on steady footing again.
The text was updated successfully, but these errors were encountered:
mrchrisadams
changed the title
Figure out a sensible way to mark entires as entered based on carbon.txt
Figure out a sensible way to mark entries as entered based on carbon.txt
Sep 27, 2021
Right now we this ENUM type to efficiently store information about how we arrived at a greencheck in our variable greencheck tables, and the DDL for it looks like so:
Ideally, we'd have a way to represent checks made using the new carbon.txt approach, and also represent that by adding a new value in our ENUM table, so we would have something like this:
We can make a migration to add this, but before we do, it's important to understand the implications of doing so, and pitfalls
Order matters
First of all, the order, really, really matters. If we didn't follow the exact same ordering, and did this
Then it would have the unfortunate effect of making all checks using an AS show as an IP check, and all IP checks show as an AS. This is bad.
Sadly, the SQL generated by default from our django migrations looks like this:
It follows the source order of the code, so there's potential for data loss if we blindly run a migration
Updates for ENUMs are blocking (?)
From what I can see, making an update to a table is a blocking operation. In a test on a smaller table, we just updating one with a few million rows took about a minute, so I'm dreading how long it would take to update a table with a few hundred million rows now by comparison.
https://www.tutorialspoint.com/add-a-new-value-to-a-column-of-data-type-enum-in-mysql
https://dba.stackexchange.com/questions/11484/is-it-possible-to-change-enum-lists
https://mariadb.com/kb/en/enum/
https://mariadb.com/docs/reference/mdb/data-types/ENUM/
This ticket here suggests that adding it might not trigger a rebuild, as of Maria DB 10.0.32:
https://jira.mariadb.org/browse/MDEV-13175
Update
I've run a test locally, and it doesn't look like this triggers a rebuild for a relatively large table in the dev environment.
carbon.txt is based around using DNS, and I'm still a bit wary of bundling this change with the other work, and after a some more checks it looks like we stopped using the 'whois' value for checks aaaaaages ago - there are checks in the big tables of type 'whois' since 2020.
So, I'm now wondering if a way to decouple this work here from the big table migration is to use what's already there, and make any checks based around carbon.txt use the existing
whois
ENUM label, until we can run this migration separately.In that migration we can update all the checks marked as
whois
after 2020 to use the newcarbontxt
value, which would get us back on steady footing again.The text was updated successfully, but these errors were encountered: