docs: ADR for results storage with discrete metrics #2304
base: main
Conversation
* **+** Low complexity of logic
* **-** This breaks backwards compatibility with the current table schemas (mitigated by the view schema remaining the same)
* **-** Redundancy in the table for repeated client info columns
I believe even this will basically be billed as if the entire table is recreated: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax
Yea you're right, at least for the case where we need to delete rows. This should only apply to rerun scenarios though, at least. In normal execution, we wouldn't need to delete any records, so it should only process the new data:
**q** = The sum of bytes processed by the DML statement itself, including any columns referenced in tables scanned by the DML statement.

**t** = The sum of bytes for all columns in the table being updated by the DML statement, as they are at the time the query starts. All columns are included, regardless of whether those columns are referenced in or modified by the DML statement.

Bytes processed:
* If there are only INSERT clauses: `q`.
* If there is an UPDATE or DELETE clause: `q + t`.
I can clarify this in the pros/cons.
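The pricing rule quoted above can be sketched numerically. This is a minimal illustration of the `q`/`t` formula only; the byte figures (a 50 GB table, a 10 MB daily load) are invented placeholders, not measurements from this project:

```python
def bytes_billed(q: int, t: int, has_update_or_delete: bool) -> int:
    """Bytes processed per the quoted BigQuery DML pricing rule:
    q for INSERT-only statements, q + t when an UPDATE or DELETE
    clause is present (the whole target table counts)."""
    return q + t if has_update_or_delete else q

TABLE_BYTES = 50_000_000_000  # hypothetical 50 GB results table
NEW_DATA_BYTES = 10_000_000   # hypothetical 10 MB of new results

# Normal execution: append-only INSERT, only the new data is processed.
insert_only = bytes_billed(NEW_DATA_BYTES, TABLE_BYTES, has_update_or_delete=False)

# Rerun scenario: MERGE that deletes/updates rows, so the table size dominates.
rerun_merge = bytes_billed(NEW_DATA_BYTES, TABLE_BYTES, has_update_or_delete=True)

print(insert_only)  # 10000000
print(rerun_merge)  # 50010000000
```

Under these made-up numbers a rerun that touches existing rows costs ~5000x the append-only path, which is why the added cost only bites in rerun scenarios.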
### 3. Table per Metric

* This option is almost identical to the [Row per Metric](#2.-Row-per-Metric) option; however, instead of adding rows to an existing table, we will create a new table for each metric
* The `metric_slug` column from Option 2 is not necessary, so we can retain the current column-named-with-metric-slug, but each table will only ever
I think there's something missing here
* **-** Added cost (BigQuery treats each MERGE as basically a full delete/recreate of the table)

### 2. Row per Metric
I don't "love" this option, but of the ones we've come up with so far I think this might be the best one
Yea, I feel the same way. I guess the question is whether we're happy enough with this to proceed, or if we dislike them all enough to go back to the drawing board. I'd vote to move forward with this approach, obviously, but I'm open to discussion.
This lays out the options for storing the results of the new discrete metric execution.
I am leaning towards Option 2 because I think it will be the most straightforward, with the main complexity lying in creating the views to match the current structure. Open to discussion though!