Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: ADR for results storage with discrete metrics #2304

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mikewilli
Copy link
Contributor

This lays out the options for how to deal with the new discrete metric execution in terms of storing the results.

I am leaning towards Option 2 because I think it will be the most straightforward, with the main complexity lying in creating the views to match the current structure. Open to discussion though!

* **+** Low complexity of logic
* **-** This breaks backwards compatibility with the current tables schemas (mitigated by view schema remaining the same)
* **-** Redundancy in table for repeated client info columns

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe even this will basically be billed as if the entire table is recreated: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea you're right, at least for the case where we need to delete rows. This should only apply to rerun scenarios though, at least. In normal execution, we wouldn't need to delete any records, so it should only process the new data:

q = The sum of bytes processed by the DML statement itself, including any columns referenced in tables scanned by the DML statement.
t = The sum of bytes for all columns in the table being updated by the DML statement, as they are at the time the query starts. All columns are included, regardless of whether those columns are referenced in or modified by the DML statement.
Bytes processed
If there are only INSERT clauses: q.
If there is an UPDATE or DELETE clause: q + t.

I can clarify this in the pros/cons.

### 3. Table per Metric

* This option is almost identical to the [Row per Metric](#2.-Row-per-Metric) option, however instead of adding rows to an existing table, we will create a new table for each metric
* The `metric_slug` column from Option 2 is not necessary, so we can retain the current column-named-with-metric-slug, but each table will only ever
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there something missing here

* **-** Added cost (BigQuery treats each MERGE as basically a full delete/recreate of the table)


### 2. Row per Metric
Copy link
Collaborator

@scholtzan scholtzan Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't "love" this option, but of the ones we came up so far I think this might be the best one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I feel the same way. I guess the question is whether we're happy enough with this to proceed, or if we dislike them all enough to go back to the drawing board. I'd vote move forwards with this approach obviously, but open to discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants