You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While running basically the same model in parallel to smooth the transition between models, I noticed that building the models from a fct_ga4__event_page_view model that I created was 40 times more efficient than from stg_ga4__event_page_view.
We build from stg_ga4__event_page_view in a number of places which are all opportunities for a performance improvement. This doesn't take into account caching, but I haven't noticed much of a cache effect from model to model (cache clearly helps when re-using fields within a single model).
However, I only have anecdotal evidence of the little cache re-use between models, so I'd like to see evidence in either respect.
I see us having two options here:
We make fct_ga4__event_page_view a core package model
We conditionally build from fct_ga4__event_page_view if it is present; otherwise falling back to stg_ga4__event_page_view
The first option seems simpler. It has the advantage that enabling and disabling package models is a pattern that we've used elsewhere, so it's not a big leap to expect users to disable the fct_ga4__event_page_view model. However, I find myself customizing this model a lot for each client so we'd be creating a model that almost always gets disabled.
The second option requires greater complexity in the package models. I suspect many people will create fct_ga4__event_page_view models that don't error without consulting any documentation and the errors messages should make it fairly clear what is missing from any fct_ga4__event_page_view model created by users.
Is this something that we should pursue?
Is caching more effective than I give it credit?
Do you have any preferences for either of these methods?
The text was updated successfully, but these errors were encountered:
I think that our perspective on a lot of these things will change when materialized view support finally lands in dbt-bigquery (currently in the experimental plugin, but now being looked at for the core support plugin).
Right now we are kind of hamstrung by the half-implemented idea of supporting streaming data, right?
@dgitis sorry, I'm not following. Can you elaborate on your design of fct_ga4__event_page_view? Is it just a table materialization of stg_ga4__event_page_view?
And what was the SQL you were running where you saw an improvement? Without seeing that, it's hard to tell whether the improvement was due to the SQL itself (use of select *, etc) vs. the source of data.
While running basically the same model in parallel to smooth the transition between models, I noticed that building the models from a
fct_ga4__event_page_view
model that I created was 40 times more efficient than fromstg_ga4__event_page_view
.We build from
stg_ga4__event_page_view
in a number of places which are all opportunities for a performance improvement. This doesn't take into account caching, but I haven't noticed much of a cache effect from model to model (cache clearly helps when re-using fields within a single model).However, I only have anecdotal evidence of the little cache re-use between models, so I'd like to see evidence in either respect.
I see us having two options here:
fct_ga4__event_page_view
a core package modelfct_ga4__event_page_view
if it is present; otherwise falling back tostg_ga4__event_page_view
The first option seems simpler. It has the advantage that enabling and disabling package models is a pattern that we've used elsewhere, so it's not a big leap to expect users to disable the
fct_ga4__event_page_view
model. However, I find myself customizing this model a lot for each client so we'd be creating a model that almost always gets disabled.The second option requires greater complexity in the package models. I suspect many people will create
fct_ga4__event_page_view
models that don't error without consulting any documentation and the errors messages should make it fairly clear what is missing from anyfct_ga4__event_page_view
model created by users.Is this something that we should pursue?
Is caching more effective than I give it credit?
Do you have any preferences for either of these methods?
The text was updated successfully, but these errors were encountered: