-- exclude all root ID with more than one page view ID
We interpreted that to mean: "Events associated with multiple different page view IDs are considered noise and should be excluded." Why that logic is there at all is a fair question, though; see snowplow/snowplow-web-data-model#43.
To be honest, that's a place where we've deferred to the Snowplow folks. They've just released a new web model (SQL transformations for Redshift). I took a look and couldn't find this exact logic replicated there, though there were several other steps in which rows with duplicated event IDs and page view IDs are removed entirely.
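As a minimal sketch of the interpretation quoted above (in Python, not the model's actual SQL), assume the page context can be reduced to hypothetical `(root_id, page_view_id)` pairs. A root ID is kept only when it maps to exactly one distinct page view ID; any root ID seen with two or more different page view IDs is dropped entirely.

```python
from collections import defaultdict


def unambiguous_page_views(rows):
    """Map each root_id to its page_view_id, excluding ambiguous root IDs.

    `rows` is a hypothetical iterable of (root_id, page_view_id) pairs
    standing in for the page context table.
    """
    page_views = defaultdict(set)
    for root_id, page_view_id in rows:
        page_views[root_id].add(page_view_id)
    # Keep a root ID only if every row for it points at the same page view.
    return {
        root_id: next(iter(ids))
        for root_id, ids in page_views.items()
        if len(ids) == 1
    }


rows = [("e1", "pv1"), ("e2", "pv2"), ("e2", "pv3"), ("e3", "pv4"), ("e3", "pv4")]
print(unambiguous_page_views(rows))
```

Note that in this sketch a root ID repeated with the *same* page view ID (`e3`) survives, while `e2` is dropped because it is associated with two different page views.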
If I understand correctly, the deduplication implemented here
https://github.com/fishtown-analytics/snowplow/blob/3795d06f365213ca4930d2447bd1580cb7031557/models/page_views/default/snowplow_web_page_context.sql#L43
drops every event that has a duplicated event_id (named root_id there) instead of keeping only the first occurrence. That seems strange to me; is there a reason?
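To make the two strategies in the question concrete, here is a small Python sketch contrasting "drop every copy of a duplicated event_id" with "keep only the first copy". The `Event` record and its `collector_tstamp` field (epoch seconds) are hypothetical stand-ins for the real columns; in SQL, "keep the first" is usually written as `row_number() over (partition by event_id order by <timestamp>) = 1`.

```python
from collections import Counter
from typing import List, NamedTuple


class Event(NamedTuple):
    event_id: str
    collector_tstamp: int  # hypothetical: epoch seconds
    page_view_id: str


def drop_all_duplicates(events: List[Event]) -> List[Event]:
    """Discard every row whose event_id appears more than once."""
    counts = Counter(e.event_id for e in events)
    return [e for e in events if counts[e.event_id] == 1]


def keep_first(events: List[Event]) -> List[Event]:
    """Keep only the earliest row per event_id (ties keep input order)."""
    first = {}
    for e in sorted(events, key=lambda e: e.collector_tstamp):
        first.setdefault(e.event_id, e)
    return sorted(first.values(), key=lambda e: e.collector_tstamp)


events = [
    Event("a", 1, "pv1"),
    Event("b", 2, "pv2"),
    Event("b", 3, "pv3"),  # same event_id, different page view
    Event("c", 4, "pv4"),
]
print([e.event_id for e in drop_all_duplicates(events)])
print(keep_first(events))
```

The trade-off: keeping the first copy assumes the earliest row is the correct one, while dropping all copies avoids guessing when the duplicates disagree (here, `b` points at two different page views) at the cost of losing the event.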
Also, for BigQuery it would be fairly easy to implement an incremental model for this, since all the timestamps are available, right? Should I try to submit a PR?
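A minimal sketch of the timestamp-based incremental idea, in Python rather than dbt SQL: on each run, only source rows newer than the latest timestamp already in the target are appended. The `Event` record and `collector_tstamp` field are hypothetical; in dbt, the same filter would typically sit inside an `{% if is_incremental() %}` block that compares against `max(...)` selected from `{{ this }}`.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    event_id: str
    collector_tstamp: int  # hypothetical: epoch seconds


def new_events(source: List[Event], target: List[Event]) -> List[Event]:
    """Select source rows newer than the latest timestamp already loaded."""
    if not target:
        # First run (or full refresh): load everything.
        return list(source)
    high_water_mark = max(e.collector_tstamp for e in target)
    return [e for e in source if e.collector_tstamp > high_water_mark]


def run_incremental(source: List[Event], target: List[Event]) -> List[Event]:
    """Append only the new rows to target and return it."""
    target.extend(new_events(source, target))
    return target


target: List[Event] = []
source = [Event("a", 1), Event("b", 2)]
run_incremental(source, target)   # first run loads a and b
source.append(Event("c", 3))
run_incremental(source, target)   # second run appends only c
print(target)
```

Two caveats apply to this sketch: a strict `>` skips late-arriving events stamped at or before the high-water mark (real models often add a lookback window), and per-batch deduplication will not catch a duplicate whose first copy was loaded in an earlier run.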