This repository has been archived.
As a result all of its historical issues and PRs have been closed.
Please do not clone this repo without understanding the risk in doing so:
- It may have unaddressed security vulnerabilities
- It may have unaddressed bugs
⛔🏚️ This package is no longer developed or maintained by dbt Labs. A fork is maintained at https://github.com/fleetio/dbt-segment
This dbt package:
- Performs "user stitching" to tie all events associated with a cookie to the same user_id
- Transforms pageviews into sessions ("sessionization")
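As a rough illustration of what "user stitching" means, the sketch below maps each cookie to a `user_id` seen on any of its events and applies that mapping to every event carrying the same cookie. It is a conceptual Python sketch, not the package's SQL. The field names `anonymous_id`, `tstamp`, and `stitched_user_id` are hypothetical, and so are two policy choices: keeping the most recent `user_id` per cookie, and falling back to the cookie itself when no `user_id` is known.

```python
from typing import Optional


def stitch_user_ids(events: list[dict]) -> list[dict]:
    """Tie every event for a cookie to one user_id (conceptual sketch only)."""
    # Pass 1: remember the latest non-null user_id seen for each cookie.
    # Keeping the most recent one is an assumption made for this example.
    cookie_to_user: dict[str, str] = {}
    for event in sorted(events, key=lambda e: e["tstamp"]):
        user_id: Optional[str] = event.get("user_id")
        if user_id is not None:
            cookie_to_user[event["anonymous_id"]] = user_id

    # Pass 2: label every event, falling back to the cookie itself when
    # no user_id was ever observed for it (also an assumption).
    stitched = []
    for event in events:
        cookie = event["anonymous_id"]
        stitched.append({**event, "stitched_user_id": cookie_to_user.get(cookie, cookie)})
    return stitched


# Hypothetical sample data: cookie "a" is identified on its second event.
sample_events = [
    {"anonymous_id": "a", "user_id": None, "tstamp": 1},
    {"anonymous_id": "a", "user_id": "u1", "tstamp": 2},
    {"anonymous_id": "b", "user_id": None, "tstamp": 3},
]
stitched_events = stitch_user_ids(sample_events)
```

In this sample, both events for cookie `a` end up attributed to `u1`, including the one recorded before the user was identified, while cookie `b` stays anonymous.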
New to dbt packages? Read more about them here.
- Include this package in your `packages.yml` (check here for the latest version number).
- Run `dbt deps`.
- Include the following in your `dbt_project.yml` directly within your `vars:` block (making sure to handle indenting appropriately). Update the value to point to your Segment page views table.
# dbt_project.yml
config-version: 2
...
vars:
  segment:
    segment_page_views_table: "{{ source('segment', 'pages') }}"
This package assumes that your data is in a structure similar to the test file included in example_segment_pages. You may have to do some pre-processing in an upstream model to get it into this shape. Similarly, if you need to union multiple sources, de-duplicate records, or filter out bad records, do this in an upstream model.
- Optionally configure extra parameters by adding them to your own `dbt_project.yml` file (see dbt_project.yml for more details):
# dbt_project.yml
config-version: 2
...
vars:
  segment:
    segment_page_views_table: "{{ source('segment', 'pages') }}"
    segment_sessionization_trailing_window: 3
    segment_inactivity_cutoff: 30 * 60
    segment_pass_through_columns: []
    segment_bigquery_partition_granularity: 'day' # BigQuery only: partition granularity for `partition_by` config
- Execute `dbt seed`. This project includes a CSV that must be seeded for the package to run successfully.
- Execute `dbt run`. The Segment models will get built as part of your run!
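To make the idea of sessionization more concrete, here is a minimal Python sketch of splitting a visitor's page views into sessions based on a period of inactivity. It is not the package's implementation. It assumes that `segment_inactivity_cutoff` is expressed in seconds (so `30 * 60` would mean 30 minutes), that timestamps are plain integers in seconds, and that a gap strictly longer than the cutoff starts a new session. The names `sessionize`, `anonymous_id`, `tstamp`, and `session_number` are hypothetical.

```python
def sessionize(page_views: list[dict], inactivity_cutoff: int = 30 * 60) -> list[dict]:
    """Number each visitor's page views by session (conceptual sketch only)."""
    # Order page views by visitor, then by time, so gaps can be measured
    # between consecutive views of the same visitor.
    ordered = sorted(page_views, key=lambda pv: (pv["anonymous_id"], pv["tstamp"]))

    result = []
    previous = None
    session_number = 0
    for pv in ordered:
        if previous is None or previous["anonymous_id"] != pv["anonymous_id"]:
            # First page view for this visitor: start counting at session 1.
            session_number = 1
        elif pv["tstamp"] - previous["tstamp"] > inactivity_cutoff:
            # The gap exceeds the cutoff (assumed to be in seconds): new session.
            session_number += 1
        result.append({**pv, "session_number": session_number})
        previous = pv
    return result


# Hypothetical sample data: visitor "a" returns after more than 30 minutes.
sample_views = [
    {"anonymous_id": "a", "tstamp": 0},
    {"anonymous_id": "a", "tstamp": 600},
    {"anonymous_id": "a", "tstamp": 2401},
    {"anonymous_id": "b", "tstamp": 100},
]
sessions = sessionize(sample_views)
```

Here the third page view for visitor `a` arrives 1801 seconds after the second, so under the assumed 1800-second cutoff it opens a second session.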
This package has been tested on Redshift, Snowflake, BigQuery, and Postgres.