# dbt: Play On (December 2024)

Oh hi there :)

We’ve had the opportunity to talk a lot about our investments in open source over the past few months:

- [I [Grace] renewed my vows with the dbt Community in a Vegas wedding ceremony officiated by Elvis](https://youtu.be/DC9sbZBYzpI?si=uK9Aie6Jl-FIHggm) (aka [@jtcohen6](https://github.com/jtcohen6)).
- [We spoke about dbt Labs’ continued commitment to Open Source in the Coalesce Keynote](https://youtu.be/I72yUtrmhbY?si=Iu6s9WXdHnFtyCvi).
- [We wrote about the importance of extensibility for how we build features in dbt Core.](https://www.getdbt.com/blog/dbt-core-v1-9-is-ga#:~:text=Extensibility%20is%20what%20powers%20the%20community)

All of these points and themes remain incredibly relevant to our team and are fueling our vision as we prepare for the new year:

- we're committed to defending dbt Core as the open source standard for data transformation, which will remain licensed under Apache 2.0
- the dbt framework will continue to be shaped by a collaborative effort between you (the community) and us (the maintainers)
- when we add something new to the standard, we are committing to the long term, which means we are intentional about *how* and *when* we expand its breadth - in the meantime, lean into dbt's extensibility (and show us how you're doing it!)

Now that we’ve gotten into our new rhythm…

Now that last year’s focus on stability has earned us the right to ship awesome new additions to the dbt framework…

Now that we’re actually seeing a *much* higher % of projects running the latest & greatest dbt…

…the next 6-12 months will feel similar to the last. dbt will keep getting better at doing what it does - being a mission-critical piece of your data stack, and a delightful part of your work. To that end, stability always comes first. And, we will be shipping some exciting new features to dbt Core.

# Oh What A Year It’s Been!

This year, we released two new minor versions of dbt Core.

| **Version** | **When** | **Namesake** | **Stuff** |
| --- | --- | --- | --- |
| [v1.8](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.8) | May | [Julian Abele](https://github.com/dbt-labs/dbt-core/releases/tag/v1.8.0) | Unit testing. Decoupling of dbt-core and adapters. Flags for managing changes to legacy behaviors. |
| [v1.9](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.9) | December | [Dr. Susan La Flesche Picotte](https://github.com/dbt-labs/dbt-core/releases/tag/v1.9.0) | Microbatch incremental strategy. New configurations and spec for snapshots. Standardized support for Iceberg. |

In the [last roadmap post](https://github.com/dbt-labs/dbt-core/blob/main/docs/roadmap/2023-11-dbt-tng.md), we committed to prioritizing all-around interface stability. This included decoupling dbt-core and adapters, introducing [behavior change flags](https://docs.getdbt.com/reference/global-configs/behavior-changes) to give you time to adjust to ~~breaking~~ changes, and improving the stability of metadata artifacts. You can read more about those efforts [here](https://www.notion.so/E-team-offsite-competitive-deep-dive-SQLMesh-SDF-November-2024-129bb38ebda7807ca023ddf198fb1279?pvs=21).

Stability means you can upgrade with confidence.

Stability means fewer disruptions for our adapters and integrations in the dbt ecosystem.

Stability means we’ve earned the right to **ship some big new features**.

This past year, we were able to ship some long-awaited additions and enhancements to the dbt Core framework:

- [**Unit tests**](https://github.com/dbt-labs/dbt-core/discussions/8275) allow you to validate your SQL modeling logic on a small set of static inputs before you materialize your full model in production.
- [**Snapshots**](https://github.com/dbt-labs/dbt-core/discussions/7018) got the glow up they deserved - with new configurations and spec that make capturing your data changes easier to configure, run, and customize.
- [**Microbatch**](https://github.com/dbt-labs/dbt-core/discussions/10672) incremental models enable you to optimize your largest datasets by transforming your timeseries data in discrete periods, each with its own SQL query, rather than all at once (see the sketch below).
- [**Iceberg**](https://docs.getdbt.com/blog/icebeg-is-an-implementation-detail) table format (an open standard for storing data and accessing metadata) is standardized across adapters, enabling you to store your data in a way that promises interoperability with multiple compute engines.

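For a flavor of what microbatch looks like in practice, here’s a minimal sketch of a model config (the model and column names are hypothetical; see the v1.9 docs for the full set of options):

```sql
{{
    config(
        materialized = "incremental",
        incremental_strategy = "microbatch",
        event_time = "order_at",    -- timestamp column dbt uses to slice batches
        batch_size = "day",         -- one query per day of data
        begin = "2024-01-01",       -- earliest date to process on a full backfill
        lookback = 3                -- reprocess the 3 most recent batches each run
    )
}}

select * from {{ ref('stg_orders') }}
```

Each batch is compiled and run as its own query, so a failed period can be retried on its own rather than rebuilding the whole table.
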
We also were able to close out some highly-upvoted “paper cuts”:

- the `--empty` flag limits `ref`s and `source`s to zero rows, which you can use for schema-only dry runs that validate your model SQL and run unit tests
- dbt now issues a single (batch) query when calculating source freshness through metadata, instead of executing a query per source
- improvements to `state:modified` help reduce the risk of false positives due to environment-aware logic
- you can now document your data tests with `description`s (see the sketch below)

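For example, documenting a data test might look something like this (a minimal sketch with hypothetical model and column names):

```yaml
models:
  - name: fct_orders
    columns:
      - name: order_id
        data_tests:
          - unique:
              description: "Every order should appear exactly once; duplicates usually indicate a fanout in an upstream join."
```
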
A lot of these features are things that you (the community) have been discussing and experimenting with for years. To everyone who opened an issue, commented on a discussion, joined us for a feedback session, developed a package, or contributed to our code base… however you made your voice heard this year, **thank you** for continuing to care, for continuing to lean in, for continuing to help shape dbt.

# New Year’s Resolutions

To start off the new year, we’re focusing on three major areas of development in the dbt framework:

- **typed macros** - annotate macro inputs and outputs with Python types for better validation
- **catalogs** - first-class support for materializing dbt models into external catalogs, providing a warehouse-agnostic interface for managing data in object storage
- **sample mode** - limit your data to smaller, time-based samples for faster development and CI testing

## Typed Macros

In the simplest terms, [macros](https://docs.getdbt.com/docs/build/jinja-macros#macros) are pieces of code, written in Jinja, that can be reused throughout your project – they are similar to "functions" in other programming languages.

In practice, you can reference a macro in a model’s SQL (or config block, or hook) to:

- make your SQL code more DRY by abstracting snippets of SQL into reusable “functions”
- use the results of one query to generate a set of logic
- change the way your project builds based on the current environment

Or, if you really need to, run arbitrary SQL in your warehouse via `dbt run-operation`.

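For example, a simple macro and a model that calls it might look like this (a sketch using the classic `cents_to_dollars` example; the model and column names are hypothetical):

```sql
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, scale=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ scale }})
{% endmacro %}

-- models/payments.sql
select
    payment_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ ref('stg_payments') }}
```

When dbt compiles the model, the macro call is rendered inline, so the compiled SQL contains the cast expression rather than a function call.
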
Macros can depend on inputs, vars, `env_vars`, or even other macros — *and* there are “special” macros for defining custom generic tests and materializations.

This immense flexibility is one of the great benefits of macros — you can use them to solve **a lot** of different problems.

However, on the flip side, this flexibility — where macros can be whatever you want them to be — makes it challenging to validate that your macros are doing what you expect.

Without a way to define expected types for the inputs and outputs of macros, our adapter maintainers struggle to validate that a built-in macro override will produce the correct output. Furthermore, *any* analytics engineer writing a macro in dbt should be able to validate the expected behavior.

One of the things we aim to work on in the coming months is an interface for **typed macros**.

Imagine being able to codify the expected types* for the inputs and outputs of your macros:

```sql
{% macro cents_to_dollars(column_name: str, scale: int = 2) -> str %}
    ({{ column_name }} / 100)::numeric(16, {{ scale }})
{% endmacro %}
```

**Note: These are a subset of Python types (internal to dbt), not the data types within the data warehouse.*

Then, we could issue warnings at parse time when usage of a macro violates these types. By adding the ability to configure type expectations, user-created macros become more predictable, and [built-in macros become easier to override](https://github.com/dbt-labs/dbt-core/issues/9164).

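To make that concrete, here’s a hypothetical call that such a check could flag (the exact warning behavior is still being designed):

```sql
-- models/payments.sql
select
    -- passes an integer where the annotation expects a str column name,
    -- and a string where scale expects an int; with typed macros, dbt
    -- could surface both mismatches as warnings at parse time
    {{ cents_to_dollars(100, scale='two') }} as amount_usd
from {{ ref('stg_payments') }}
```
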
Should this type checking be on by default, or something you opt into? Should we also support dbt-specific types, such as `Relation`? Head over to the [GitHub discussion](https://github.com/dbt-labs/dbt-core/discussions/11158) to participate in the conversation!

## Catalogs

In `v1.9`, we shipped a set of standard configs to materialize dbt models in Iceberg table format:

```sql
{{
    config(
        materialized = "table",
        table_format = "iceberg",
        external_volume = "s3_iceberg_snow"
    )
}}

...
```

Supporting the Iceberg table format was our first step towards empowering users to adopt Iceberg as a standard storage format for their critical datasets.

In the coming months, we want to add first-class support for “catalogs” in dbt. Catalogs, such as AWS Glue or an Iceberg REST catalog, operate at a level of abstraction above specific Iceberg tables — and they can provide a warehouse-agnostic interface for managing a large number of datasets in object storage.

Imagine a new top-level `catalogs.yml` that tells dbt about the catalog integrations you want to write to:

```yaml
catalogs:
  - catalog_name: my_first_catalog
    write_integrations:
      - integration_name: prod_glue_write_integration
        external_volume: my_prod_external_volume
        table_format: iceberg
        catalog_type: glue
```

Then, in your model’s configuration, simply specify the `catalog` field:

```sql
{{
    config(
        catalog = "my_first_catalog"
    )
}}

...
```

These new configurations will enable dbt to template the correct DDL statements for the platform (`CREATE GLUE ICEBERG TABLE` vs. `CREATE ICEBERG TABLE`, etc.). Now, when you run the above model, it will be materialized as an Iceberg table in S3, registered in the AWS Glue catalog.

We believe the approach of writing datasets to a platform-agnostic storage layer, and registering those datasets with a similarly agnostic *catalog service*, will become important to the technical foundations of the dbt workflow — the [Analytics Development Lifecycle (ADLC)](https://www.getdbt.com/resources/guides/the-analytics-development-lifecycle) — moving forward. Support for materializing Iceberg tables in external catalogs is another step down this path.

To read more about catalogs and participate in shaping this feature, head over to the [discussion on GitHub](https://github.com/dbt-labs/dbt-core/discussions/11171)!

## Sample Mode

We began our [“event_time” journey](https://github.com/dbt-labs/dbt-core/discussions/10672) with microbatch incremental models. Next up is [sample mode](https://github.com/dbt-labs/dbt-core/discussions/10672#:~:text=%F0%9F%8C%80%5BNext%5D%20%E2%80%9CSample%E2%80%9D%20mode%20for%20dev%20%26%20CI%20runs) - we want to support a pattern for speeding up development and CI testing by filtering your dataset to a *time-limited sample*.

Imagine your model contains a `ref` statement like so:

```sql
select * from {{ ref('fct_orders') }}
```

During standard runs, this compiles to:

```sql
select * from my_db.my_schema.fct_orders
```

But during a *sample* run, dbt could automatically filter down large tables to the last X days of data, using the same `event_time` column used for microbatch. The exact syntax here is TBD, but imagine you run something like `dbt run --sample 3`, which then compiles your code to:

```sql
select * from my_db.my_schema.fct_orders
-- the event_time column in fct_orders is 'order_at'
where order_at > dateadd(day, -3, current_date)
```

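The `order_at` column would come from the model’s `event_time` config, the same one microbatch relies on. A sketch of how `fct_orders` might declare it (the column name is hypothetical):

```sql
-- models/fct_orders.sql
{{
    config(
        materialized = "table",
        event_time = "order_at"    -- the column dbt filters on for samples and microbatch inputs
    )
}}

select * from {{ ref('stg_orders') }}
```
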
No more [overriding the `source` or `ref` macro](https://discourse.getdbt.com/t/limiting-dev-runs-with-a-dynamic-date-range/508) to hack together this functionality.

Built-in support for “sample mode” — filtering to a consistent time-based “slice” across all models — means faster development and faster testing because you’re running on *less* data. This could be configured for a specific dbt invocation, a CI environment, or as a set-it-and-forget-it default for everyone who’s developing on your team’s project.

We’ll be opening an additional sample-mode-specific GitHub discussion in January, but feel free to queue up your thoughts [here](https://github.com/dbt-labs/dbt-core/discussions/10672) or [here](https://github.com/dbt-labs/dbt-core/issues/8378) in the meantime!

## and who could forget, paper cuts!

Those are the big things we plan to tackle in the coming months. As always, we want to knock out some smaller `paper_cut`s as well. Your upvotes and comments help us prioritize these, so please make some noise if there’s something you care a whole lot about. Some that are top of mind for me already:

- **DRY-er YML**, including:
  - [Defining vars configs outside dbt_project.yml](https://github.com/dbt-labs/dbt-core/issues/2955)
  - [Add ability to import/include YAML from other files](https://github.com/dbt-labs/dbt-core/issues/9695)
- **Enhancements to model versions and contracts**, including:
  - [Automatically create view/clone of latest version](https://github.com/dbt-labs/dbt-core/issues/7442)
  - [Support constraints independently from enforcing a full model contract](https://github.com/dbt-labs/dbt-core/issues/10195)
- and [warnings when configs are misspelled](https://github.com/dbt-labs/dbt-core/issues/8942) :)

# [**Call me**, **beep me** if you wanna reach me](https://www.youtube.com/watch?v=GIgLqN_rAXU)

If one of *your* New Year’s resolutions is to be more involved in the dbt community, here are some of the many ways you can contribute:

- open, upvote, and comment on GitHub [issues](https://docs.getdbt.com/community/resources/oss-expectations#issues)
- start a [discussion](https://docs.getdbt.com/community/resources/oss-expectations#discussions) or Discourse post when you’ve got a Big Idea
- engage in conversations in the [dbt Community Slack](https://www.getdbt.com/community/join-the-community)
- [contribute code](https://docs.getdbt.com/community/resources/oss-expectations#pull-requests) back to one of our open source repos: pick up an issue tagged `help_wanted` or `good_first_issue`, and our engineering team will work with you to get it over the finish line
- join us on Zoom for feedback sessions (which we’ll post about in Slack and in the relevant GitHub discussion)
- share your creative solutions in a blog post, at a dbt meetup, or by speaking at Coalesce

Your feedback and thoughts are incredibly valuable to us, so make yourself heard this year. We’re excited to build some awesome new features together.

[Your loving wife](https://youtu.be/DC9sbZBYzpI?si=xCGWoQDK-w13Fz6U&t=1594), Grace