Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] [false positive] incremental_predicates config causing model to show in state:modified without changes #10968

Closed
2 tasks done
james-ho-cko opened this issue Nov 4, 2024 · 11 comments
Labels
bug Something isn't working state: modified state Stateful selection (state:modified, defer)

Comments

@james-ho-cko
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I have some models with incremental predicate that looks back n days only. Something like

incremental_predicates = [
            "DBT_INTERNAL_DEST.ingestion_timestamp::date >= dateadd(day, -10, " ~ var('date_parameter', modules.datetime.datetime.now()) ~ "::date)"
        ],

What we're noticing is the python module would render an actual value in the manifest file, which means the timestamp could change without the model being modified. This is causing issue cause the state:modified would pick up all models with this predicate.

Expected Behavior

We would expect the manifest to be unchanged and the models with this setup wouldn't be flagged as changed when running state:modified

Steps To Reproduce

  1. Adding an incremental predicate to a model with module.datetime.datetime.now() in it.
incremental_predicates = ["DBT_INTERNAL_DEST.ingestion_timestamp::date >= dateadd(day, -10, " ~ modules.datetime.datetime.now() ~ "::date)"],
  1. Run compile to generate the manifest file
  2. The incremental predicate of the model in the manifest file would show the current timestamp
  3. Copy the manifest file to current directory mv ..../target/manifest.json .
  4. Switch to another branch and generate the manifest file again without changing anything
  5. run dbt ls -m state:modified.body --state . to compare the 2 branches. It shouldn't show anything as we didn't change any model, but the model with this predicate would show up. Looking into the 2 manifest files can see the timestamps in the incremental predicate are different

Relevant log output

No response

Environment

- OS:macOS 14.7, build 23H124
- Python:3.9.6
- dbt:1.6.1

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@james-ho-cko james-ho-cko added bug Something isn't working triage labels Nov 4, 2024
@dbeatty10 dbeatty10 added state Stateful selection (state:modified, defer) state: modified labels Nov 4, 2024
@dbeatty10
Copy link
Contributor

Thanks for reaching out @james-ho-cko !

We've been addressing some case of false positives within #9562.

Did you happen to try installing dbt-core 1.9.0b3 and see if it still has the issue or not?

For date_parameter, is it coming from vars: in dbt_project.yml or --vars passed on the command line?.

@james-ho-cko
Copy link
Author

james-ho-cko commented Nov 5, 2024

Thanks for getting back to me Doug, the epic you shared is really helpful.
We haven't upgraded to the latest version so I can't really tell. We currently get around with this by creating a macro as the predicate.
The date_parameter is what we use to be able to ad-hocly change the look back day if we needed to, so it'd be passed on the command line like --vars
I tried to add

flags:
  state_modified_compare_more_unrendered_values: true

to dbt_project.yml as the documentation suggests but seems not working. Is it only available in v1.9?

@dbeatty10
Copy link
Contributor

state_modified_compare_more_unrendered_values is available starting in v1.9 (but not for earlier versions than that).

@dbeatty10
Copy link
Contributor

@james-ho-cko

I tried to replicate the scenario that you reported using dbt v1.6, but I wasn't able. Could you try tweaking the file and commands below to replicate your bug report?

Create this file:

models/my_model.sql

{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    incremental_strategy = 'merge',
    incremental_predicates = ["DBT_INTERNAL_DEST.ingestion_timestamp::date <= '" ~ var('date_parameter', modules.datetime.datetime.now()) ~ "'::date"],
  )
}}

select 1 as id, {{ dbt.current_timestamp() }} as ingestion_timestamp

Run these commands:

dbt compile
mv target/manifest.json .
dbt list -s state:modified --state .

I got this output:

04:04:58  Running with dbt=1.6.5
04:04:59  Registered adapter: snowflake=1.6.6
04:04:59  Found 1 model, 0 sources, 0 exposures, 0 metrics, 375 macros, 0 groups, 0 semantic models
04:04:59  The selection criterion 'state:modified' does not match any nodes
04:04:59  No nodes selected!

@james-ho-cko
Copy link
Author

I think you need to change branch.

What we have is a model already in production with this predicate setup, and every time when we run CI on feature branch against the main branch's manifest, the model will show up even it's untouched.

It's a bit inconsistent when I try to replicate it locally by

dbt compile
mv target/manifest.json .

git checkout -b testing
dbt list -s state:modified --state .

This might now work as it's the same file you created locally. I think it needs to be a model that's already in 2 different branches.
Let me know if that makes sense!

@james-ho-cko
Copy link
Author

@dbeatty10 try this:

Create this file:

models/my_model.sql

{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    incremental_strategy = 'merge',
    incremental_predicates = ["DBT_INTERNAL_DEST.ingestion_timestamp::date <= '" ~ var('date_parameter', modules.datetime.datetime.now()) ~ "'::date"],
  )
}}

select 1 as id, {{ dbt.current_timestamp() }} as ingestion_timestamp

Run these commands:

dbt compile
mv target/manifest.json .

Delete the model file(keep the manifest at current directory)

Switch to another branch

git checkout -b testing

Create a file with the same name and the exactly same contents:

{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    incremental_strategy = 'merge',
    incremental_predicates = ["DBT_INTERNAL_DEST.ingestion_timestamp::date <= '" ~ var('date_parameter', modules.datetime.datetime.now()) ~ "'::date"],
  )
}}

select 1 as id, {{ dbt.current_timestamp() }} as ingestion_timestamp

Run

dbt ls -m state:modified --state .

This should give you my_model. And if you compare the 2 manifest, you should be able to see the only difference is the timestamp in the predicate.
Screenshot 2024-11-06 at 16 35 34

@dbeatty10
Copy link
Contributor

@james-ho-cko I tried the steps you outlined, but I still wasn't able to replicate what you are reporting with state:modified.

I can see that the "incremental_predicates": portions of manifest.json and target/manifiest.json are different, but I get the message:

04:27:41  The selection criterion 'state:modified' does not match any nodes
04:27:41  No nodes selected!

As an aside, I wouldn't expect changing branches to make a difference either way as it relates to state:modified.

Are you able to replicate the issue without changing branches?

@james-ho-cko
Copy link
Author

Hey Doug, sorry for the late reply.
I found a way to replicate this issue without changing branches. You'll need to run dbt clean before generating the new manifest file. So it'd be creating the file then

dbt compile -m my_model
mv target/manifest.json .

dbt clean
dbt deps
dbt compile -m my_model
dbt ls -s state:modified --state .

This should give you my_model as a result.

It also worked after waiting for a few minutes before generating the manifest file again for comparing, feels like a caching thing.

@dbeatty10
Copy link
Contributor

Awesome @james-ho-cko !

Those commands helped me reliably replicate what you are saying. And here's an even more brief way that gave me my_model as a result without needing to do any dbt clean:

dbt parse --target-path artifacts
dbt list --select state:modified --state artifacts

@dbeatty10 dbeatty10 changed the title [Bug] Python module causing manifest generating different results without changes [Bug] [false positive] Python module causing manifest generating different results without changes Nov 13, 2024
@dbeatty10
Copy link
Contributor

However, if the same value is used for --vars across invocations, it does not show it as modified:

dbt parse --target-path artifacts --vars '{"date_parameter": "2024-01-01"}'
dbt list --select state:modified --state artifacts --vars '{"date_parameter": "2024-01-01"}'

@dbeatty10
Copy link
Contributor

The underlying explantation seems to be that state:modified.config is comparing the
post-rendered incremental_predicates config rather than the pre-rendered incremental_predicates config. See below for a simplified example.

Good news: starting in dbt-core v1.9, setting the state_modified_compare_more_unrendered_values flag to True changes the state:modified comparison from using rendered values to unrendered values instead 🎉

Example

The example below shows that the observed vs. expected behavior isn't due to using var or the datetime module. Rather, it is when a rendered Jinja value in incremental_predicates changes between the state directory and the current invocation.

models/my_model_table.sql

{{
  config(
    materialized = 'table',
  )
}}

select 1 as id, {{ dbt.string_literal(invocation_id) }} as invocation_id

models/my_model_incremental_predicates.sql

{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    incremental_strategy = 'append',
    incremental_predicates = ["DBT_INTERNAL_DEST.invocation_id = '" ~ invocation_id ~ "'"],
  )
}}

select 1 as id, {{ dbt.string_literal(invocation_id) }} as invocation_id

Run these commands:

dbt parse --target-path artifacts
dbt list --select state:modified --state artifacts

You'll see that my_model_incremental_predicates shows as modified whereas my_model_table doesn't.

But if you set the state_modified_compare_more_unrendered_values flag to True within dbt_project.yml and try again, then neither of them will show as modified 😎:

dbt_project.yml

flags:
  state_modified_compare_more_unrendered_values: True

Summary

Closing this as resolved by #10487 and/or #10675.

@james-ho-cko could you let us know if you try this in dbt-core v1.9 with the state_modified_compare_more_unrendered_values flag and it doesn't work for you?

@dbeatty10 dbeatty10 changed the title [Bug] [false positive] Python module causing manifest generating different results without changes [Bug] [false positive] incremental_predicates config causing model to show in state:modified without changes Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working state: modified state Stateful selection (state:modified, defer)
Projects
None yet
Development

No branches or pull requests

2 participants