Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] --empty flag causes syntax errors when source CTE uses ref name as prefix #11149

Closed
2 tasks done
sfangman opened this issue Dec 13, 2024 · 2 comments
Closed
2 tasks done
Labels
bug Something isn't working empty Issues related to the --empty CLI flag

Comments

@sfangman
Copy link

sfangman commented Dec 13, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I am looking to use the --empty flag to do basic syntax/schema testing of DBT models in our project but I am running into an issue where references to the model name (i.e. as a column prefix) are broken due to the way in which the limit 0 is applied in a subquery without an alias.

Expected Behavior

I would expect that when the --empty flag is used, a table alias of the original model name is supplied after the empty subquery to preserve references to the model name. For example of desired behavior see example below in Steps To Reproduce

Steps To Reproduce

Running with DBT in any environment (Redshift for us) with package versions below

dbt-core==1.9.0
dbt-redshift==1.9.0

Given a DBT SQL model below (test_model.sql)

with previous_cte as (

    select 1 as col1, 2 as col2

),

my_cte as (

    select
          source_model.col1
        , previous_cte.col2
    from {{ ref('source_model') }}
    inner join previous_cte
        on source_model.col1 = previous_cte.col1

)

select *
from my_cte

Running dbt compile --select test_model --empty yields the following sql

with previous_cte as (

    select 1 as col1, 2 as col2

),

my_cte as (

    select
          source_model.col1
        , previous_cte.col2
    from (select * from "my_db"."my_schema"."source_model" where false limit 0)
    inner join previous_cte
        on source_model.col1 = previous_cte.col1

)

select *
from my_cte

When executed this SQL returns a syntax error:

ERROR: relation "source_model" does not exist

I would like for the --empty flag to also include a table alias after the subquery to preserve references. Example compiled SQL below:

with previous_cte as (

    select 1 as col1, 2 as col2

),

my_cte as (

    select
          source_model.col1
        , previous_cte.col2
    from (select * from "my_db"."my_schema"."source_model" where false limit 0) as source_model
    inner join previous_cte
        on source_model.col1 = previous_cte.col1

)

select *
from my_cte

Relevant log output

No response

Environment

- OS: macOS Sonoma 14.6
- Python: 3.11.10
- dbt: 1.9.0

Which database adapter are you using with dbt?

redshift

Additional Context

No response

@sfangman sfangman added bug Something isn't working triage labels Dec 13, 2024
@dbeatty10 dbeatty10 added the empty Issues related to the --empty CLI flag label Dec 13, 2024
@dbeatty10
Copy link
Contributor

Thanks for reaching out @sfangman !

I see what you are saying -- thanks you for such a nice write-up 🤩

I would like for the --empty flag to also include a table alias after the subquery to preserve references.

This is a totally reasonable ask!

But we'd also need to figure out how to handle the opposite case if someone does supply aliases like this:

    select
          source_model.col1
        , previous_cte.col2
    from {{ ref('source_model') }} as source_model
    inner join previous_cte
        on source_model.col1 = previous_cte.col1

In that case, we wouldn't want to double alias like this:

    select
          source_model.col1
        , previous_cte.col2
    from (select * from "my_db"."my_schema"."source_model" where false limit 0) as source_model as source_model
    inner join previous_cte
        on source_model.col1 = previous_cte.col1

Current known limitations

Similar to dbt-labs/dbt-snowflake#1251, it is known current limitation of dbt unit tests that relation names within dbt models must be aliased if aliases are used within the join logic.

We have a small call-out within our "Before you begin" section of our documentation, and we'd be open to suggestions of how to word this better:

Table names must be aliased in order to unit test join logic.

Workaround in the meantime

Our recommendation in the meantime is to add aliases like this whenever the join logic needs them:

    select
          source_model.col1
        , previous_cte.col2
    from {{ ref('source_model') }} as source_model
    inner join previous_cte
        on source_model.col1 = previous_cte.col1

@dbeatty10 dbeatty10 removed the triage label Dec 14, 2024
@sfangman
Copy link
Author

Hi @dbeatty10 - thanks for the quick response! This makes a ton of sense to me, we can go about updating aliases as a part of our project upgrade process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working empty Issues related to the --empty CLI flag
Projects
None yet
Development

No branches or pull requests

2 participants