
Use Airflow's own connections in dbt profiles #39

Open
akerone opened this issue Aug 18, 2021 · 2 comments

Comments

akerone commented Aug 18, 2021

It would be a great feature if we could use Airflow's own "Connections" section instead of having to provide usernames/passwords/keyfiles in profiles.yml.

I'm thinking specifically of using dbt in Google Cloud Composer, which provides a "bigquery_default" connection that other operators can use. This is just one example, though; the idea applies to any Airflow installation.

It would also help maintain the DRY principle by not having to provide the same credentials in two separate places (Airflow connections and profiles.yml).
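
For illustration, here is a rough sketch of what I have in mind: read one of Airflow's connections at the top of the DAG and expose it as environment variables that profiles.yml could consume. The connection id dbt_default and the DBT_* variable names are made up; BaseHook.get_connection is the standard Airflow API for reading a connection.

    # Rough sketch only: 'dbt_default' and the DBT_* names are hypothetical.
    from airflow.hooks.base import BaseHook

    # Read the connection defined in Airflow's "Connections" section.
    conn = BaseHook.get_connection('dbt_default')

    # Values that a profiles.yml could then pick up via env_var().
    dbt_env = {
        'DBT_HOST': conn.host,
        'DBT_USER': conn.login,
        'DBT_PASSWORD': conn.password or '',
    }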

dinigo commented Oct 22, 2021

They can all be abstracted through environment variables. You can add those at provision time (when the instance is created, whether via Kubernetes, docker-compose, .profile, or GCP Composer). Currently airflow-dbt does not support passing environment variables to the process running the command at runtime, so they have to be defined beforehand.

profile:
  target: prod
  outputs:
    prod:
      type: postgres
      host: 127.0.0.1
      # IMPORTANT: Make sure to quote the entire Jinja string here
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"

As this is such a common practice, I have added this functionality by default in PR #45.
Once that PR is merged you will be able to use this by declaring the env dictionary. For example:

    from os.path import abspath

    from airflow_dbt.operators.dbt_operator import DbtTestOperator

    DbtTestOperator(
        task_id='run_dbt',
        project_dir=abspath('./jaffle-shop'),
        profiles_dir=abspath('.'),
        dbt_bin='/home/airflow/.local/bin/dbt',
        env={
            'DBT_USER': 'test-user',
            'DBT_PASSWORD': 'test-password',
        },
    )

Or the more common pattern of a service account keyfile mounted as a secret in a container:

    from airflow_dbt.operators.dbt_operator import DbtRunOperator

    DbtRunOperator(
        task_id='run_dbt',
        project_dir=abspath('./jaffle-shop'),
        profiles_dir=abspath('.'),
        dbt_bin='/home/airflow/.local/bin/dbt',
        env={
            'GOOGLE_APPLICATION_CREDENTIALS': '/run/secrets/gcp_credentials.json',
        },
    )

However, dbt authentication is far too varied to declare in a connection (other than through the extras, maybe). But that's just my opinion.

akerone commented Nov 18, 2021

Thanks for the comment. Environment variables can definitely help in a lot of use cases, and they probably bump this issue down to a much lower priority.

> However, dbt authentication is far too varied to declare in a connection (other than through the extras, maybe). But that's just my opinion.

In Google Cloud Composer, Airflow is deployed with a set of preexisting connections (including one for its own GCP project, with a service account key in the extras section), so it would be great if profiles.yml could be built from one of those connections rather than having to specify the env in each DAG we deploy.
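
As a hypothetical sketch (the extras key name varies by Google provider version, and the keyfile path is arbitrary), the service account key could be pulled out of the stock connection and written somewhere a BigQuery profile's keyfile setting, or GOOGLE_APPLICATION_CREDENTIALS, can find it:

    # Hypothetical sketch: extract the service account key from Composer's
    # preexisting 'bigquery_default' connection. The exact extras key name
    # depends on the Google provider version in use.
    import json

    from airflow.hooks.base import BaseHook

    conn = BaseHook.get_connection('bigquery_default')
    key = conn.extra_dejson.get('extra__google_cloud_platform__keyfile_dict')

    # Write the key where dbt's BigQuery profile (keyfile: ...) can read it.
    with open('/tmp/gcp_keyfile.json', 'w') as f:
        f.write(key if isinstance(key, str) else json.dumps(key))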
