Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Can't pass args for run-operation in programmatic invocation as kwarg #10355

Open
2 tasks done
ben-schreiber opened this issue Jun 23, 2024 · 5 comments · May be fixed by #10473
Open
2 tasks done

[Bug] Can't pass args for run-operation in programmatic invocation as kwarg #10355

ben-schreiber opened this issue Jun 23, 2024 · 5 comments · May be fixed by #10473
Labels
bug Something isn't working python_api Issues related to dbtRunner Python entry point Refinement Maintainer input needed user docs [docs.getdbt.com] Needs better documentation

Comments

@ben-schreiber
Copy link

ben-schreiber commented Jun 23, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Passing the --args parameter as a kwarg to the dbtRunner.invoke method throws a TypeError. This forces the parameter to only be pass as:

dbtRunner().invoke(['run-operation', '--args', '...'])

This looks to be caused by a naming conflict with the invoke method here

Expected Behavior

To also be able to pass it as a kwarg

dbtRunner().invoke(['run-operation'], args=...)

Steps To Reproduce

from dbt.cli.main import dbtRunner

runner = dbtRunner()
runner.invoke(['run-operation', 'some_macro'], args={'arg1': 'foo', 'arg2': 'goo'})

Relevant log output

TypeError
...
TypeError: dbtRunner.invoke() got multiple values for argument 'args'

Environment

- OS: MacOS Sonoma 14.4
- Python: 3.10.12
- dbt: 1.8.3

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@ben-schreiber ben-schreiber added bug Something isn't working triage labels Jun 23, 2024
@dbeatty10 dbeatty10 added the python_api Issues related to dbtRunner Python entry point label Jun 25, 2024
@ben-schreiber
Copy link
Author

Here are two suggestions which came to mind:

  1. The args parameter could be renamed. Something like,
def invoke(self, invocation_args: List[str], **kwargs) -> dbtRunnerResult: ...

Although, this would be a breaking change and would need to be handled accordingly.

  1. To add a dedicated name for this use-case (e.g. macro_args or something like that) which then would be translated back to args
def invoke(self, args: List[str], **kwargs) -> dbtRunnerResult:
    if 'macro_args' in kwargs:
        kwargs['args'] = kwargs.pop('macro_args')

This too would have to be appropriately documented since it breaks from the kwargs convention of this method.

@dbeatty10
Copy link
Contributor

Thanks for reporting this along with the root cause and some ideas @ben-schreiber !

Root cause

Agreed that the cause is the naming conflict between the positional parameter args in the invoke method and the --args CLI flag for the run-operation subcommand.

Workaround

Pass in strings for the --args CLI flag like this:

runner.invoke(
    ["run-operation", "some_macro", "--args", "{'arg1': 'foo', 'arg2': 'goo'}", "--vars", "{'some_var': 'doog'}"]
)

Reprex and examples

macros/some_macro.sql

{% macro some_macro(arg1, arg2) %}
    {{ log("arg1: " ~ arg1, True) }}
    {{ log("arg2: " ~ arg2, True) }}
    {{ log("some_var: " ~ var("some_var", "loo"), True) }}
{% endmacro %}

some_script.py

from dbt.cli.main import dbtRunner

runner = dbtRunner()

# Works
runner.invoke(
    ["run-operation", "some_macro", "--args", "{'arg1': 'foo', 'arg2': 'goo'}", "--vars", "{'some_var': 'doog'}"]
)

# Also works
runner.invoke(
    args=["run-operation", "some_macro", "--args", "{'arg1': 'foo', 'arg2': 'goo'}", "--vars", "{'some_var': 'doo'}"]
)

# # Doesn't work due to naming conflict with the `args` parameter
# runner.invoke(["run-operation", "some_macro"], args={"arg1": "foo", "arg2": "goo"}, vars={"some_var": "doo"})
dbt run-operation some_macro --args "{'arg1': 'foo', 'arg2': 'goo'}" --vars "{'some_var': 'doo'}" 
python some_script.py

Monkey patch example

While creating a monkey patch isn't what we'd recommend or advocate for in this scenario, sharing this code example for sake of completeness.

from typing import List
from dbt.cli.main import dbtRunnerResult
from dbt.cli import requires
from click.exceptions import BadOptionUsage
from click.exceptions import NoSuchOption, UsageError
from click.exceptions import Exit as ClickExit
from dbt.cli.main import cli

from dbt.cli.main import dbtRunner

runner = dbtRunner()


# Define a new version of the invoke method
def new_invoke(self, invocation_args: List[str], **kwargs) -> dbtRunnerResult:

    try:
        dbt_ctx = cli.make_context(cli.name, invocation_args.copy())
        dbt_ctx.obj = {
            "manifest": self.manifest,
            "callbacks": self.callbacks,
        }

        for key, value in kwargs.items():
            dbt_ctx.params[key] = value
            # Hack to set parameter source to custom string
            dbt_ctx.set_parameter_source(key, "kwargs")  # type: ignore

        result, success = cli.invoke(dbt_ctx)
        return dbtRunnerResult(
            result=result,
            success=success,
        )
    except requires.ResultExit as e:
        return dbtRunnerResult(
            result=e.result,
            success=False,
        )
    except requires.ExceptionExit as e:
        return dbtRunnerResult(
            exception=e.exception,
            success=False,
        )
    except (BadOptionUsage, NoSuchOption, UsageError) as e:
        return dbtRunnerResult(
            exception=DbtUsageException(e.message),
            success=False,
        )
    except ClickExit as e:
        if e.exit_code == 0:
            return dbtRunnerResult(success=True)
        return dbtRunnerResult(
            exception=DbtInternalException(f"unhandled exit code {e.exit_code}"),
            success=False,
        )
    except BaseException as e:
        return dbtRunnerResult(
            exception=e,
            success=False,
        )


# Monkey patch the invoke method
dbtRunner.invoke = new_invoke

# The rest of your code goes below

@dbeatty10
Copy link
Contributor

Options

  1. Rename the args parameter of invoke to something unique (e.g., invocation_args) accepting that it will break for anyone using the positional argument as a named parameter.
  2. Add a dedicated name for this use-case (e.g. macro_args) accepting the complications in the code and user experience
  3. Leave things as-is accepting that using the --vars flag for run-operation requires the workaround above or monkey patching the definition of invoke.

The first option of giving it a unique name like invocation_args seems most attractive to me. I'm guessing that it would be uncommon for folks to use a named parameter within calls to invoke(), and none of our code examples use the named parameter.

If we went this route, we would call it out in a migration guide. We'd also want to take care to choose a name that is unique enough that we'd be unlikely to ever add a CLI flag with the same name. See here for an example script that can get the names of all the CLI flags.

@ben-schreiber To help us understand the trade-offs here, could you share more about the effect on the user experience if we leave it as-is and users only have the workaround you mentioned?

@ben-schreiber
Copy link
Author

@dbeatty10 I agree with you that option one is ideal. I would extend it even further to update the method's signature to include /:

def invoke(self, invocation_args: List[str], /, **kwargs) -> dbtRunnerResult: ...

thereby formalizing the desire to separate the usage of invocation_args (positional only) from kwargs (keyword only) in code. This would also allow the invocation_args argument to be be given a unique name without worrying as much about readability/usability for end users.

In regards to your question, the effect is not terribly large as long as this edge case is clearly documented. I encountered the issue when building a wrapper around DBT's programmatic invocations for Apache Airflow. In that use case, passing keyword arguments to invoke is more natural than converting Pythonic objects to the CLI representation.

@ben-schreiber
Copy link
Author

Also, here is another/shorter monkeypatch:

from typing import List
from dbt.cli.main import dbtRunnerResult, dbtRunner

invoke_v1 = dbtRunner.invoke

def invoke_v2(self, invocation_args: List[str], /, **kwargs) -> dbtRunnerResult:
    args_copy = list(invocation_args)
    kwargs_copy = dict(kwargs)
    if 'args' in kwargs_copy:
         args_copy += ['--args', kwargs_copy.pop('args')]
    return invoke_v1(self, args_copy, **kwargs_copy)

dbtRunner.invoke = invoke_v2

@dbeatty10 dbeatty10 added the user docs [docs.getdbt.com] Needs better documentation label Jul 22, 2024
@dbeatty10 dbeatty10 added Refinement Maintainer input needed and removed triage labels Jul 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working python_api Issues related to dbtRunner Python entry point Refinement Maintainer input needed user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
2 participants