Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow MLflow run names without asset key #1

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

AdrianoKF
Copy link

@AdrianoKF AdrianoKF commented Jun 4, 2024

Open questions:

  • How to properly test the run name generation? Seems to be very hard to come up with a proper AssetExecutionContext in a test case.
  • Is logging without the Dagster run ID a bad idea? Multiple independent Dagster executions will end up in the same MLflow run that way, which could be confusing.
  • When executing with use_asset_run_key=False, the asset key tags on the MLflow run are incomplete (since every Dagster op/asset replaces the previous value). Should we drop that tag entirely? Does any other code depend on this feature currently?

@AdrianoKF AdrianoKF requested a review from janwillemkl June 4, 2024 11:16
@AdrianoKF AdrianoKF marked this pull request as ready for review June 4, 2024 11:17
@maxmynter
Copy link

Thoughts from @nicholasjng and me:

How to properly test the run name generation? Seems to be very hard to come up with a proper AssetExecutionContext in a test case.

No idea, but we can take time together tomorrow.

Is logging without the Dagster run ID a bad idea? Multiple independent Dagster executions will end up in the same MLflow run that way, which could be confusing.

No. It is not a bad idea.

Repeatedly overwriting the last run keeps the mlflow run logs lean and in the way we set it up with the start up script.

We only show the MLflow page once. If we do that after we execute the workflow it is also not surprising when the run already exists.

When executing with use_asset_run_key=False, the asset key tags on the MLflow run are incomplete (since every Dagster op/asset replaces the previous value). Should we drop that tag entirely? Does any other code depend on this feature currently?

If we drop the Dagster run id, it does not make sense to log the asset key. Plus, the logged names can be (are?) different if we prefix them with e.g. train_... and, if needed, we can use this info to trace the logging asset.

In our story we will not have time to dive into details like the asset key.

Thus we'd advocate for dropping use_asset_key.

if run_name_prefix is not None:
run_name = f"{run_name_prefix}-{run_name}"
parts.append(run_name_prefix)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either name this suffix, or actually use it as a prefix


run_name = f"{asset_key}-{dagster_run_id}"
if run_name_prefix is None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just call the param prefix? The method name _get_run_name_from_context makes it clear that the prefix concerns a run name

Co-authored-by: Nicholas Junge <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants