Allow MLflow run names without asset key #1

AdrianoKF · 2024-06-04T08:51:48Z

Open questions:

- How to properly test the run name generation? Seems to be very hard to come up with a proper AssetExecutionContext in a test case.
- Is logging without the Dagster run ID a bad idea? Multiple independent Dagster executions will end up in the same MLflow run that way, which could be confusing.
- When executing with use_asset_run_key=False, the asset key tags on the MLflow run are incomplete (since every Dagster op/asset replaces the previous value). Should we drop that tag entirely? Does any other code depend on this feature currently?

maxmynter · 2024-06-04T14:18:54Z

Thoughts from @nicholasjng and me:

> How to properly test the run name generation? Seems to be very hard to come up with a proper AssetExecutionContext in a test case.

No idea, but we can take time together tomorrow.
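
One possible way to sidestep building a full AssetExecutionContext is sketched below. It assumes the name helper only reads two attributes from the context, here called `asset_key` and `run_id`, so a lightweight stub is enough in a test. The function `run_name_from_context` is a hypothetical stand-in that mirrors the original `f"{asset_key}-{dagster_run_id}"` format, not the actual method in mlflow_session.py, and the asset key is treated as a plain string for simplicity.

```python
from types import SimpleNamespace


def run_name_from_context(context) -> str:
    """Hypothetical stand-in for the resource's run name helper.

    Assumption: only `context.asset_key` and `context.run_id` are read.
    """
    return f"{context.asset_key}-{context.run_id}"


# A stub exposing only the attributes the helper reads; no Dagster needed.
fake_context = SimpleNamespace(asset_key="train_model", run_id="abc123")

assert run_name_from_context(fake_context) == "train_model-abc123"
```

If the real helper touches more of the context than these two attributes, the stub would need to grow accordingly, or the string-building part could be split into a pure function that takes plain values.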

> Is logging without the Dagster run ID a bad idea? Multiple independent Dagster executions will end up in the same MLflow run that way, which could be confusing.

No. It is not a bad idea.

Repeatedly overwriting the last run keeps the MLflow run logs lean and fits the way we set things up with the startup script.

We only show the MLflow page once. If we do that after we execute the workflow, it is also not surprising that the run already exists.

> When executing with use_asset_run_key=False, the asset key tags on the MLflow run are incomplete (since every Dagster op/asset replaces the previous value). Should we drop that tag entirely? Does any other code depend on this feature currently?

If we drop the Dagster run ID, it does not make sense to log the asset key. Plus, the logged names can be (are?) different if we prefix them with e.g. train_..., and, if needed, we can use that prefix to trace which asset did the logging.

In our story we will not have time to dive into details like the asset key.

Thus we'd advocate for dropping use_asset_key.
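
For reference, the overwriting behaviour raised in the third question can be illustrated with a plain dictionary standing in for the tags of a single shared MLflow run. This is only a sketch: the tag name `asset_key`, the asset names, and both helper functions are hypothetical, and the accumulating variant is shown purely as a contrast, not as something the PR implements.

```python
from typing import Dict

ASSET_KEY_TAG = "asset_key"  # hypothetical tag name


def overwrite_tag(tags: Dict[str, str], value: str) -> None:
    """Each asset replaces the previous value, so only the last one survives."""
    tags[ASSET_KEY_TAG] = value


def accumulate_tag(tags: Dict[str, str], value: str, sep: str = ",") -> None:
    """Alternative: keep every asset key seen so far in one joined value."""
    existing = tags.get(ASSET_KEY_TAG, "")
    values = existing.split(sep) if existing else []
    if value not in values:
        values.append(value)
    tags[ASSET_KEY_TAG] = sep.join(values)


overwritten: Dict[str, str] = {}
accumulated: Dict[str, str] = {}
for asset in ["load_data", "train_model"]:  # hypothetical asset names
    overwrite_tag(overwritten, asset)
    accumulate_tag(accumulated, asset)

assert overwritten == {"asset_key": "train_model"}
assert accumulated == {"asset_key": "load_data,train_model"}
```

Dropping the tag, as advocated above, avoids the question of which of the two behaviours is less confusing.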

maxmynter · 2024-06-04T14:19:42Z

src/tentacles/resources/mlflow_session.py

        if run_name_prefix is not None:
-            run_name = f"{run_name_prefix}-{run_name}"
+            parts.append(run_name_prefix)


We should either name this suffix, or actually use it as a prefix
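
To make the ordering concern concrete, here is a minimal sketch of a parts-based name builder in which the prefix really ends up first. The function name `build_run_name`, its parameters, and the `-` separator are assumptions for illustration and may not match the code in mlflow_session.py.

```python
from typing import List, Optional


def build_run_name(
    asset_key: Optional[str],
    run_id: Optional[str],
    run_name_prefix: Optional[str] = None,
    sep: str = "-",
) -> str:
    """Join the available parts; the prefix is appended first so it leads."""
    parts: List[str] = []
    if run_name_prefix is not None:
        parts.append(run_name_prefix)  # added before the others: a true prefix
    if asset_key is not None:
        parts.append(asset_key)
    if run_id is not None:
        parts.append(run_id)
    return sep.join(parts)


assert build_run_name("train_model", "abc123", "demo") == "demo-train_model-abc123"
assert build_run_name(None, None, "demo") == "demo"
```

If the prefix were appended after the other parts instead, the same call would yield a name ending in the prefix, which is the suffix behaviour the comment above points out.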

maxmynter · 2024-06-04T14:28:41Z

src/tentacles/resources/mlflow_session.py


-        run_name = f"{asset_key}-{dagster_run_id}"
+        if run_name_prefix is None:


Should we just call the param prefix? The method name _get_run_name_from_context makes it clear that the prefix concerns a run name

src/tentacles/utils/dagster.py

Co-authored-by: Nicholas Junge <[email protected]>

AdrianoKF added 3 commits June 4, 2024 10:51

feat: Allow MLflow run names without asset key

45e22f7

feat: Optionally exclude Dagster run ID from MLflow run name

0b0795d

fix: Correctly handle multi-asset keys in MLflow tags

e103a3d

AdrianoKF requested a review from janwillemkl June 4, 2024 11:16

AdrianoKF marked this pull request as ready for review June 4, 2024 11:17

maxmynter reviewed Jun 4, 2024

nicholasjng reviewed Jun 4, 2024

src/tentacles/utils/dagster.py (outdated, resolved)

chore: Apply review suggestion

a156074

Co-authored-by: Nicholas Junge <[email protected]>
