
[feature] add mle analyze to summarize training logs and give optimal config #222

Open · leeeizhang opened this issue Sep 23, 2024 · 5 comments
Labels: enhancement (New feature or request)

@leeeizhang (Collaborator)

As the title says.

leeeizhang self-assigned this Sep 23, 2024
dosubot (bot) added the enhancement (New feature or request) label Sep 23, 2024
@huangyz0918 (Member)

Any updates on the issue?

@leeeizhang (Collaborator, Author) commented Oct 8, 2024

I think `mle analyze` could work as follows:

  1. ML Experiment Summary: summarize the experiment runs, including the hyper-parameters users recently tried and the results (e.g., loss or accuracy) achieved in those runs. To support this, we should first integrate some ML tracking tools (e.g., MLflow, W&B).

  2. Training Suggester (e.g., HPO, NAS): suggest hyper-parameters and model architectures based on the experiment summary. Going further, we could let mle-agent automatically explore training configurations and execute them, which can be left for future work.

  3. Weekly Experiment Report: the ML experiment summary could also feed into the weekly report, as ML scientists are often very interested in their weekly experimental progress. The experiment summary would be an important data source for the report agent.

The task breakdown:

  • [integrate] ML tracking integration (see the MLflow sketch after this list)
    • MLflow (mle-agent, repx)
    • W&B (repx only)
  • [agent] experiment summarizer
  • [agent] weekly experiment reporter
  • [agent] HPO and NAS suggester
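
For the tracking integration, here is a minimal sketch of the MLflow side (the tracking URI, experiment name, and the parameter/metric names are placeholders, not a settled design):

```python
import mlflow

# Placeholder: assumes an MLflow tracking server is already reachable.
mlflow.set_tracking_uri("http://localhost:5000")

# search_runs returns a pandas DataFrame with one row per run;
# hyper-parameters appear as `params.*` columns, metrics as `metrics.*`.
runs = mlflow.search_runs(experiment_names=["default"])

for _, run in runs.iterrows():
    params = {k.removeprefix("params."): v
              for k, v in run.items() if k.startswith("params.")}
    metrics = {k.removeprefix("metrics."): v
               for k, v in run.items() if k.startswith("metrics.")}
    print(run["run_id"], params, metrics)
```

The W&B side should follow the same shape through its public API.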


@huangyz0918 (Member)

What about integrating W&B first? By fetching the experiments, we can get the metrics, the code, the hyper-parameters, etc. Then we can use an agent to give suggestions, also calling some web search functions (e.g., the Papers with Code benchmarks) to produce a summary and suggestions (e.g., how to tune the parameters to improve certain metrics).
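
Roughly like this, perhaps (the `entity/project` path is a placeholder, and the summary keys depend on what each run logged):

```python
import wandb

api = wandb.Api()

experiments = []
for run in api.runs("entity/project"):  # placeholder project path
    experiments.append({
        "name": run.name,
        # logged hyper-parameters, minus W&B-internal keys
        "config": {k: v for k, v in run.config.items() if not k.startswith("_")},
        # final metrics; W&B's export examples read these via ._json_dict
        "summary": run.summary._json_dict,
    })
```

Each entry then carries the hyper-parameters and final metrics the agent needs for its suggestions.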

@huangyz0918 (Member)

The train/validation accuracy and the loss curves are time-series data; how to analyze such data with an LLM is also an interesting problem. Or we can fetch the visualizations from W&B and try to analyze the images directly using multi-modal ability. You can have a try and then we can discuss.
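
As a quick sketch of the multi-modal route: re-plot a run's loss curve locally and hand the image to a vision-capable model (the model name and the prompt below are assumptions, just to illustrate the flow):

```python
import base64
import io

import matplotlib.pyplot as plt
import wandb
from openai import OpenAI

run = wandb.Api().run("entity/project/run_id")  # placeholder run path
history = run.history(keys=["loss"])            # logged time series as a DataFrame

fig, ax = plt.subplots()
ax.plot(history["_step"], history["loss"])
ax.set(xlabel="step", ylabel="loss")
buf = io.BytesIO()
fig.savefig(buf, format="png")
image_b64 = base64.b64encode(buf.getvalue()).decode()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model
    messages=[{"role": "user", "content": [
        {"type": "text",
         "text": "Describe the training dynamics in this loss curve "
                 "and suggest what to try next."},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ]}],
)
print(response.choices[0].message.content)
```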

@leeeizhang (Collaborator, Author)

> What about integrating W&B first? By fetching the experiments, we can get the metrics, the code, the hyper-parameters, etc. Then we can use an agent to give suggestions, also calling some web search functions (e.g., the Papers with Code benchmarks) to produce a summary and suggestions (e.g., how to tune the parameters to improve certain metrics).

That's right, I will integrate W&B first. For the analysis part, we can incorporate the existing AdviseAgent to summarize and suggest.

> The train/validation accuracy and the loss curves are time-series data; how to analyze such data with an LLM is also an interesting problem. Or we can fetch the visualizations from W&B and try to analyze the images directly using multi-modal ability. You can have a try and then we can discuss.

Agree! How to analyze time-series data is still an open question worth exploring, and using multi-modal ability to analyze experimental plots or charts directly is well worth trying. Nevertheless, since the most common HPO and NAS algorithms still rely on the final and best accuracy/loss for analysis and tuning, we should keep the option of directly using each run's best/final metrics as prompts when building the agent for our initial PoC.
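
For that PoC, turning each run's best/final metrics into prompt text could be as simple as the sketch below (it assumes the `experiments` list from the W&B sketch above; the metric keys are hypothetical):

```python
def runs_to_prompt(experiments: list[dict]) -> str:
    """Compress each run into one line of config plus best/final metrics."""
    lines = ["Recent experiment runs:"]
    for exp in experiments:
        cfg = ", ".join(f"{k}={v}" for k, v in exp["config"].items())
        # hypothetical metric keys; fall back from best to final value
        best = exp["summary"].get("best_val_accuracy",
                                  exp["summary"].get("val_accuracy", "n/a"))
        lines.append(f"- {exp['name']}: {cfg} -> best val accuracy: {best}")
    lines.append("Suggest hyper-parameter changes likely to improve "
                 "validation accuracy.")
    return "\n".join(lines)
```

The resulting string can be fed straight to the AdviseAgent, with no time-series handling needed for the first iteration.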
