
[feature] add mle analyze to summarize training logs and give optimal config #222

Open · leeeizhang opened this issue Sep 23, 2024 · 5 comments
Labels: enhancement (New feature or request)

@leeeizhang (Collaborator)

As the title says.

leeeizhang self-assigned this Sep 23, 2024
dosubot (bot) added the enhancement (New feature or request) label Sep 23, 2024
@huangyz0918 (Member)

Any updates on the issue?

@leeeizhang (Collaborator, Author) commented Oct 8, 2024

I think `mle analyze` could work as follows:

  1. ML Experiment Summary: summarize the experiment runs, including the hyper-parameters users recently tried and the results (e.g., loss or accuracy) achieved in those runs. To support this, we should first integrate some ML tracking tools (e.g., MLflow, W&B).

  2. Training Suggester (e.g., HPO, NAS): suggest hyper-parameters and model architectures based on the experiment summary. Going further, we could let mle-agent automatically explore training configurations and execute them, which can be left for future work.

  3. Weekly Experiment Report: the ML experiment summary could also feed into the weekly report, as ML scientists are often very interested in their weekly experimental progress. The experiment summary would be an important data source for the report agent.

The task breakdown:

  • [integrate] ML tracking integration (see the MLflow sketch after this list)
    • MLflow (mle-agent, repx)
    • W&B (repx only)
  • [agent] experiment summarizer
  • [agent] weekly experiment reporter
  • [agent] HPO and NAS suggester
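
For the tracking integration, here is a minimal sketch of the MLflow side (the tracking URI, experiment name, and the parameter/metric names are placeholders, not a settled design):

```python
import mlflow

# Placeholder: assumes an MLflow tracking server is already reachable.
mlflow.set_tracking_uri("http://localhost:5000")

# search_runs returns a pandas DataFrame with one row per run;
# hyper-parameters appear as `params.*` columns, metrics as `metrics.*`.
runs = mlflow.search_runs(experiment_names=["default"])

for _, run in runs.iterrows():
    params = {k.removeprefix("params."): v
              for k, v in run.items() if k.startswith("params.")}
    metrics = {k.removeprefix("metrics."): v
               for k, v in run.items() if k.startswith("metrics.")}
    print(run["run_id"], params, metrics)
```

The W&B side should follow the same shape through its public API.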


@huangyz0918 (Member)

What about integrating W&B first? By fetching the experiments, we can get the metrics, the code, the hyper-parameters, etc. Then we can use an agent to give suggestions, also calling some web search functions (e.g., the Papers with Code benchmarks) to produce a summary and suggestions (e.g., how to tune the parameters to improve certain metrics).
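
Roughly like this, perhaps (the `entity/project` path is a placeholder, and the summary keys depend on what each run logged):

```python
import wandb

api = wandb.Api()

experiments = []
for run in api.runs("entity/project"):  # placeholder project path
    experiments.append({
        "name": run.name,
        # logged hyper-parameters, minus W&B-internal keys
        "config": {k: v for k, v in run.config.items() if not k.startswith("_")},
        # final metrics; W&B's export examples read these via ._json_dict
        "summary": run.summary._json_dict,
    })
```

Each entry then carries the hyper-parameters and final metrics the agent needs for its suggestions.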

@huangyz0918 (Member)

The train/validation accuracy and the loss curves are time-series data; how to analyze such data with an LLM is also an interesting problem. Or we can fetch the visualizations from W&B and try to analyze the images directly using multi-modal ability. You can have a try and then we can discuss.
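
As a quick sketch of the multi-modal route: re-plot a run's loss curve locally and hand the image to a vision-capable model (the model name and the prompt below are assumptions, just to illustrate the flow):

```python
import base64
import io

import matplotlib.pyplot as plt
import wandb
from openai import OpenAI

run = wandb.Api().run("entity/project/run_id")  # placeholder run path
history = run.history(keys=["loss"])            # logged time series as a DataFrame

fig, ax = plt.subplots()
ax.plot(history["_step"], history["loss"])
ax.set(xlabel="step", ylabel="loss")
buf = io.BytesIO()
fig.savefig(buf, format="png")
image_b64 = base64.b64encode(buf.getvalue()).decode()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model
    messages=[{"role": "user", "content": [
        {"type": "text",
         "text": "Describe the training dynamics in this loss curve "
                 "and suggest what to try next."},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ]}],
)
print(response.choices[0].message.content)
```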

@leeeizhang (Collaborator, Author)

> What about integrating W&B first? By fetching the experiments, we can get the metrics, the code, the hyper-parameters, etc. Then we can use an agent to give suggestions, also calling some web search functions (e.g., the Papers with Code benchmarks) to produce a summary and suggestions (e.g., how to tune the parameters to improve certain metrics).

That's right, I will integrate W&B first. For the analysis part, we can incorporate the existing AdviseAgent to summarize and suggest.

> The train/validation accuracy and the loss curves are time-series data; how to analyze such data with an LLM is also an interesting problem. Or we can fetch the visualizations from W&B and try to analyze the images directly using multi-modal ability. You can have a try and then we can discuss.

Agree! How to analyze time-series data is still an open question worth exploring, and using multi-modal ability to analyze experimental plots or charts directly is well worth trying. Nevertheless, since the most common HPO and NAS algorithms still rely on the final and best accuracy/loss for analysis and tuning, we should keep the option of directly using each run's best/final metrics as prompts when building the agent for our initial PoC.
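
For that PoC, turning each run's best/final metrics into prompt text could be as simple as the sketch below (it assumes the `experiments` list from the W&B sketch above; the metric keys are hypothetical):

```python
def runs_to_prompt(experiments: list[dict]) -> str:
    """Compress each run into one line of config plus best/final metrics."""
    lines = ["Recent experiment runs:"]
    for exp in experiments:
        cfg = ", ".join(f"{k}={v}" for k, v in exp["config"].items())
        # hypothetical metric keys; fall back from best to final value
        best = exp["summary"].get("best_val_accuracy",
                                  exp["summary"].get("val_accuracy", "n/a"))
        lines.append(f"- {exp['name']}: {cfg} -> best val accuracy: {best}")
    lines.append("Suggest hyper-parameter changes likely to improve "
                 "validation accuracy.")
    return "\n".join(lines)
```

The resulting string can be fed straight to the AdviseAgent, with no time-series handling needed for the first iteration.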
