added MLFlowBackend #163
base: master
- name: Setup python and mlflow server
  uses: actions/setup-python@v1
  with:
    python-version: ${{ matrix.python-version }}
    architecture: ${{ matrix.arch }}
- run: |
    python -m pip install mlflow
    python -m pip show mlflow
    mlflow server --host localhost --port 5000 &
Thanks for the PR! For my understanding, is this still necessary if MLFlowLogger.jl doesn't use Python? I thought MLFlow logging does not require an active server running.
We'll get an HTTP.ConnectError when trying to log to a server that's not running, so unfortunately these tests require a running server. Maybe it's more developer-friendly to run the MLFlow tests only in the CI, so that FluxTraining developers can just do ]test without having to bother with MLFlow?
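A minimal sketch of the two ideas in this comment, written in Python only because the CI step above already installs it. It checks whether something is listening on the tracking server's port, polls until it is (useful because the server is started in the background with `&`), and gates the MLFlow tests on the `CI` environment variable, which GitHub Actions sets to `true`. The function names are hypothetical and are not part of FluxTraining.jl or MLFlowLogger.jl; the actual test suite would express the same check in Julia.

```python
import os
import socket
import time


def server_reachable(host="localhost", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


def wait_for_server(host="localhost", port=5000, total_timeout=30.0, interval=0.5):
    """Poll until the port accepts connections or total_timeout elapses."""
    deadline = time.monotonic() + total_timeout
    while time.monotonic() < deadline:
        if server_reachable(host, port):
            return True
        time.sleep(interval)
    return False


def mlflow_tests_enabled(env=os.environ):
    """Hypothetical gate: run the MLFlow tests only when CI is set to 'true'."""
    return env.get("CI", "").lower() == "true"
```

With a gate like this, a local `]test` run would skip the MLFlow tests unless `CI=true` is set, while the CI job would wait for the server before running them.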
I think I misunderstood what https://github.com/rejuvyesh/MLFlowLogger.jl does and is capable of. I've always used a setup like 1) or 2) in https://www.mlflow.org/docs/latest/tracking.html#common-setups, but it looks like this library only supports 3)? Are there plans to add logging support without a tracking server running?
Aha, I assumed for setup 1 or 2 there would always be a tracking server running on localhost, but I see now that that does not make sense.
The previous activity in MLFlowLogger.jl was 3 years ago so I don't think there are any plans there. I'm personally mainly interested in setup 3 because I am collaborating with others on a Julia ML project, but I can invest a little time if needed.
I assume the way to go would be to add file logging functionality to MLFlowLogger.jl, similar to what was done for TensorBoardLogger.jl. Adding all file logging functionality is probably a larger effort, but I could work on a first version to at least support creating experiments, runs and log_metric().
What kind of roadmap do you envision to add MLFlow logging support in FluxTraining.jl?
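To make the "first version" concrete, here is a rough Python sketch of file logging limited to creating a run directory and appending metrics. The directory layout (`<root>/<experiment_id>/<run_id>/metrics/<key>`) and the line format (`timestamp value step`) are assumptions modelled on MLflow's local file store; the `meta.yaml` files that MLflow expects for experiments and runs are deliberately omitted, so this is not a complete or verified-compatible store. The function names are hypothetical.

```python
import time
import uuid
from pathlib import Path


def create_run(root, experiment_id="0"):
    """Create <root>/<experiment_id>/<run_id>/metrics and return the run directory.

    Assumption: this mirrors the local file store layout; meta.yaml is not written.
    """
    run_id = uuid.uuid4().hex
    run_dir = Path(root) / experiment_id / run_id
    (run_dir / "metrics").mkdir(parents=True)
    return run_dir


def log_metric(run_dir, key, value, step=0, timestamp_ms=None):
    """Append one 'timestamp value step' line to the metric's file (assumed format)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    metric_file = Path(run_dir) / "metrics" / key
    with open(metric_file, "a", encoding="utf-8") as f:
        f.write(f"{timestamp_ms} {value} {step}\n")
```

Each metric key gets its own append-only file, so logging a value never requires reading or rewriting earlier entries.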
Finally had some time to look into this further. If we're already including Python, I wonder if it would help to use the more actively maintained https://github.com/JuliaAI/MLJFlow.jl or underlying https://github.com/JuliaAI/MLFlowClient.jl? If that doesn't sound appealing, we can continue with this approach.
I would also consider whether this could be implemented as a package extension. If you're comfortable with trying that, please do.
Yes, in that case I would also use the underlying MLFlowClient.jl directly.
I like the idea of MLFlowBackend supporting both use cases: use MLFlowLogger.jl to log locally and MLFlowClient.jl to log to a remote MLFlow server.
I sketched an overview of such a design here. What do you think? (I would have to change MLFlowLogger.jl to a local logger, which makes sense since then it's similar to TensorBoardLogger.jl).
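One way such a dual-mode backend could pick its logger is by inspecting the tracking URI. The Python sketch below is only an illustration of that dispatch, not the linked design: `choose_logger` is a hypothetical name, and the rule that http(s) URIs go to the remote client while file URIs and plain paths go to the local logger is an assumption.

```python
from urllib.parse import urlparse


def choose_logger(tracking_uri):
    """Return 'remote' for http(s) URIs and 'local' for file URIs or plain paths."""
    scheme = urlparse(tracking_uri).scheme.lower()
    if scheme in ("http", "https"):
        return "remote"  # would be handled by a REST client such as MLFlowClient.jl
    # A one-letter scheme is a Windows drive letter, e.g. C:\mlruns.
    if scheme in ("", "file") or len(scheme) == 1:
        return "local"  # would be handled by a file-based logger
    raise ValueError(f"unsupported tracking URI scheme: {scheme!r}")
```

For example, `choose_logger("http://localhost:5000")` selects the remote path and `choose_logger("./mlruns")` selects the local one.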
That sounds good to me. I realize adding local logging support could be quite a bit of work though, and I hadn't realized you already got MLFlow CI working with rejuvyesh/MLFlowLogger.jl#5. So if the local logging part turns out to be too much of a hassle, I'd be ok continuing with the current PR setup and revisiting local logging once the MLFlow client library in question supports it.
…e mlflow connection error in the Windows CI
I've added a MLFlowBackend type for logging to MLFlow, similar to the TensorBoardBackend. It uses the MLFLogger from MLFlowLogger.jl (which now uses the REST API).
Currently, the log_to() method is implemented for Loggables.Value. I can see if I can add log methods for the other types Loggables.Image, Loggables.Text, Loggables.Histogram later, just wanted to get some first feedback.
I also still have to figure out how to start the mlflow server in the CI on Windows to make all tests pass.
Any first feedback?
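For context on what logging a Loggables.Value over the REST API amounts to, here is a Python sketch that builds (but does not send) a log-metric request. The endpoint path and JSON fields follow MLflow's REST API for logging a metric; the helper name `build_log_metric_request` is hypothetical, and this is not the code used by MLFlowLogger.jl.

```python
import json
import time
import urllib.request


def build_log_metric_request(base_url, run_id, key, value, step=0, timestamp_ms=None):
    """Build a POST request for the MLflow REST log-metric endpoint."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # milliseconds since the Unix epoch
    payload = {
        "run_id": run_id,
        "key": key,
        "value": float(value),
        "timestamp": timestamp_ms,
        "step": step,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/2.0/mlflow/runs/log-metric",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` fails with a connection error when no tracking server is listening, which is why the tests for this backend need a server running.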
PR Checklist
LogMetrics