added MLFlowBackend #163

Draft · wants to merge 2 commits into base: master
13 changes: 13 additions & 0 deletions .github/workflows/ci.yml
@@ -17,12 +17,25 @@ jobs:
- windows-latest
arch:
- x64
python-version: ['3.10']
steps:
- uses: actions/checkout@v3
- uses: julia-actions/setup-julia@v1
with:
version: ${{ matrix.version }}
arch: ${{ matrix.arch }}
- name: Setup python and mlflow server
uses: actions/setup-python@v1
with:
python-version: ${{ matrix.python-version }}
architecture: ${{ matrix.arch }}
- run: |
python -m pip install mlflow
python -m pip show mlflow
mlflow server --host localhost --port 5000 &
Comment on lines +27 to +35
Member:

Thanks for the PR! For my understanding, is this still necessary if MLFlowLogger.jl doesn't use Python? I thought MLFlow logging does not require an active server to be running.

Author:

We'll get an HTTP.ConnectError when trying to log to a server that's not running, so unfortunately these tests require a running server. Maybe it's more developer-friendly to only run the MLFlow tests in CI, so that FluxTraining developers can just do ]test without having to bother with MLFlow?
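A rough sketch of that idea (not part of this PR): guard the MLFlow tests on the MLFLOW_URI environment variable that the CI workflow sets further down, so a plain ]test skips them when no server is configured. The haskey guard is an assumption for illustration only.

# Hypothetical guard, assuming MLFLOW_URI is only set in CI:
if haskey(ENV, "MLFLOW_URI")
    cb = LogMetrics(MLFlowBackend(tracking_uri = ENV["MLFLOW_URI"]))
    learner = testlearner(Metrics(accuracy), Recorder(), cb)
    @test_nowarn fit!(learner, 1)
else
    @info "MLFLOW_URI not set; skipping MLFlow logging tests"
end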

Member:

I think I misunderstood what https://github.com/rejuvyesh/MLFlowLogger.jl does and is capable of. I've always used a setup like 1) or 2) in https://www.mlflow.org/docs/latest/tracking.html#common-setups, but it looks like this library only supports 3)? Are there plans to add logging support without a tracking server running?

Author:

Aha, I assumed for setup 1 or 2 there would always be a tracking server running on localhost, but I see now that that does not make sense.

The previous activity in MLFlowLogger.jl was 3 years ago so I don't think there are any plans there. I'm personally mainly interested in setup 3 because I am collaborating with others on a Julia ML project, but I can invest a little time if needed.

I assume the way to go would be to add file-logging functionality to MLFlowLogger.jl, similar to what was done for TensorBoardLogger.jl. Adding all of the file-logging functionality is probably a larger effort, but I could work on a first version that at least supports creating experiments and runs and calling log_metric().

What kind of roadmap do you envision to add MLFlow logging support in FluxTraining.jl?

Member:

Finally had some time to look into this further. If we're already including Python, I wonder if it would help to use the more actively maintained https://github.com/JuliaAI/MLJFlow.jl or the underlying https://github.com/JuliaAI/MLFlowClient.jl? If that doesn't sound appealing, we can continue with this approach.

I would also consider whether this could be implemented as a package extension. If you're comfortable with trying that, please do.

Author:

Yes, in that case I would also use the underlying MLFlowClient.jl directly.

I like the idea of MLFlowBackend supporting both use cases: use MLFlowLogger.jl to log locally and MLFlowClient.jl to log to a remote MLFlow server.

I sketched an overview of such a design here. What do you think? (I would have to change MLFlowLogger.jl into a local logger, which makes sense, since it would then be similar to TensorBoardLogger.jl.)
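Not the linked overview, just a rough illustration of that dual-backend idea; the LocalMLFLogger and RemoteMLFLogger names are hypothetical stand-ins for the MLFlowLogger.jl and MLFlowClient.jl paths:

# Hypothetical sketch only; names and dispatch are assumptions, not the actual design.
struct MLFlowBackend{L} <: LoggerBackend
    logger::L
end

function MLFlowBackend(; tracking_uri = nothing, kwargs...)
    if tracking_uri === nothing
        # Local, file-based logging (would live in MLFlowLogger.jl).
        return MLFlowBackend(LocalMLFLogger(; kwargs...))
    else
        # Remote logging against a running tracking server (e.g. via MLFlowClient.jl).
        return MLFlowBackend(RemoteMLFLogger(tracking_uri; kwargs...))
    end
end

# log_to would then dispatch on the wrapped logger type.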

Member:

That sounds good to me. I realize adding local logging support could be quite a bit of work though, and I hadn't realized you already got MLFlow CI working with rejuvyesh/MLFlowLogger.jl#5. So if the local logging part turns out to be too much of a hassle, I'd be ok continuing with the current PR setup and revisiting local logging once the MLFlow client library in question supports it.

sleep 5
- uses: julia-actions/julia-buildpkg@latest
- uses: julia-actions/julia-runtest@latest
env:
MLFLOW_URI: "http://localhost:5000"

1 change: 1 addition & 0 deletions Project.toml
@@ -13,6 +13,7 @@ Glob = "c27321d9-0574-5035-807b-f59d2c89b15c"
Graphs = "86223c79-3864-5bf0-83f7-82e725a168b6"
ImageCore = "a09fc81d-aa75-5fe9-8630-4744c3626534"
InlineTest = "bd334432-b1e7-49c7-a2dc-dd9149e4ebd6"
MLFlowLogger = "a17d1b34-d2df-4d9e-9e11-6289e57bd259"
OnlineStats = "a15396b6-48d5-5d58-9928-6d29437db91e"
Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"
ParameterSchedulers = "d7d3b36b-41b8-4d0d-a2bf-768c6151755e"
2 changes: 1 addition & 1 deletion docs/features.md
@@ -24,7 +24,7 @@ For logging, use the logging callbacks:
- [`LogHyperParams`](#)
- [`LogHistograms`](#)

They each can have multiple logging backends, but right now the only one implemented in *FluxTraining.jl* is [`TensorBoardBackend`](#). See also [`LoggerBackend`](#), [`log_to`](#), and [`Loggables.Loggable`](#).
They each can have multiple logging backends, but right now the only ones implemented in *FluxTraining.jl* are [`TensorBoardBackend`](#) and [`MLFlowBackend`](#). See also [`LoggerBackend`](#), [`log_to`](#), and [`Loggables.Loggable`](#).

There is also an external package [Wandb.jl](https://github.com/avik-pal/Wandb.jl) that implements a logging backend for [Weights&Biases](https://wandb.ai).

3 changes: 3 additions & 0 deletions src/FluxTraining.jl
@@ -47,6 +47,8 @@ include("./callbacks/execution.jl")
# logging
include("./callbacks/logging/Loggables.jl")
include("./callbacks/logging/logger.jl")
include("./callbacks/logging/combinename.jl")
include("./callbacks/logging/mlflow.jl")
include("./callbacks/logging/tensorboard.jl")
include("./callbacks/logging/checkpointer.jl")

@@ -111,6 +113,7 @@ export AbstractCallback,
LogHyperParams,
LogVisualization,
TensorBoardBackend,
MLFlowBackend,
StopOnNaNLoss,
LearningRate,
throttle,
3 changes: 3 additions & 0 deletions src/callbacks/logging/combinename.jl
@@ -0,0 +1,3 @@
_combinename(name, group::String) = _combinename((group, name))
_combinename(name, group::Tuple) = _combinename((group..., name))
_combinename(strings::Tuple) = join(strings, '/')
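For reference, the expected behaviour of these helpers, read straight off the definitions above:

_combinename("loss", "Training")            # yields "Training/loss"
_combinename("loss", ("Training", "epoch")) # yields "Training/epoch/loss"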
29 changes: 29 additions & 0 deletions src/callbacks/logging/mlflow.jl
@@ -0,0 +1,29 @@
using MLFlowLogger: MLFLogger, log_metric

"""
MLFlowBackend(;
tracking_uri=nothing,
experiment_name=nothing,
run_id=nothing,
start_step=0,
step_increment=1,
min_level=CoreLogging.Info,
kwargs...)
MLFlow backend for logging callbacks. Takes the same arguments
as [`MLFlowLogger.MLFlowLogger`](https://github.com/rejuvyesh/MLFlowLogger.jl/blob/master/src/MLFlowLogger.jl).
"""
struct MLFlowBackend <: LoggerBackend
    logger::MLFLogger

    function MLFlowBackend(; kwargs...)
        return new(MLFLogger(; kwargs...))
    end
end

Base.show(io::IO, backend::MLFlowBackend) = print(
io, "MLFlowBackend(", backend.logger, ")")

function log_to(backend::MLFlowBackend, value::Loggables.Value, name, i; group = ())
    name = _combinename(name, group)
    log_metric(backend.logger, name, value.data; step = i)
end
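A hedged usage sketch (not part of the PR) showing how the new backend plugs into the existing logging callbacks; the URI is assumed to point at a running tracking server, mirroring the tests below:

backend = MLFlowBackend(tracking_uri = "http://localhost:5000")
cb = LogMetrics(backend)
# Pass `cb` to a Learner like any other FluxTraining.jl callback; metric values
# logged during fit! are then forwarded to the MLFlow server via log_metric.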
6 changes: 0 additions & 6 deletions src/callbacks/logging/tensorboard.jl
@@ -44,9 +44,3 @@ function log_to(backend::TensorBoardBackend, hist::Loggables.Histogram, name, i;
name = _combinename(name, group)
log_histogram(backend.logger, name, hist.data, step=i)
end

# Utilities

_combinename(name, group::String) = _combinename((group, name))
_combinename(name, group::Tuple) = _combinename((group..., name))
_combinename(strings::Tuple) = join(strings, '/')
5 changes: 5 additions & 0 deletions test/callbacks/logging.jl
@@ -1,11 +1,16 @@
include("../imports.jl")

tbbackend() = TensorBoardBackend(mktempdir())
mlflowbackend() = MLFlowBackend(tracking_uri=ENV["MLFLOW_URI"])

@testset "`LogMetrics`" begin
    cb = LogMetrics(tbbackend())
    learner = testlearner(Metrics(accuracy), Recorder(), cb)
    @test_nowarn fit!(learner, 1)

    cb = LogMetrics(mlflowbackend())
    learner = testlearner(Metrics(accuracy), Recorder(), cb)
    @test_nowarn fit!(learner, 1)
end


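To reproduce the CI setup locally, something along these lines should work, assuming an MLflow server started with `mlflow server --host localhost --port 5000` is already running in another terminal (a sketch, not documented in the PR):

ENV["MLFLOW_URI"] = "http://localhost:5000"
using Pkg
Pkg.test("FluxTraining")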