-
Notifications
You must be signed in to change notification settings - Fork 705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate to controller-runtime logger in mpi job controller #2177
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Pull Request Test Coverage Report for Build 11405323053Details
💛 - Coveralls |
89ae18a
to
00b6c51
Compare
Signed-off-by: champon1020 <[email protected]>
00b6c51
to
bbc9872
Compare
@@ -74,7 +72,7 @@ func NewReconciler(mgr manager.Manager, gangSchedulingSetupFunc common.GangSched | |||
Scheme: mgr.GetScheme(), | |||
recorder: mgr.GetEventRecorderFor(controllerName), | |||
apiReader: mgr.GetAPIReader(), | |||
Log: log.Log, | |||
Log: log.Log.WithName(controllerName), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: attach controllerName as the logger name
}() | ||
|
||
mpiJob = mpiJob.DeepCopy() | ||
mpiJob.Status = *jobStatus.DeepCopy() | ||
|
||
result := jc.Status().Update(context.Background(), mpiJob) | ||
|
||
if result != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I think this if statement may be no longer needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. WDYT @kubeflow/wg-training-leads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we can remove it.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/remove-lifecycle stale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM :)
I left a few comments for your reference.
return ctrl.Result{}, err | ||
} | ||
|
||
t, err := util.DurationUntilExpireTime(&mpijob.Spec.RunPolicy, mpijob.Status) | ||
if err != nil { | ||
logrus.Warnf("Reconcile MPIJob Job error %v", err) | ||
logger.Info("Reconcile MPIJob Job error", "error", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to replace it with logger.Error
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Electronic-Waste fixed (c6de073)
}() | ||
|
||
mpiJob = mpiJob.DeepCopy() | ||
mpiJob.Status = *jobStatus.DeepCopy() | ||
|
||
result := jc.Status().Update(context.Background(), mpiJob) | ||
|
||
if result != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. WDYT @kubeflow/wg-training-leads
Signed-off-by: champon1020 <[email protected]>
4746ae5
to
c6de073
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @champon1020!
I left a few comments.
@@ -25,7 +25,6 @@ import ( | |||
"time" | |||
|
|||
"github.com/go-logr/logr" | |||
"github.com/sirupsen/logrus" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you get a chance to remove logrus from other parts of Training Operator:
"github.com/sirupsen/logrus" |
That will allow us to remove that dependency from go.mod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this comment from you @champon1020: #2048 (comment).
Do you want to work on migration for other files in the separate PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm going to create another PR to do migration in other controllers.
@@ -125,8 +123,7 @@ type MPIJobReconciler struct { | |||
// Reconcile is part of the main kubernetes reconciliation loop which aims to | |||
// move the current state of the cluster closer to the desired state. | |||
func (jc *MPIJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) { | |||
_ = log.FromContext(ctx) | |||
logger := jc.Log.WithValues(kubeflowv1.MPIJobSingular, req.NamespacedName) | |||
logger := jc.Log.WithValues("namespace", req.NamespacedName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get the logger from the context and augment it with additional info from the reconciler's request with labels:
log := ctrl.LoggerFrom(ctx).WithValues(....)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change it ?
I think the MPIJobSingular, whose value is "mpijob", is not clear as the key of NamespacedName.
Moreover, I attached the controllerName as the logger name here (#2177 (comment)).
Are there any reasons that MPIJobSingular was used here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get the logger from the context and augment it with additional info from the reconciler's request with labels:
To use LoggerFrom
method, we need to modify some methods to receive context.Context. Since it requires much more modifications, I think it is better to work on as another issue.
if err := jc.Delete(context.Background(), mpiJob); err != nil { | ||
logger.Error(err, "failed to delete job") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.Error(err, "failed to delete job") | |
logger.Error(err, "failed to delete MPIJob") |
return err | ||
} | ||
|
||
jc.Recorder.Eventf(mpiJob, corev1.EventTypeNormal, SuccessfulDeleteJobReason, "Deleted job: %v", mpiJob.Name) | ||
log.Infof("job %s/%s has been deleted", mpiJob.Namespace, mpiJob.Name) | ||
logger.Info("job has been deleted") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.Info("job has been deleted") | |
logger.Info("MPIJob has been deleted") |
@@ -573,8 +578,12 @@ func (jc *MPIJobReconciler) UpdateJobStatus(job interface{}, replicas map[kubefl | |||
running := status.Active | |||
failed := status.Failed | |||
|
|||
logrus.Infof("MPIJob=%s, ReplicaType=%s expected=%d, running=%d, succeeded=%d , failed=%d", | |||
mpiJob.Name, rtype, expected, running, succeeded, failed) | |||
logger.Info("replica status", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.Info("replica status", | |
logger.Info("MPIJob replica status", |
}() | ||
|
||
mpiJob = mpiJob.DeepCopy() | ||
mpiJob.Status = *jobStatus.DeepCopy() | ||
|
||
result := jc.Status().Update(context.Background(), mpiJob) | ||
|
||
if result != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we can remove it.
@@ -125,8 +123,7 @@ type MPIJobReconciler struct { | |||
// Reconcile is part of the main kubernetes reconciliation loop which aims to | |||
// move the current state of the cluster closer to the desired state. | |||
func (jc *MPIJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) { | |||
_ = log.FromContext(ctx) | |||
logger := jc.Log.WithValues(kubeflowv1.MPIJobSingular, req.NamespacedName) | |||
logger := jc.Log.WithValues("namespace", req.NamespacedName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get the logger from the context and augment it with additional info from the reconciler's request with labels:
log := ctrl.LoggerFrom(ctx).WithValues(....)
@@ -164,13 +161,13 @@ func (jc *MPIJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ct | |||
// MPIJob needs not service | |||
err = jc.ReconcileJobs(mpijob, mpijob.Spec.MPIReplicaSpecs, mpijob.Status, &mpijob.Spec.RunPolicy) | |||
if err != nil { | |||
logrus.Warnf("Reconcile MPIJob error %v", err) | |||
logger.Info("Reconcile MPIJob error", "error", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to set verbosity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've replaced logger.Info
with logger.Error
as mentioned here (#2177 (comment)), but using verbosity might be a good approach.
Is there any rules to set verbosity? (I don't have any ideas how much verbosity is proper here)
@@ -316,9 +313,11 @@ func (jc *MPIJobReconciler) onOwnerCreateFunc() func(event.CreateEvent) bool { | |||
return true | |||
} | |||
|
|||
logger := jc.Log.WithValues("namespace", mpiJob.Namespace, "name", mpiJob.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this namespace same as reconciler request namespace or different? If it's the same, can we insert the labelled logger into the context which we set while initialising (in line 126) and use it as is? If not, do we want to change the key name to something specific?
What this PR does / why we need it:
As mentioned at #2048, we should migrate to Controller Runtime Logger while there are some types of logger currently. First, I modified it only in the MPI Job Controller since I want to determine a course of action. (I'm going to implement the remaining modifications in the next PR)
Which issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...
format, will close the issue(s) when PR gets merged):Fixes #2048
Checklist: