-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make container startup timeout configurable #65
Conversation
Signed-off-by: James Lamb <[email protected]>
Signed-off-by: James Lamb <[email protected]>
Signed-off-by: James Lamb <[email protected]>
Signed-off-by: James Lamb <[email protected]>
Signed-off-by: James Lamb <[email protected]>
Signed-off-by: James Lamb <[email protected]>
Signed-off-by: James Lamb <[email protected]>
Signed-off-by: James Lamb <[email protected]>
Signed-off-by: James Lamb <[email protected]>
I think that these changes are working as intended, but I'm struggling to test them. I tested that the configuration value is getting all the way down to the place where it needs to be by adding a print statement, like this: return fmt.Errorf("-- timeoutSeconds: %d", timeoutSeconds)
if time.Since(startTime) > (time.Second * time.Duration(timeoutSeconds)) {
... And then doing this: make clean testprep build
bin/canary validate \
--file ./examples/kubeflow.yaml \
--startup-timeout 5 \
container-canary/kubeflow:shouldpass
bin/canary validate \
--file ./examples/kubeflow.yaml \
--startup-timeout 9 \
container-canary/kubeflow:shouldpass
So this is at least as correct as the existing code. But I'm not sure how to actually test the enforcement of the timeout. This startup timeout applies to the time between when
and when the container is first reported as "running", here:
I'm not sure how to easily influence the amount of time between those two states. That does not include any of these:
The only things I can think of that cause this startup timeout to be exceeded are like this:
@jacobtomlinson @KyleFromNVIDIA can you think of a good way to reliably test this? Or should we just merge this as-is without testing changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you really want to test this, my suggestion would be to make some sort of mock interface that can wait any desired period of time for testing purposes. This can be either a Go interface or a shim command that gets substituted in place of docker
, or something else.
That said, this is a small, self-explanatory change, and I don't think it strictly has to be tested.
Approving as-is. Feel free to add more robust tests if you want.
Thanks for these ideas, they're good ones I hadn't considered. A shim for I think that's way beyond my Go abilities though 😬 I agree with you about this being small and self-explanatory, and I'm also confident it could just be merged as-is. Let's see what @jacobtomlinson thinks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that in the long term we want to improve testing here, and putting more places where we can add shims is a great idea. But I also agree for this change that manual testing seems fine.
Alright sounds good, thanks! |
Fixes #61
Contributes to rapidsai/docker#667
container-canary
runs a container and then executes commands inside it. When it starts up that container, if it doesn't see the container come up within 10 seconds, it raises an exception.This proposes making that timeout configurable, via a CLI argument
--startup-timeout
. This will make it easier to usecontainer-canary
with images / environments where container starting is slower (e.g. where something expensive is done at startup or where local disks are slow).