wget in node drainer skip has stopped working #1125
Comments
Thanks for reporting!
@c-knowles @mumoshu How can I solve this? I am new to K8s 😄
@c-knowles @mumoshu This happens for me on new nodes or nodes that recently pulled the aws-cli container image.
bash-4.3# wget -O - -q http://169.254.169.254/2016-09-02/meta-data/instance-id
wget: error getting response
bash-4.3# apk update
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
v3.6.2-248-g3f8eeb3ea1 [http://dl-cdn.alpinelinux.org/alpine/v3.6/main]
v3.6.2-250-g51a3714b5e [http://dl-cdn.alpinelinux.org/alpine/v3.6/community]
OK: 8437 distinct packages available
bash-4.3# apk add wget
(1/1) Installing wget (1.19.2-r0)
Executing busybox-1.26.2-r9.trigger
OK: 75 MiB in 35 packages
bash-4.3# wget -O - -q http://169.254.169.254/2016-09-02/meta-data/instance-id
i-0fccf94cf2be4de7a
bash-4.3#
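A plausible reading of the session above, consistent with the busybox link in the issue description, is that the default wget in the image is the BusyBox applet, while apk add wget replaces it with GNU wget. A quick way to confirm which one is in use (a sketch; the paths assume Alpine's default busybox applet layout):
ls -l /usr/bin/wget    # before: a symlink to /bin/busybox, i.e. the BusyBox applet
apk add wget           # installs GNU wget over the applet
ls -l /usr/bin/wget    # after: a regular GNU wget binary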
Not sure if mine is the same issue - if not, I'm happy to raise a separate issue. What I have seen is this: starting from a working k8s cluster that was spun up with kube-aws, where node-drainer is working fine and the pods started fine, if I change the instance type of the worker nodes to the new c5 range (actually I only tried c5.4xlarge) and apply, so that my worker nodes are now the new instance type, the node-drainer pods now start failing. The instance type is the ONLY thing I have changed between the working and non-working node-drainers on my worker nodes.
Non-Working:
I have also included the node-drainer-asg-status-updater because that is also failing now on the new c5 instance types with logs showing me:
node-drainer also gives me the same error message. It does sound like a different problem, so let me know if a new issue is required.
FYI, I just spun up a test container on the same worker node (c5.4xlarge) that has the failing node-drainer pod, and from that test container the same wget that fails from node-drainer works OK.
@vaibhavsingh97 a quick way to resolve this is to change the wget calls in the controller config to use curl, and then update your cluster with that. Longer term, I think we should change the default in kube-aws so it does not always pull master.
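For illustration, a curl equivalent of the metadata call shown above might look like this (a sketch only; the exact flags used in the kube-aws templates may differ):
curl -s -f http://169.254.169.254/2016-09-02/meta-data/instance-id   # -s silences progress output, -f exits non-zero on HTTP errors such as 502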
Thanks @c-knowles for pointing it out. I will make a PR 👍
Looks like there is no good alternative tag - https://quay.io/repository/coreos/awscli?tag=latest&tab=tags. @mumoshu perhaps we should build an AWS CLI image ourselves, or use a different one from Docker Hub, so we can pin the version a bit better?
Yeah, let's build one ourselves
@c-knowles Just forked coreos/awscli to https://github.com/kube-aws/docker-awscli. Would you mind sending a PR for switching to curl? The Docker repo is also available at https://hub.docker.com/r/kubeaws/awscli/ with automated builds enabled.
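For context, a minimal Dockerfile for such an image could look like the sketch below; the base image, package names, and pinned awscli version are assumptions for illustration, not the actual contents of kube-aws/docker-awscli:
FROM alpine:3.6
# Install a pinned awscli plus curl, so scripts need not rely on the BusyBox wget applet
RUN apk add --no-cache python py-pip curl \
 && pip install awscli==1.14.32
ENTRYPOINT ["aws"]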
@mumoshu yeah ok, I will pick this item up now. |
Resolves kubernetes-retired#1125. Also, both curl and wget are used within these scripts; better to use just one of them.
@mumoshu PR done. I haven't swapped to the new image yet; I see there are some issues on the coreos repo about getting versions pinned.
Oops! Looks like I am late 😄. Any other beginner issue I can solve?
@vaibhavsingh97 sorry! There are lots of good first issues; I'd suggest one of #950, #1085 or #1063.
@c-knowles Thanks for the PR! I'll take a look soon. Regarding the awscli image pinning, I've just pushed // Btw, it works like this: as soon as a git tag is pushed, the automated build for the image tag with the same value as the git tag is triggered.
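As a concrete illustration of that workflow (the tag name below is hypothetical):
# Tag the kube-aws/docker-awscli repo and push the tag; the Docker Hub automated
# build then produces an image with the same tag, e.g. kubeaws/awscli:v1.0.0
git tag v1.0.0
git push origin v1.0.0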
@mumoshu I would be happy to submit a PR. Can you please point me to the resources?
Extracted from #1105 (comment).
The current node drainer scripting uses wget, but it seems to have some problems with 502s:
On a
Running
node drainer pod, I've exec-ed into the pod and done this:
Probably related issue with wget in busybox: http://svn.dd-wrt.com/ticket/5771
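For anyone trying to reproduce, the check can be run directly inside the node drainer pod; the pod name and namespace below are hypothetical placeholders:
# Substitute your own pod name, e.g. from: kubectl get pods -n kube-system
kubectl exec <node-drainer-pod-name> -n kube-system -- \
  wget -O - -q http://169.254.169.254/2016-09-02/meta-data/instance-id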