This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

wget in node drainer script has stopped working #1125

Closed
cknowles opened this issue Feb 6, 2018 · 16 comments
Assignees: cknowles
Labels: good first issue, help wanted, kind/bug
Milestone: v0.9.10.rc-1

@cknowles
Contributor

cknowles commented Feb 6, 2018

Extracted from #1105 (comment).

The current node drainer scripting uses wget, but it seems to be having problems with 502s:

wget -O - -q http://169.254.169.254/2016-09-02/meta-data/instance-id
wget: server returned error: HTTP/1.1 502 Bad Gateway 

On a Running node drainer pod, I exec'd into it and ran this:

/ # wget -O - -q http://169.254.169.254/latest/meta-data/
wget: server returned error: HTTP/1.1 502 Bad Gateway

/ # curl http://169.254.169.254/latest/meta-data/
ami-id
ami-launch-index
ami-manifest-path
block-device-mapping/
hostname
[...]

Probably a related busybox wget issue: http://svn.dd-wrt.com/ticket/5771

@mumoshu mumoshu added this to the v0.9.10.rc-1 milestone Feb 6, 2018
@mumoshu
Contributor

mumoshu commented Feb 6, 2018

Thanks for reporting!
Not sure whether this affects everyone's cluster, but per the observations in the referenced issue, let's switch the node drainer to curl and see if that works.
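
A minimal sketch of that swap, assuming the drainer wraps the metadata call in a helper like the one visible in the traces further down this thread (the exact script shape may differ):

# Before: busybox wget, which mishandles the metadata service response
#   metadata() { wget -O - -q "http://169.254.169.254/2016-09-02/${1}"; }
# After: curl, where -s mirrors wget's -q and -f makes curl exit non-zero on HTTP errors
metadata() { curl -s -f "http://169.254.169.254/2016-09-02/${1}"; }
metadata meta-data/instance-id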

@mumoshu mumoshu added kind/bug, help wanted, and good first issue labels Feb 6, 2018
@vaibhavsingh97

@c-knowles @mumoshu How can I solve this? I am new to K8s 😄

@camilb
Contributor

camilb commented Feb 7, 2018

@c-knowles @mumoshu This happens for me on new nodes or nodes that recently pulled the aws-cli container image.

bash-4.3# wget -O - -q http://169.254.169.254/2016-09-02/meta-data/instance-id
wget: error getting response
bash-4.3# apk update
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
v3.6.2-248-g3f8eeb3ea1 [http://dl-cdn.alpinelinux.org/alpine/v3.6/main]
v3.6.2-250-g51a3714b5e [http://dl-cdn.alpinelinux.org/alpine/v3.6/community]
OK: 8437 distinct packages available
bash-4.3# apk add wget
(1/1) Installing wget (1.19.2-r0)
Executing busybox-1.26.2-r9.trigger
OK: 75 MiB in 35 packages
bash-4.3# wget -O - -q http://169.254.169.254/2016-09-02/meta-data/instance-id
i-0fccf94cf2be4de7a
bash-4.3#

@kiich
Contributor

kiich commented Feb 7, 2018

Not sure if mine is the same issue; if not, I'm happy to raise a separate one. What I've seen: on a working k8s cluster spun up with kube-aws, where the node-drainer pods all started and run fine, if I change the worker node instance type to the new c5 range (I've only tried c5.4xlarge) and apply, so that my worker nodes are now the new instance type, the node-drainers start to CrashLoopBackOff.

The instance type is the ONLY thing I changed between the working and non-working node-drainers on my worker nodes:
Working:

kube-system           kube-node-drainer-ds-6xd9r                                           1/1       Running             0          8m
kube-system           kube-node-drainer-ds-kbpgv                                           1/1       Running             0          8m
kube-system           kube-node-drainer-ds-kmr9r                                           1/1       Running             0          8m

Non-Working:

kube-system           kube-node-drainer-asg-status-updater-f9f67c9c7-w7gwg                 0/1       CrashLoopBackOff    5          5m
kube-system           kube-node-drainer-ds-58vnq                                           0/1       CrashLoopBackOff    6          8m
kube-system           kube-node-drainer-ds-qlg76                                           0/1       CrashLoopBackOff    6          8m
kube-system           kube-node-drainer-ds-scsxl                                           0/1       CrashLoopBackOff    6          8m

I've also included the node-drainer-asg-status-updater because it is now failing on the new c5 instance types too, with logs showing:

kubectl logs kube-node-drainer-asg-status-updater-f9f67c9c7-w7gwg -n kube-system
+ metadata dynamic/instance-identity/document
+ wget -O - -q http://169.254.169.254/2016-09-02/dynamic/instance-identity/document
+ jq -r .region
wget: error getting response
+ REGION=
+ [ -n  ]

The node-drainer pods give me the same error message.

It does sound like a different problem, so let me know if a new issue is required.
I've found coreos/bugs#2331, which seems related.
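
For reference, the failing step in that trace is just a region lookup; a curl-based equivalent (a sketch only, not the shipped script) would be:

REGION=$(curl -s -f http://169.254.169.254/2016-09-02/dynamic/instance-identity/document | jq -r .region)
[ -n "$REGION" ] || echo "region lookup failed"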

@kiich
Contributor

kiich commented Feb 7, 2018

FYI, I just spun up a test container on the same worker node (c5.4xlarge) that has the failing node-drainer pod, and from that test container the same wget that fails in node-drainer works OK.

@cknowles
Contributor Author

cknowles commented Feb 8, 2018

@vaibhavsingh97 a quick way to resolve this is to change the wget calls in the controller config to use curl, then update your cluster with that. Longer term, I think we should change the default in kube-aws so it does not always pull master.
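
The flag mapping is the only subtle part of that swap; a hypothetical before/after for one of those calls:

# Before: -q is quiet, -O - writes the body to stdout
wget -O - -q http://169.254.169.254/2016-09-02/meta-data/instance-id
# After: -s suppresses progress output, -f gives a non-zero exit code
# on HTTP errors such as the 502 above
curl -s -f http://169.254.169.254/2016-09-02/meta-data/instance-id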

@vaibhavsingh97

Thanks @c-knowles for pointing it out. I will make a PR 👍

@cknowles
Contributor Author

cknowles commented Feb 8, 2018

Looks like there is no good alternative tag - https://quay.io/repository/coreos/awscli?tag=latest&tab=tags. @mumoshu perhaps we should build an AWS CLI image ourselves, or use a different one from Docker Hub, so we can pin the version a bit better?

@mumoshu
Contributor

mumoshu commented Feb 8, 2018

Yeah, let's build one ourselves

@mumoshu
Contributor

mumoshu commented Feb 8, 2018

@c-knowles Just forked coreos/awscli to https://github.com/kube-aws/docker-awscli.

Would you mind sending a PR for switching to curl?

The docker repo is also available at https://hub.docker.com/r/kubeaws/awscli/ with automated build enabled.

@cknowles
Contributor Author

cknowles commented Feb 8, 2018

@mumoshu yeah ok, I will pick this item up now.

@cknowles cknowles self-assigned this Feb 8, 2018
cknowles pushed a commit to cknowles/kube-aws that referenced this issue Feb 8, 2018
Resolves kubernetes-retired#1125.

Also, both curl and wget were used within these scripts; it's better to use just one of them.
@cknowles
Contributor Author

cknowles commented Feb 8, 2018

@mumoshu PR done. I haven't swapped to the new image yet; I see there are some issues on the coreos repo about getting versions pinned.

@vaibhavsingh97

Oops!! Looks like I am late 😄. Any other beginner issues I can solve?

@cknowles
Contributor Author

cknowles commented Feb 8, 2018

@vaibhavsingh97 sorry! There are lots of good first issues; I'd suggest one of #950, #1085 or #1063.

@mumoshu
Contributor

mumoshu commented Feb 8, 2018

@c-knowles Thanks for the PR! I'll take a look soon.

Regarding the awscli image pinning, I've just pushed kubeaws/awscli:0.9.0 via automated build. I'd appreciate it if you could submit PRs to change the awscli image used by kube-aws to that one! @vaibhavsingh97 @c-knowles

// Btw, it works like this: as soon as a git tag is pushed, an automated build is triggered for the image tag with the same value as the git tag.
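
Illustrating that flow with the tag mentioned above (a sketch; the build itself runs on Docker Hub):

git tag 0.9.0
git push origin 0.9.0
# Docker Hub's automated build then publishes kubeaws/awscli:0.9.0,
# which manifests can pin instead of a floating :latest tag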

@vaibhavsingh97

@mumoshu I would be happy to submit a PR. Can you please point me to the resources?

kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this issue Mar 27, 2018
Resolves kubernetes-retired#1125.

Also, both curl and wget were used within these scripts; it's better to use just one of them.