This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

wget in node drainer script has stopped working #1125

Closed
cknowles opened this issue Feb 6, 2018 · 16 comments
Assignees: cknowles
Labels: good first issue, help wanted, kind/bug
Milestone: v0.9.10.rc-1

@cknowles
Contributor

cknowles commented Feb 6, 2018

Extracted from #1105 (comment).

The current node drainer scripting uses wget, but it seems to be having problems with 502s:

wget -O - -q http://169.254.169.254/2016-09-02/meta-data/instance-id
wget: server returned error: HTTP/1.1 502 Bad Gateway 

On a Running node drainer pod, I exec'd into it and ran this:

/ # wget -O - -q http://169.254.169.254/latest/meta-data/
wget: server returned error: HTTP/1.1 502 Bad Gateway

/ # curl http://169.254.169.254/latest/meta-data/
ami-id
ami-launch-index
ami-manifest-path
block-device-mapping/
hostname
[...]

Probably a related busybox wget issue: http://svn.dd-wrt.com/ticket/5771

@mumoshu mumoshu added this to the v0.9.10.rc-1 milestone Feb 6, 2018
@mumoshu
Contributor

mumoshu commented Feb 6, 2018

Thanks for reporting!
Not sure whether this affects everyone's cluster, but per the observations in the referenced issue, let's switch the node drainer to curl and see if that works.
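
A minimal sketch of that swap, assuming the drainer wraps the metadata call in a helper like the one visible in the traces further down this thread (the exact script shape may differ):

# Before: busybox wget, which mishandles the metadata service response
#   metadata() { wget -O - -q "http://169.254.169.254/2016-09-02/${1}"; }
# After: curl, where -s mirrors wget's -q and -f makes curl exit non-zero on HTTP errors
metadata() { curl -s -f "http://169.254.169.254/2016-09-02/${1}"; }
metadata meta-data/instance-id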

@mumoshu mumoshu added kind/bug, help wanted, and good first issue labels Feb 6, 2018
@vaibhavsingh97

@c-knowles @mumoshu How can I solve this? I am new to K8s 😄

@camilb
Contributor

camilb commented Feb 7, 2018

@c-knowles @mumoshu This happens for me on new nodes or nodes that recently pulled the aws-cli container image.

bash-4.3# wget -O - -q http://169.254.169.254/2016-09-02/meta-data/instance-id
wget: error getting response
bash-4.3# apk update
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
v3.6.2-248-g3f8eeb3ea1 [http://dl-cdn.alpinelinux.org/alpine/v3.6/main]
v3.6.2-250-g51a3714b5e [http://dl-cdn.alpinelinux.org/alpine/v3.6/community]
OK: 8437 distinct packages available
bash-4.3# apk add wget
(1/1) Installing wget (1.19.2-r0)
Executing busybox-1.26.2-r9.trigger
OK: 75 MiB in 35 packages
bash-4.3# wget -O - -q http://169.254.169.254/2016-09-02/meta-data/instance-id
i-0fccf94cf2be4de7a
bash-4.3#

@kiich
Contributor

kiich commented Feb 7, 2018

Not sure if mine is the same issue; if not, I'm happy to raise a separate one. What I've seen: on a working k8s cluster spun up with kube-aws, where the node-drainer pods all started and run fine, if I change the worker node instance type to the new c5 range (I've only tried c5.4xlarge) and apply, so that my worker nodes are now the new instance type, the node-drainers start to CrashLoopBackOff.

The instance type is the ONLY thing I changed between the working and non-working node-drainers on my worker nodes:
Working:

kube-system           kube-node-drainer-ds-6xd9r                                           1/1       Running             0          8m
kube-system           kube-node-drainer-ds-kbpgv                                           1/1       Running             0          8m
kube-system           kube-node-drainer-ds-kmr9r                                           1/1       Running             0          8m

Non-Working:

kube-system           kube-node-drainer-asg-status-updater-f9f67c9c7-w7gwg                 0/1       CrashLoopBackOff    5          5m
kube-system           kube-node-drainer-ds-58vnq                                           0/1       CrashLoopBackOff    6          8m
kube-system           kube-node-drainer-ds-qlg76                                           0/1       CrashLoopBackOff    6          8m
kube-system           kube-node-drainer-ds-scsxl                                           0/1       CrashLoopBackOff    6          8m

I've also included the node-drainer-asg-status-updater because it is now failing on the new c5 instance types too, with logs showing:

kubectl logs kube-node-drainer-asg-status-updater-f9f67c9c7-w7gwg -n kube-system
+ metadata dynamic/instance-identity/document
+ wget -O - -q http://169.254.169.254/2016-09-02/dynamic/instance-identity/document
+ jq -r .region
wget: error getting response
+ REGION=
+ [ -n  ]

The node-drainer pods give me the same error message.

It does sound like a different problem, so let me know if a new issue is required.
I've found coreos/bugs#2331, which seems related.
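
For reference, the failing step in that trace is just a region lookup; a curl-based equivalent (a sketch only, not the shipped script) would be:

REGION=$(curl -s -f http://169.254.169.254/2016-09-02/dynamic/instance-identity/document | jq -r .region)
[ -n "$REGION" ] || echo "region lookup failed"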

@kiich
Contributor

kiich commented Feb 7, 2018

FYI, I just spun up a test container on the same worker node (c5.4xlarge) that has the failing node-drainer pod, and from that test container the same wget that fails in node-drainer works OK.

@cknowles
Contributor Author

cknowles commented Feb 8, 2018

@vaibhavsingh97 a quick way to resolve this is to change the wget calls in the controller config to use curl, then update your cluster with that. Longer term, I think we should change the default in kube-aws so it does not always pull master.
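
The flag mapping is the only subtle part of that swap; a hypothetical before/after for one of those calls:

# Before: -q is quiet, -O - writes the body to stdout
wget -O - -q http://169.254.169.254/2016-09-02/meta-data/instance-id
# After: -s suppresses progress output, -f gives a non-zero exit code
# on HTTP errors such as the 502 above
curl -s -f http://169.254.169.254/2016-09-02/meta-data/instance-id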

@vaibhavsingh97

Thanks @c-knowles for pointing it out. I will make a PR 👍

@cknowles
Contributor Author

cknowles commented Feb 8, 2018

Looks like there is no good alternative tag - https://quay.io/repository/coreos/awscli?tag=latest&tab=tags. @mumoshu perhaps we should build an AWS CLI image ourselves, or use a different one from Docker Hub, so we can pin the version a bit better?

@mumoshu
Contributor

mumoshu commented Feb 8, 2018

Yeah, let's build one ourselves

@mumoshu
Contributor

mumoshu commented Feb 8, 2018

@c-knowles Just forked coreos/awscli to https://github.com/kube-aws/docker-awscli.

Would you mind sending a PR for switching to curl?

The docker repo is also available at https://hub.docker.com/r/kubeaws/awscli/ with automated build enabled.

@cknowles
Contributor Author

cknowles commented Feb 8, 2018

@mumoshu yeah ok, I will pick this item up now.

@cknowles cknowles self-assigned this Feb 8, 2018
cknowles pushed a commit to cknowles/kube-aws that referenced this issue Feb 8, 2018
Resolves kubernetes-retired#1125.

Also, both curl and wget were used within these scripts; it's better to use just one of them.
@cknowles
Contributor Author

cknowles commented Feb 8, 2018

@mumoshu PR done. I haven't swapped to the new image yet; I see there are some issues on the coreos repo about getting versions pinned.

@vaibhavsingh97

Oops!! Looks like I am late 😄. Any other beginner issues I can solve?

@cknowles
Contributor Author

cknowles commented Feb 8, 2018

@vaibhavsingh97 sorry! There are lots of good first issues; I'd suggest one of #950, #1085 or #1063.

@mumoshu
Contributor

mumoshu commented Feb 8, 2018

@c-knowles Thanks for the PR! I'll take a look soon.

Regarding the awscli image pinning, I've just pushed kubeaws/awscli:0.9.0 via automated build. I'd appreciate it if you could submit PRs to change the awscli image used by kube-aws to that one! @vaibhavsingh97 @c-knowles

// Btw, it works like this: as soon as a git tag is pushed, an automated build is triggered for the image tag with the same value as the git tag.
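
Illustrating that flow with the tag mentioned above (a sketch; the build itself runs on Docker Hub):

git tag 0.9.0
git push origin 0.9.0
# Docker Hub's automated build then publishes kubeaws/awscli:0.9.0,
# which manifests can pin instead of a floating :latest tag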

@vaibhavsingh97

@mumoshu I would be happy to submit a PR. Can you please point me to the resources?

kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this issue Mar 27, 2018
Resolves kubernetes-retired#1125.

Also, both curl and wget were used within these scripts; it's better to use just one of them.