Pods unable to connect to internet when using Minikube with --driver=none option #17442
owdev-init-couchdb-68zbw   pod could not resolve host: github.com
owdev-init-couchdb         pod could not resolve host: github.com
@afbjorklund Any help? I am a bit stuck...
I have the same problem. It does not seem to be a DNS issue but an issue with the network routing: access to IP addresses outside of the pod network is not working. I am guessing that the hostname lookup failures occur because the pod cannot reach the DNS server specified in /etc/resolv.conf. For me the pod network is 10.244.0.0/16 and the DNS is specified as 10.96.0.10, which is a different subnet. From the pod:
Given you are using Ubuntu, kubeadm detects and tries to configure systemd-resolved. Depending on your local setup you may not be using systemd-resolved. On the host machine running minikube, can you run the following and provide the output, so I can better understand your host machine's DNS configuration:
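The commands themselves are elided in this copy of the thread, but they can be read back from the `$`-prefixed prompts in the reply below:

```
systemctl status --no-pager systemd-resolved -l
ls -l /etc/resolv.conf
ls -l /run/systemd/resolve/
cat /run/systemd/resolve/resolv.conf
```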
@megazone23 - the different IPs for the kube-dns service are misleading due to the way services operate using a virtual IP. @Abhishekghosh1998 and @megazone23, there are some general debugging tips on Kubernetes DNS using a dnsutils pod at https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ that might help you in your debugging. I'll try to set up an Ubuntu box to test locally.
@pnasrat Here are the outputs you asked for:

$ systemctl status --no-pager systemd-resolved -l
● systemd-resolved.service - Network Name Resolution
Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2023-10-18 23:16:23 IST; 4 days ago
Docs: man:systemd-resolved.service(8)
man:org.freedesktop.resolve1(5)
https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
Main PID: 863 (systemd-resolve)
Status: "Processing requests..."
Tasks: 1 (limit: 76849)
Memory: 9.7M
CPU: 53.929s
CGroup: /system.slice/systemd-resolved.service
└─863 /lib/systemd/systemd-resolved
Oct 18 23:16:23 abhishek systemd[1]: Starting Network Name Resolution...
Oct 18 23:16:23 abhishek systemd-resolved[863]: Positive Trust Anchors:
Oct 18 23:16:23 abhishek systemd-resolved[863]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Oct 18 23:16:23 abhishek systemd-resolved[863]: Negative trust anchors: home.arpa 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-addr.arpa 22.172.in-addr.arpa 23.172.in-addr.arpa 24.172.in-addr.arpa 25.172.in-addr.arpa 26.172.in-addr.arpa 27.172.in-addr.arpa 28.172.in-addr.arpa 29.172.in-addr.arpa 30.172.in-addr.arpa 31.172.in-addr.arpa 168.192.in-addr.arpa d.f.ip6.arpa corp home internal intranet lan local private test
Oct 18 23:16:23 abhishek systemd-resolved[863]: Using system hostname 'abhishek'.
Oct 18 23:16:23 abhishek systemd[1]: Started Network Name Resolution.
Oct 18 23:16:27 abhishek systemd-resolved[863]: enp0s31f6: Bus client set default route setting: yes
Oct 18 23:16:27 abhishek systemd-resolved[863]: enp0s31f6: Bus client set DNS server list to: 10.16.25.15, 10.15.25.13
Oct 18 23:16:36 abhishek systemd-resolved[863]: enp0s31f6: Bus client set DNS server list to: 10.16.25.15, 10.15.25.13, fe80::1
Oct 21 17:40:46 abhishek systemd-resolved[863]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.15.25.13.

$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Dec 21  2022 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

$ ls -l /run/systemd/resolve/
total 8
srw-rw-rw- 1 systemd-resolve systemd-resolve 0 Oct 18 23:16 io.systemd.Resolve
drwx------ 2 systemd-resolve systemd-resolve 60 Oct 18 23:16 netif
-rw-r--r-- 1 systemd-resolve systemd-resolve 830 Oct 18 23:16 resolv.conf
-rw-r--r-- 1 systemd-resolve systemd-resolve 920 Oct 18 23:16 stub-resolv.conf

$ cat /run/systemd/resolve/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 10.16.25.15
nameserver 10.15.25.13
nameserver fe80::1%2
search .
I might be wrong, but my system seems to be using systemd-resolved.
Yes, you are using systemd-resolved.
So something networking-related - let's check this name server 10.16.25.15 by using the dnsutils pod described in the debugging link earlier. The below runs a few commands to check whether DNS to one of Google's Public DNS nameservers (8.8.8.8) works, then runs the same lookup using the local name server, both locally (first two commands) and then in the minikube server. If you can provide that output we can see if it is a networking issue that impacts external DNS servers, or if it might be something related to the networking between container and host.
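The command list is elided here; a plausible sketch, using the dnsutils pod from the linked guide and the 10.16.25.15 nameserver from the resolv.conf above (the exact lookup targets are assumptions):

```
# on the host: external DNS first, then the local nameserver
nslookup github.com 8.8.8.8
nslookup github.com 10.16.25.15
# the same lookups from inside the cluster
kubectl exec -i -t dnsutils -- nslookup github.com 8.8.8.8
kubectl exec -i -t dnsutils -- nslookup github.com 10.16.25.15
```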
We also need to eliminate host firewalling - on the host, can you run the below and attach the resulting iptables.out file, as well as sharing the output of
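The dump command itself is elided; a plausible form that produces an attachable iptables.out (hypothetical, but iptables-save is the standard full-ruleset dump):

```
sudo iptables-save > iptables.out
```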
My environment is different. I am using RHEL 8.8 and I am not using systemd-resolved. But for me I am pretty sure it is not a DNS issue; it is network related, as you mentioned, and I think it is specific to using driver=none, because I have the same behavior as this issue. If I do not use driver=none I can properly access the internet and any machine on my local network. But if I run with driver=none then all network connectivity outside the pod times out. I tried pinging by IP to bypass DNS.

From the machine hosting minikube to another machine on the local network:
$ ping -c 3 10.253.16.228
--- 10.253.16.228 ping statistics ---

From the dnsutils pod to the same machine on the local network:
root@dnsutils:/$ ping -c 5 10.253.16.228
--- 10.253.16.228 ping statistics ---

From the dnsutils pod to the host hosting minikube:
root@dnsutils:/$ ping -c 3 10.253.16.47

Could it be something related to the iptables? The first one has no output. The second one has a lot of output; I'll attach mine.
@megazone23 looking at your iptables you have multiple CNI bridges. Can you attach the minikube logs, as this is likely a different issue, and share what's in your CNI config, since I see multiple bridges in the iptables output?
Minikube will create a bridge (see minikube/pkg/minikube/cni/cni.go, line 252 at commit f29457b).
Once you've uploaded the minikube logs, my suggestion to eliminate iptables itself, if you are comfortable with it, would be to delete minikube, remove the minikube-generated CNI bridge config, turn off firewalling, then rerun.
I ran your suggestion, disabling firewalld and clearing iptables. The problem is still there...

Ping to the minikube host machine from the pod works.
Ping to machines on the same subnet as the host machine from the pod fails.
DNS lookups in the pod using coredns as the DNS server fail when connecting to 10.96.0.10.

From the pod the default route is set to 10.244.0.1, which is pingable. It seems like 10.244.0.1 is not routing the packets.
Hmm @megazone23, a few things in the logs stand out to me. Firstly, the name server that coredns is trying to speak to is an address in the link-local address block, which I don't think would work. Rather than trying ping (as ICMP can be filtered by other things), can you try DNS explicitly:
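The queries are elided; a sketch of explicit DNS lookups from the dnsutils pod, assuming the cluster DNS service IP 10.96.0.10 mentioned earlier and the dig tool shipped in the dnsutils image:

```
# query cluster DNS directly, bypassing the pod's default resolver
kubectl exec -i -t dnsutils -- dig @10.96.0.10 kubernetes.default.svc.cluster.local
# query an external nameserver from inside the pod
kubectl exec -i -t dnsutils -- dig @8.8.8.8 github.com
```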
@megazone23 can you also confirm that IP forwarding is set up; as root:
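The check itself is elided; presumably something like:

```
sysctl net.ipv4.ip_forward
# expected output when forwarding is enabled:
# net.ipv4.ip_forward = 1
```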
I am running a compute machine in Google Cloud. External DNS servers such as 8.8.8.8 are blocked. The DNS for the host VM is using the Google metadata DNS server. I could try to switch to a DNS server which is on the corporate network. Host resolv.conf:
Switching to the corporate DNS looks like it has the same problem. Are there some address ranges which are not usable?
Does

Yes
Running the same command from the dnsutils pod fails.
OK, so the name server is only part of the problem. I believe (but will need to check) that because the coredns pod link is a virtual ethernet device, the link-local address is not reachable from it. I can replicate that initial failure with the 169.254 network by setting up a RHEL instance in GCP, along with failing to query 8.8.8.8, which works from the host. Now I have an environment that reproduces, I'll do a bit of debugging on the network side to see what's preventing the pods working out of the box, which will be a quicker turnaround than asking you to run commands. https://gist.github.com/pnasrat/96612d4cf7670232e38bd8645e527862
Added some iptables logging, as the FORWARD chain seemed to be getting drops:
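The logging rule is not shown; a sketch of the kind of FORWARD-chain logging described (rule position and log prefix are assumptions):

```
# log every packet hitting the FORWARD chain before any other rule matches
sudo iptables -I FORWARD 1 -j LOG --log-prefix "FWD: " --log-level 4
# watch the kernel log for matches
sudo dmesg -w | grep 'FWD: '
```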
Thanks for the help so far. I assume the original reporter of this issue is having the same problem, which seems to be network related.
@megazone23 can you see if adding the following iptables rules for the bridge interface allows the pod to talk to the network. I'm pretty sure it's the FORWARD chain that is denying the pod networking; this will temporarily work around it while I get a better understanding of what the rules should be, what the security implications are, and what might need adjusting to fix correctly. As root on the host:
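The rules themselves are quoted back verbatim later in the thread:

```
iptables -A FORWARD -o bridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i bridge -j ACCEPT
```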
With this I can dig using the dnsutils pod.
/kind support
The underlying cause here seems to be that docker libnetwork sets the default iptables FORWARD policy to DROP. If using docker + cri-dockerd + the minikube none driver, this is probably a common issue. There are a number of workarounds, but the none driver is explicitly documented as being for advanced users, and unlike the minikube ISO images the variation of configurations is large. I am happy to add some additional documentation/links about network debugging in general, noting this potential issue explicitly for the none driver.
@pnasrat You are a genius! I ran the two iptables commands and the network issue is corrected. DNS lookups are working and ping is also successful. Thanks!
iptables -A FORWARD -o bridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i bridge -j ACCEPT

After trying out the above two commands, I was able to resolve the issue with --driver=none. Trying to create a pod, for example the dnsutils pod:

$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dnsutils 0/1 ContainerCreating 0 5m2s

The system remains stuck at the ContainerCreating status:

$ kubectl describe pod dnsutils
Name: dnsutils
Namespace: default
Priority: 0
Node: minikube/192.168.49.2
Start Time: Wed, 25 Oct 2023 16:34:12 +0530
Labels: <none>
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
dnsutils:
Container ID:
Image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
Image ID:
Port: <none>
Host Port: <none>
Command:
sleep
infinity
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lkp4r (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-lkp4r:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m6s default-scheduler Successfully assigned default/dnsutils to minikube
Normal Pulling 6m6s kubelet Pulling image "registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3"

I guess something bad happened. I tried removing the above two rules:

iptables -D FORWARD -o bridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -D FORWARD -i bridge -j ACCEPT

But still no luck.

$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dnsutils 1/1 Running 0 4m37s

The pod starts, but it takes a lot of time to come into the Running state.
@Abhishekghosh1998 my personal recommendation would be to use the docker driver instead. One way to reset your Ubuntu firewall (do this after minikube stop and delete) is to stop docker and ufw to prevent manipulation of firewall rules, flush the rules, reset the default FORWARD policy, and zero the iptables counters. This may impact other software that manipulates iptables rules (e.g. libvirt).

Firewall reset:
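The reset commands are elided; a sketch following the prose above (assumptions: a systemd host using plain iptables; this flushes all rules and may break other iptables users such as libvirt):

```
sudo systemctl stop docker
sudo systemctl stop ufw
sudo iptables -F                  # flush all rules
sudo iptables -X                  # delete user-defined chains
sudo iptables -P FORWARD ACCEPT   # reset the default FORWARD policy
sudo iptables -Z                  # zero the packet/byte counters
```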
As you are the sysadmin of your server, I'm not sure whether you want to run with ufw or not, but that's your decision.

Reset the CNI configs that minikube set up:
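These commands are elided too; minikube disables pre-existing CNI configs by renaming them with a .mk_disabled suffix, and the restore loop below is quoted verbatim later in the thread (the generated-config filename is an assumption):

```
# restore any CNI configs that minikube disabled
for f in /etc/cni/net.d/*.mk_disabled; do sudo mv "${f}" "${f%%.mk_disabled}" ; done
# remove minikube's own generated bridge config (filename assumed)
sudo rm -f /etc/cni/net.d/1-k8s.conflist
```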
Start docker and minikube:
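Presumably:

```
sudo systemctl start docker
minikube start --driver=docker   # driver choice assumed from the recommendation above
```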
Then rerun
@pnasrat Please can you help me once more. Even after trying the above firewall-reset steps, I get the issue that container creation takes a long time, which wasn't there before, when using the docker driver.
The logs contain this error. Also, while trying to execute the below command, I get the following:

for f in /etc/cni/net.d/*.mk_disabled; do sudo mv "${f}" "${f%%.mk_disabled}" ; done
mv: cannot stat '/etc/cni/net.d/*.mk_disabled': No such file or directory
@Abhishekghosh1998 are you running
@pnasrat yes, I am running |
@Abhishekghosh1998 please follow the above steps to purge your minikube profiles, reset the network, and reset your user minikube profile.
But still in the logs I find the above error:
I have tried the steps you described in #17442 (comment). Why is that error appearing? Pulling the docker images for the first time seems to take a lot of time...
So long as minikube is up and running, and docker images are pulling and eventually running (the slowness is likely just because it needs to pull images off the internet), I don't think there is necessarily anything wrong. The error is what happens when there is no minikube docker network. I see this on my Ubuntu system; I did
If I then run
Triage notes

minikube logs section with the error (docker network inspect run before the minikube network exists):

```
I1025 10:18:08.120728  190591 cli_runner.go:164] Run: docker network inspect minikube --format "{"Name": "{{.Name}}","Driver": "{{.Driver}}","Subnet": "{{range .IPAM.Config}}{{.Subnet}}{{end}}","Gateway": "{{range .IPAM.Config}}{{.Gateway}}{{end}}","MTU": {{if (index .Options "com.docker.network.driver.mtu")}}{{(index .Options "com.docker.network.driver.mtu")}}{{else}}0{{end}}, "ContainerIPs": [{{range $k,$v := .Containers }}"{{$v.IPv4Address}}",{{end}}]}"
W1025 10:18:08.131898  190591 cli_runner.go:211] docker network inspect minikube --format "{"Name": "{{.Name}}","Driver": "{{.Driver}}","Subnet": "{{range .IPAM.Config}}{{.Subnet}}{{end}}","Gateway": "{{range .IPAM.Config}}{{.Gateway}}{{end}}","MTU": {{if (index .Options "com.docker.network.driver.mtu")}}{{(index .Options "com.docker.network.driver.mtu")}}{{else}}0{{end}}, "ContainerIPs": [{{range $k,$v := .Containers }}"{{$v.IPv4Address}}",{{end}}]}" returned with exit code 1
I1025 10:18:08.131935  190591 network_create.go:281] running [docker network inspect minikube] to gather additional debugging logs...
I1025 10:18:08.131940  190591 cli_runner.go:164] Run: docker network inspect minikube
W1025 10:18:08.140259  190591 cli_runner.go:211] docker network inspect minikube returned with exit code 1
I1025 10:18:08.140269  190591 network_create.go:284] error running [docker network inspect minikube]: docker network inspect minikube: exit status 1
stdout: []
stderr:
-- /stdout --
** /stderr **
```
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What Happened?
For some specific use cases, I want to start Minikube with the --driver=none option. All the pods are running fine:
Then I wrote a simple ubuntu-pod to check internet access
ubuntu-pod.yml:
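The manifest itself is elided; a minimal reconstruction of such a pod (image, name, and keep-alive command are assumptions), written here as a shell heredoc:

```
cat > ubuntu-pod.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
spec:
  containers:
  - name: ubuntu
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
EOF
kubectl apply -f ubuntu-pod.yml
```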
Then I try to launch the terminal of that pod and try to check internet connectivity:
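The connectivity checks are elided; presumably something like:

```
kubectl exec -it ubuntu -- bash
# inside the pod (ping requires the iputils-ping package in the base image):
apt-get update                    # fails with "could not resolve host" when DNS is broken
apt-get install -y iputils-ping
ping -c 3 8.8.8.8                 # tests raw connectivity, bypassing DNS
```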
I understand that when I use driver=none, minikube makes use of the host system. Is the issue in DNS resolution due to the host machine? I am not sure, but the internet works fine on my host machine. When I remove the --driver=none option, do $ minikube start, and follow the above steps, the pods connect to the internet just fine.

Attach the log file

log.txt
Please consider the last start...
Operating System
Ubuntu
Driver
None