
VM is not initializing on ARM64 #956

Open
jaredcash opened this issue Jul 3, 2024 · 55 comments
Labels: kind/bug, lifecycle/stale

Comments

@jaredcash

What happened:
I am unable to access a newly deployed VM. The output of kubectl get vmi shows that the VM is running and ready, but I believe it is not fully initializing: I am unable to access the VM via virtctl console / virtctl ssh, and there are no guest console logs from the virt-launcher pod. As a note, I deployed the Kubernetes cluster using K0s.
All nodes in the cluster are passing qemu validation:

node3:~$ virt-host-validate qemu
  QEMU: Checking if device /dev/kvm exists                                   : PASS
  QEMU: Checking if device /dev/kvm is accessible                            : PASS
  QEMU: Checking if device /dev/vhost-net exists                             : PASS
  QEMU: Checking if device /dev/net/tun exists                               : PASS
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
  QEMU: Checking for device assignment IOMMU support                         : WARN (Unknown if this platform has IOMMU support)
  QEMU: Checking for secure guest support                                    : WARN (Unknown if this platform has Secure Guest support)

Kubevirt components:

$ kubectl get all -n kubevirt
Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
NAME                                   READY   STATUS    RESTARTS   AGE
pod/virt-api-64d75d4f5-66vxg           1/1     Running   0          22h
pod/virt-api-64d75d4f5-rl6cn           1/1     Running   0          22h
pod/virt-controller-64d65c6684-ggwlc   1/1     Running   0          22h
pod/virt-controller-64d65c6684-xqx7m   1/1     Running   0          22h
pod/virt-handler-82vdv                 1/1     Running   0          22h
pod/virt-handler-fsvz8                 1/1     Running   0          22h
pod/virt-handler-l664w                 1/1     Running   0          22h
pod/virt-operator-6c89df8955-jrjf9     1/1     Running   0          22h
pod/virt-operator-6c89df8955-r9wkj     1/1     Running   0          22h

NAME                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/kubevirt-operator-webhook     ClusterIP   10.101.225.75   <none>        443/TCP   22h
service/kubevirt-prometheus-metrics   ClusterIP   None            <none>        443/TCP   22h
service/virt-api                      ClusterIP   10.96.236.192   <none>        443/TCP   22h
service/virt-exportproxy              ClusterIP   10.110.33.182   <none>        443/TCP   22h

NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/virt-handler   3         3         3       3            3           kubernetes.io/os=linux   22h

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/virt-api          2/2     2            2           22h
deployment.apps/virt-controller   2/2     2            2           22h
deployment.apps/virt-operator     2/2     2            2           22h

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/virt-api-64d75d4f5           2         2         2       22h
replicaset.apps/virt-controller-64d65c6684   2         2         2       22h
replicaset.apps/virt-operator-6c89df8955     2         2         2       22h

NAME                            AGE   PHASE
kubevirt.kubevirt.io/kubevirt   22h   Deployed
############################
$ kubectl get pod,vm,vmi
NAME                             READY   STATUS    RESTARTS   AGE
pod/virt-launcher-testvm-fhrc2   3/3     Running   0          11m

NAME                                AGE   STATUS    READY
virtualmachine.kubevirt.io/testvm   11m   Running   True

NAME                                        AGE   PHASE     IP             NODENAME   READY
virtualmachineinstance.kubevirt.io/testvm   11m   Running   10.244.135.7   node3      True

What you expected to happen:
To deploy a working VM using KubeVirt.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy a K0s kubernetes cluster using k0sctl (https://docs.k0sproject.io/v1.30.0+k0s.0/k0sctl-install/) on a Turing RK1 compute module. Note: I am using Calico with vxlan as my CNI, but the same issue also occurred with kube-router (the default CNI with K0s).
  2. Install KubeVirt
  3. Deploy a test VM following https://kubevirt.io/labs/kubernetes/lab1
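
For reference, step 3 boils down to roughly the following commands (a sketch based on the linked lab; the manifest URL is the same one referenced later in this thread):

$ wget https://kubevirt.io/labs/manifests/vm.yaml
$ kubectl apply -f vm.yaml
$ virtctl start testvm
$ kubectl get vmi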

Additional context:
My server is using ARM64 architecture and the hardware is Turing RK1 compute modules (https://turingpi.com/product/turing-rk1/). I have been able to successfully deploy a Cirros VM using virsh with cirros-0.5.2-aarch64 image. I have attempted to use an aarch64 image for my kubevirt VM but that also failed to initialize (I used image quay.io/kubevirt/cirros-container-disk-demo:v1.2.2-arm64).

I have been interested in using Kubevirt but I have been running into this same issue when using different Kubernetes deployments (KIND and Minikube). All tests have been done on a Turing Pi RK1 cluster (single node and multi-node).

I have attached the logs from the virt-launcher pod (all containers) and my kubevirt CR object.

Environment:

  • KubeVirt version (use virtctl version): v1.2.1
  • Kubernetes version (use kubectl version): v1.30.0+k0s
  • VM or VMI specifications:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvm
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/size: small
        kubevirt.io/domain: testvm
    spec:
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
          - name: default
            masquerade: {}
        resources:
          requests:
            memory: 64M
      networks:
      - name: default
        pod: {}
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/cirros-container-disk-demo
        - name: cloudinitdisk
          cloudInitNoCloud:
            userDataBase64: SGkuXG4=
  • Cloud provider or hardware configuration: Hardware is baremetal Turing RK1 compute modules (https://turingpi.com/product/turing-rk1/). The cluster is 4 nodes (1 controller and 3 workers) but I had this issue using one RK1 node as a single node cluster.
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.3
  • Kernel (e.g. uname -a): 5.10.160-rockchip aarch64 GNU/Linux
  • Install tools: K0s with Calico vxlan was used to deploy the Kubernetes cluster

kubevirt-cr-yaml.txt
virt-launcher-logs.txt

@aburdenthehand
Member

/cc @zhlhahaha Is this something you are able to help with?

@zhlhahaha

    resources:
      requests:
        memory: 64M

Hi @jaredcash
Can you try to increase the memory from 64M to 256M?
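
If it helps, a minimal sketch of that change (assuming the VM is named testvm, as in the spec above; kubectl edit vm testvm works just as well):

$ kubectl patch vm testvm --type=json \
    -p='[{"op":"replace","path":"/spec/template/spec/domain/resources/requests/memory","value":"256M"}]'
$ virtctl restart testvm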

@jaredcash
Author

@zhlhahaha unfortunately, increasing the memory did not help. It still appears that the VM will not initialize as the guest-console-log container logs are blank and I am unable to access it:

        resources:
          requests:
            memory: 256M

guest-console-log:

$ kubectl logs virt-launcher-testvm2-6rkq6 -c guest-console-log | wc -l
0

Note: I also tried creating the VM with 1G of memory and got the same results.

I attached the describe of both the virt-launcher pod and the new VM object with 256M in case that is helpful.
Please let me know if any additional items are needed.

vm-obj-describe.txt
pod_virt-launcher-obj-describe.txt

@zhlhahaha

It is interesting that there are no failure logs in virt-launcher.log or the console log. This usually indicates that the VM may be encountering a boot failure during the bootloader stage. This could be caused by incorrect UEFI firmware, a corrupted VM disk, or a mismatch in the CPU architecture of the VM disk. I will investigate this further in my local environment.

@zhlhahaha

zhlhahaha commented Jul 29, 2024

The quay.io/kubevirt/cirros-container-disk-demo:latest is only for x86_64.
Can you use this image and make sure the allocated memory is equal to or larger than 256M?

quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64

I can successfully boot the VM based on the image in my local env.

@zhlhahaha

@aburdenthehand If @jaredcash verified the image issue, we can update the document https://kubevirt.io/labs/kubernetes/lab1.

@aburdenthehand
Member

I don't see any of the labs specifying the cirros disk image. Am I missing something?
That said, I would be happy to have any callouts or alternative steps to support additional architectures in the labs, and really any and all improvements to the labs. Please feel free to raise an issue specifying the improvement to make, or a PR to add the required info. If the former, we can add a 'good-first-issue' label.

@jaredcash
Author

jaredcash commented Jul 29, 2024

The quay.io/kubevirt/cirros-container-disk-demo:latest is only for x86_64. Can you use this image and make sure the allocated memory is equal to or larger than 256M?

quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64

I can successfully boot the VM based on the image in my local env.

@zhlhahaha unfortunately, the VM is still not initializing with the ARM specific image and higher allocated memory. The VM still shows as running but again I'm unable to access it and there are no console logs:

$ kubectl get vm
NAME      AGE   STATUS    READY
testvm1   43m   Running   True

$ kubectl get vm testvm1 -o custom-columns=MEMORY:.spec.template.spec.domain.resources.requests.memory,IMAGE:.spec.template.spec.volumes[0].containerDisk.image
MEMORY   IMAGE
1G       quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64

$ kubectl logs virt-launcher-testvm1-vwwr8 -c guest-console-log | wc -l
0

I'm unsure if this is somehow related to my hardware, even though the virt-host-validate is seemingly passing for each node and I am able to deploy a VM using virt-install on the worker nodes.

@zhlhahaha

I don't see any of the labs specifying the cirros disk image. Am I missing something?

The doc does not specify the cirros disk image directly. However, the VM configuration file vm.yaml that the lab has you download contains the disk image information, and it uses the x86-only cirros image.

wget https://kubevirt.io/labs/manifests/vm.yaml
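
For ARM64, a workaround sketch (assuming the manifest references the untagged cirros image, as in the VM spec earlier in this issue, and using the arm64 tag mentioned above) is to swap the image before applying the manifest:

$ sed -i 's|quay.io/kubevirt/cirros-container-disk-demo|quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64|' vm.yaml
$ kubectl apply -f vm.yaml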

Please feel free to raise an issue specifying the improvement to make, or a PR to add the required info. If the former, we can add a 'good-first-issue' label.

OK, I will raise an issue after I solve @jaredcash's problem.

@zhlhahaha

@zhlhahaha unfortunately, the VM is still not initializing with the ARM specific image and higher allocated memory. The VM still shows as running but again I'm unable to access it and there are no console logs:

Would you mind collecting the following information?

  1. show if the qemu process is running ps aux|grep qemu
  2. Edit the kubevirt config to get more information, like the following. Then start the vmi and get the virt-launcher.log
$ kubectl edit kubevirt -n kubevirt
apiVersion: kubevirt.io/v1
kind: KubeVirt
...
spec:
...
  configuration:
    developerConfiguration:
      logVerbosity:
        virtLauncher: 8
...
status:
  3. Are you using virtctl console testvm to visit the virtual machine? Can you access the VM console, or do you get an error message when running this command?
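
A non-interactive way to apply the verbosity change from item 2 and then collect the log (a sketch; the main container of the virt-launcher pod is the compute container):

$ kubectl patch kubevirt kubevirt -n kubevirt --type=merge \
    -p '{"spec":{"configuration":{"developerConfiguration":{"logVerbosity":{"virtLauncher":8}}}}}'
$ kubectl logs <virt-launcher-pod> -c compute > virt-launcher.log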

@jaredcash
Author

hello @zhlhahaha, here is the requested information:

show if the qemu process is running ps aux|grep qemu

Please see the following qemu processes on the worker node:

root@node3:~# ps aux | grep qemu
uuidd    1045139  0.0  0.1 1686836 11756 ?       Ssl  20:40   0:00 /usr/bin/virt-launcher-monitor --qemu-timeout 269s --name testvm1 --uid 72b0120f-b4ad-4a1b-a612-3b37466eeebc --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/AAVMF --run-as-nonroot
uuidd    1045157  0.1  0.6 2561068 52644 ?       Sl   20:40   0:01 /usr/bin/virt-launcher --qemu-timeout 269s --name testvm1 --uid 72b0120f-b4ad-4a1b-a612-3b37466eeebc --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/AAVMF --run-as-nonroot
uuidd    1045172  0.0  0.2 1290448 21328 ?       Sl   20:40   0:00 /usr/sbin/virtqemud -f /var/run/libvirt/virtqemud.conf
uuidd    1045391  100  2.1 1718748 170312 ?      Sl   20:40  23:57 /usr/libexec/qemu-kvm -name guest=default_testvm1,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/run/kubevirt-private/libvirt/qemu/lib/domain-1-default_testvm1/master-key.aes"} -blockdev {"driver":"file","filename":"/usr/share/AAVMF/AAVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} -blockdev {"driver":"file","filename":"/var/run/kubevirt-private/libvirt/qemu/nvram/testvm1_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} -machine virt-rhel9.2.0,usb=off,gic-version=3,dump-guest-core=off,memory-backend=mach-virt.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,acpi=on -accel kvm -cpu host -m size=976896k -object {"qom-type":"memory-backend-ram","id":"mach-virt.ram","size":1000341504} -overcommit mem-lock=off -smp 1,sockets=1,dies=1,cores=1,threads=1 -object {"qom-type":"iothread","id":"iothread1"} -uuid cfb867c9-fa3a-51f5-b0f5-485fd556fd68 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=20,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device {"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x1"} -device {"driver":"pcie-root-port","port":9,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x1.0x1"} -device {"driver":"pcie-root-port","port":10,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x1.0x2"} -device {"driver":"pcie-root-port","port":11,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x1.0x3"} -device {"driver":"pcie-root-port","port":12,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x1.0x4"} -device {"driver":"pcie-root-port","port":13,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x1.0x5"} -device {"driver":"pcie-root-port","port":14,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x1.0x6"} -device {"driver":"pcie-root-port","port":15,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x1.0x7"} -device {"driver":"pcie-root-port","port":16,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x2"} -device {"driver":"pcie-root-port","port":17,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x2.0x1"} -device {"driver":"pcie-root-port","port":18,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x2.0x2"} -device {"driver":"pcie-root-port","port":19,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x2.0x3"} -device {"driver":"qemu-xhci","id":"usb","bus":"pci.5","addr":"0x0"} -device {"driver":"virtio-scsi-pci-non-transitional","id":"scsi0","bus":"pci.6","addr":"0x0"} -device {"driver":"virtio-serial-pci-non-transitional","id":"virtio-serial0","bus":"pci.7","addr":"0x0"} -blockdev {"driver":"file","filename":"/var/run/kubevirt/container-disks/disk_0.img","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-3-format","read-only":true,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-3-storage"} -blockdev 
{"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/disk-data/containerdisk/disk.qcow2","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"} -device {"driver":"virtio-blk-pci-non-transitional","bus":"pci.8","addr":"0x0","drive":"libvirt-2-format","id":"ua-containerdisk","bootindex":1,"write-cache":"on","werror":"stop","rerror":"stop"} -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/cloud-init-data/default/testvm1/noCloud.iso","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device {"driver":"virtio-blk-pci-non-transitional","bus":"pci.9","addr":"0x0","drive":"libvirt-1-format","id":"ua-cloudinitdisk","write-cache":"on","werror":"stop","rerror":"stop"} -netdev {"type":"tap","fd":"21","vhost":true,"vhostfd":"23","id":"hostua-default"} -device {"driver":"virtio-net-pci-non-transitional","host_mtu":1450,"netdev":"hostua-default","id":"ua-default","mac":"6e:7e:49:88:36:2d","bus":"pci.1","addr":"0x0","romfile":""} -add-fd set=0,fd=19,opaque=serial0-log -chardev socket,id=charserial0,fd=17,server=on,wait=off,logfile=/dev/fdset/0,logappend=on -serial chardev:charserial0 -chardev socket,id=charchannel0,fd=18,server=on,wait=off -device {"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"} -device {"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"} -device {"driver":"usb-kbd","id":"input1","bus":"usb.0","port":"2"} -audiodev {"id":"audio1","driver":"none"} -vnc vnc=unix:/var/run/kubevirt-private/72b0120f-b4ad-4a1b-a612-3b37466eeebc/virt-vnc,audiodev=audio1 -device {"driver":"virtio-gpu-pci","id":"video0","max_outputs":1,"bus":"pci.2","addr":"0x0"} -device {"driver":"virtio-balloon-pci-non-transitional","id":"balloon0","free-page-reporting":true,"bus":"pci.10","addr":"0x0"} -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
root     1054126  0.0  0.0   6020  1984 pts/1    S+   21:04   0:00 grep --color=auto qemu

Edit the kubevirt config to get more information, like the following. Then start the vmi and get the virt-launcher.log

I have attached all the container logs of the virt-launcher pod after adding more verbose logging.

Are you using virtctl console testvm to visit the virtual machine? Can you access the VM console, or do you get an error message when running this command?

I do have virtctl installed on the manager and I have attempted to console into the VM. I do not get an error message, but I just get a blank display. Even if I hit any key, it does not progress any further until I press ctrl + ] to escape. For reference:

$ virtctl console testvm1
Successfully connected to testvm1 console. The escape sequence is ^]
                                                                    
$

Please let me know if anything additional is needed.

virt-launcher-all-containers.log

@zhlhahaha

There are no errors in virt-launcher.log and the qemu process starts successfully. Let's try a Fedora image; would you mind using the following config to start one?

---
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      interfaces:
      - masquerade: {}
        name: default
      rng: {}
    resources:
      requests:
        memory: 1024M
  networks:
  - name: default
    pod: {}
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: quay.io/containerdisks/fedora:40
    name: containerdisk
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
    name: cloudinitdisk

@jaredcash
Author

With the Fedora image, the VM is still not initializing. Even though the VMI shows the VM as running, virtctl console is still blank. I am not noticing any outliers in the logs myself, but I have attached them for further inspection.
Outputs for reference:

$ kubectl get pod,vmi
NAME                                 READY   STATUS    RESTARTS   AGE
pod/virt-launcher-testvm1-xxf76      3/3     Running   0          24h
pod/virt-launcher-vmi-fedora-rjtdr   3/3     Running   0          20m

NAME                                            AGE   PHASE     IP              NODENAME   READY
virtualmachineinstance.kubevirt.io/testvm1      24h   Running   10.244.135.15   node3      True
virtualmachineinstance.kubevirt.io/vmi-fedora   20m   Running   10.244.104.13   node2      True

As a note, I also tested the fedora VM with 2048M and got the same results. The logs I provided are from the 1024M VM.

fedora-virt-launcher.log

@kubevirt-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot added the lifecycle/stale label on Oct 29, 2024
@jaredcash
Author

Unfortunately, this issue persists. @zhlhahaha and/or Kubevirt team, have you had some time to review my previous message?
Please let me know if anything is needed to continue troubleshooting this issue.

@zhlhahaha

I have been interested in using Kubevirt but I have been running into this same issue when using different Kubernetes deployments (KIND and Minikube). All tests have been done on a Turing Pi RK1 cluster (single node and multi-node).

Sorry, I missed your message. I suspect the UEFI boot failed. Would you mind providing more information?

  1. CPU and memory information: sudo dmidecode -t processor && free -h. As you have three nodes, check whether all nodes have the same CPU.
  2. The configuration of the cirros VM started via virsh: virsh dumpxml vmname

@jaredcash
Author

@zhlhahaha it seems there is an issue with dmidecode on aarch64 systems as I am getting the following error on my baremetal servers:

$ sudo dmidecode -t processor
# dmidecode 3.3
# No SMBIOS nor DMI entry point found, sorry.

I gathered the CPU information for all my nodes via the lscpu command. Please let me know if that information is fine or if another command is suggested (e.g. lshw).

Additionally, I deployed a new Cirros VM using the image suggested here (#956 (comment)) and gathered the dumpxml of this VM.
Note that this VM is experiencing our issue and failing to initialize.
nodes-cpu-mem-info.txt
cirros-dumpxml.txt

@zhlhahaha

Hi @jaredcash
The CPU info is OK. I see it uses A72 and A55 Arm CPUs; I need to check their specs. I have previously run KubeVirt on a Raspberry Pi 4, which has a Cortex-A72 CPU.
Regarding the cirros VM configuration, I meant the configuration of the cirros VM that booted successfully via pure virsh, as you said:

 I have been able to successfully deploy a Cirros VM using virsh with cirros-0.5.2-aarch64 image. 

@jaredcash
Author

Apologies for my misunderstanding @zhlhahaha. I have attached the dumpxml of the cirros VM I created with pure virsh.
virsh-cirros-dumpxml.txt

@zhlhahaha

Apologies for my misunderstanding @zhlhahaha. I have attached the dumpxml of the cirros VM I created with pure virsh. virsh-cirros-dumpxml.txt

Thanks! I didn’t notice any differences between the successfully booted Cirros VM and the KubeVirt one. Would you mind double-checking if the successfully booted Cirros VM is starting on the server with the Cortex-A55 CPU? Initially, I suspected a difference in the Generic Interrupt Controller (GIC) versions between the Cortex-A55 and Cortex-A72 CPUs, but they appear to use the same GIC version. Now, it seems the UEFI firmware may be the only possible cause. Would you be able to replace /usr/share/AAVMF/AAVMF_CODE.fd in the virt-launcher with the one from the host?
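
One way to attempt that replacement (a sketch using kubectl cp; the pod runs as a non-root user, so this may fail with permission errors, in which case copying via nsenter from the host, as done below, is an alternative):

$ kubectl cp /usr/share/AAVMF/AAVMF_CODE.fd default/<virt-launcher-pod>:/usr/share/AAVMF/AAVMF_CODE.fd -c compute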

@jaredcash
Author

@zhlhahaha I redeployed the cirros test VM I created with KubeVirt to the same node (node4) to ensure it is using the same CPU (Cortex-A55).
Regarding replacing AAVMF_CODE.fd, it seems that the virt-launcher pod does not allow edits to this file, as sudo is not available:

$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- ls -l /usr/share/AAVMF/AAVMF_CODE.fd
lrwxrwxrwx 1 root root 42 Jan  1  1970 /usr/share/AAVMF/AAVMF_CODE.fd -> ../edk2/aarch64/QEMU_EFI-silent-pflash.raw
$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- mv /usr/share/AAVMF/AAVMF_CODE.fd /usr/share/AAVMF/AAVMF_CODE.fd.bak
mv: cannot move '/usr/share/AAVMF/AAVMF_CODE.fd' to '/usr/share/AAVMF/AAVMF_CODE.fd.bak': Permission denied
command terminated with exit code 1
$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- sudo rm -f /usr/share/AAVMF/AAVMF_CODE.fd
error: Internal error occurred: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "8189abc40bee955c058cc308065c5bbff4ba28ec0438f0c08fe369f6cc0aebb3": OCI runtime exec failed: exec failed: unable to start container process: exec: "sudo": executable file not found in $PATH: unknown

Note, I did attempt to become the root user but it is asking for a password:

$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- /bin/bash
bash-5.1$ su - root
Password:
su: Authentication failure
bash-5.1$

As a workaround, I used nsenter from the host node to copy the file from the host node to the virt-launcher pod which worked:

[root@testvm1 /]# ls -l /usr/share/AAVMF/
total 65536
-rw-r--r-- 1 qemu qemu 67108864 Nov  2 03:33 AAVMF_CODE.fd
lrwxrwxrwx 1 root root       35 Jan  1  1970 AAVMF_CODE.verbose.fd -> ../edk2/aarch64/QEMU_EFI-pflash.raw
lrwxrwxrwx 1 root root       40 Jan  1  1970 AAVMF_VARS.fd -> ../edk2/aarch64/vars-template-pflash.ra

After giving it some time, the VM was still not initializing. In an attempt to get it to work, I changed the ownership of AAVMF_CODE.fd to root:root

[root@testvm1 /]# ls -l /usr/share/AAVMF/
total 65536
-rw-r--r-- 1 root root 67108864 Nov  2 03:33 AAVMF_CODE.fd
lrwxrwxrwx 1 root root       35 Jan  1  1970 AAVMF_CODE.verbose.fd -> ../edk2/aarch64/QEMU_EFI-pflash.raw
lrwxrwxrwx 1 root root       40 Jan  1  1970 AAVMF_VARS.fd -> ../edk2/aarch64/vars-template-pflash.raw

Unfortunately, the VM still did not initialize. I restarted the pod to see if that would work but it unfortunately did not work.
I have re-copied the AAVMF_CODE.fd from the host node to the pod after the pod restart, so the current state is the following:

$ kubectl exec -it virt-launcher-testvm1-4hvwn -- ls -l /usr/share/AAVMF/
total 65536
-rw-r--r-- 1 root root 67108864 Nov  2 03:33 AAVMF_CODE.fd
lrwxrwxrwx 1 root root       35 Jan  1  1970 AAVMF_CODE.verbose.fd -> ../edk2/aarch64/QEMU_EFI-pflash.raw
lrwxrwxrwx 1 root root       40 Jan  1  1970 AAVMF_VARS.fd -> ../edk2/aarch64/vars-template-pflash.raw

I have also attached fresh logs of the virt-launcher pod for reference.
Please let me know if there are other steps I need to perform after replacing AAVMF_CODE.fd

virt-launcher-testvm1.log

@zhlhahaha

Unfortunately, the VM still did not initialize. I restarted the pod to see if that would work but it unfortunately did not work.

The AAVMF_CODE file is the UEFI boot firmware used during VM startup. After replacing this file, a VM reboot is necessary for the changes to take effect. Additionally, if you restart the pod, it will revert to the original virt-launcher image where the AAVMF_CODE file hasn't been replaced.

To make this change effective, you’ll need to replace the AAVMF_CODE.fd file in the virt-launcher image itself rather than in individual pods, then use this updated virt-launcher image to start the VM.
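
A minimal sketch of such a rebuild, assuming the host firmware file has been copied into the build context and a hypothetical registry myrepo (the docker commit approach discussed later in this thread achieves the same result):

$ printf 'FROM quay.io/kubevirt/virt-launcher:v1.2.1\nCOPY AAVMF_CODE.fd /usr/share/AAVMF/AAVMF_CODE.fd\n' > Dockerfile
$ docker build -t myrepo/virt-launcher:v1.2.1-hostfw .
$ docker push myrepo/virt-launcher:v1.2.1-hostfw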

@andreabolognani Do you have any suggestion?

@andreabolognani

There might be a way to inject files into the pod before the VM starts, for example using the sidecar hook. I'm not too familiar with these facilities, so I might be wrong about it. Rebuilding the virt-launcher image is obviously always going to be possible, but the process would be quite involved so I'd really leave it as a last ditch effort.

My suggestion would be to try and figure out a way to change

<loader readonly='yes' secure='no' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>

in the domain XML to

<loader readonly='yes' secure='no' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.verbose.fd</loader>

The verbose build of AAVMF would hopefully produce at least some output pointing us in the right direction. Again, I'm not sure what facilities, if any, KubeVirt provides to inject this kind of change. Sidecar hook might be the one.

@zhlhahaha

There might be a way to inject files into the pod before the VM starts, for example using the sidecar hook.

Yes, the sidecar is a good suggestion! It can run a custom script before VM initialization. Here is a guide: https://kubevirt.io/user-guide/user_workloads/hook-sidecar/

@andreabolognani

Based on this example, something like

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config-map
data:
  my_script.sh: |
    #!/bin/sh
    tempFile=`mktemp --dry-run`
    echo $4 > $tempFile
    sed -i "s|AAVMF_CODE.fd|AAVMF_CODE.verbose.fd|" $tempFile
    cat $tempFile

(completely untested) should do the trick.

@jaredcash
Author

jaredcash commented Nov 7, 2024

Hello @zhlhahaha @andreabolognani, following the suggestions above, I was able to create a VM with the AAVMF_CODE.verbose.fd UEFI boot firmware. I followed the example here https://github.com/kubevirt/kubevirt/blob/main/examples/vmi-with-sidecar-hook-configmap.yaml, but I did change the Fedora image to the one previously mentioned here #956 (comment)

I have attached all container logs of the virt-launcher and the dumpxml of my VM.

I have not yet been able to use the sidecar hook functionality to get the host's local UEFI boot firmware into the VM (if that is possible) as a test. I am still troubleshooting (and of course I welcome any suggestions if we want to go down that route), but I wanted to provide you both with the current data in the meantime.

fedora-sidecar-vm.log
fedora-dumpxml.txt

@andreabolognani

@jaredcash the XML configuration looks good, it's clearly pointing at the verbose AAVMF build now.

I don't see any guest output in the log, though admittedly I'm not entirely sure it's supposed to be there in the first place. Do you still get absolutely zero output on the VM's serial console?

@jaredcash
Author

@andreabolognani yes, unfortunately, I am still getting zero output from the VM's serial console. For reference:

$ virtctl console vmi-with-sidecar-hook-configmap
Successfully connected to vmi-with-sidecar-hook-configmap console. The escape sequence is ^]

$

@andreabolognani

I assume you're making sure to connect to the console the moment it is available, so no output is lost because of a delay.

Well, I'm truly out of ideas at this point. The VM configuration looks good, and even if the guest image was completely busted you should still get some output out of the verbose AAVMF build.

Since the pod at least remains up, maybe you can play inside it to try and get a better understanding. Maybe run virt-host-validate there, then try something like

$ /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.verbose.fd,readonly=on -display none -serial stdio

That should produce a lot of output.

@jaredcash
Author

@andreabolognani from within the pod, the virt-host-validate is passing, for reference:

$ kubectl exec -it virt-launcher-vmi-with-sidecar-hook-configmap-8hnl2 -- /bin/bash
bash-5.1$
bash-5.1$ virt-host-validate qemu
  QEMU: Checking if device /dev/kvm exists                                   : PASS
  QEMU: Checking if device /dev/kvm is accessible                            : PASS
  QEMU: Checking if device /dev/vhost-net exists                             : PASS
  QEMU: Checking if device /dev/net/tun exists                               : PASS
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
  QEMU: Checking for device assignment IOMMU support                         : WARN (No ACPI IORT table found, IOMMU not supported by this hardware platform)
  QEMU: Checking for secure guest support                                    : WARN (Unknown if this platform has Secure Guest support)
bash-5.1$

Interestingly, I am getting no output from the qemu-kvm command. I left the command running for an hour and still saw no output until I killed it. For reference:

bash-5.1$ /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.verbose.fd,readonly=on -display none -serial stdio
qemu-kvm: terminating on signal 2
bash-5.1$

I was playing around with the command but I am not seeing an option for a more verbose output.

@andreabolognani

Looks like the issue is probably at the QEMU level. Hopefully @zhlhahaha can help you debug this further, because I'm entirely out of my depth at this point :)

@zhlhahaha

zhlhahaha commented Nov 11, 2024

Looks like the issue is probably at the QEMU level. Hopefully @zhlhahaha can help you debug this further, because I'm entirely out of my depth at this point :)

Thanks @andreabolognani, currently I have no idea why this happens. Maybe we can make it simpler. @jaredcash, would you mind trying:

docker run -it --rm --privileged --entrypoint=/bin/bash quay.io/kubevirt/virt-launcher:v1.3.0
# In the container
/usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.verbose.fd,readonly=on -display none -serial stdio

See if there is any output. It works fine on my local Arm64 server, where I see plenty of output:

...
pdateRegionMappingRecursive(3): 47E00000 - 48000000 set 70C clr 0
UpdateRegionMappingRecursive(0): 8000000 - 10000000 set 60000000000400 clr 0
UpdateRegionMappingRecursive(1): 8000000 - 10000000 set 60000000000400 clr 0
UpdateRegionMappingRecursive(2): 8000000 - 10000000 set 60000000000400 clr 0
UpdateRegionMappingRecursive(0): 1000 - 200000 set 78C clr 0
UpdateRegionMappingRecursive(1): 1000 - 200000 set 78C clr 0
UpdateRegionMappingRecursive(2): 1000 - 200000 set 78C clr 0
UpdateRegionMappingRecursive(3): 1000 - 200000 set 78C clr 0
Temp Stack : BaseAddress=0x4007E010 Length=0x1FF0
Temp Heap  : BaseAddress=0x4007C020 Length=0x1FF0
Total temporary memory:    16352 bytes.
  temporary memory stack ever used:       3984 bytes.
  temporary memory heap used for HobList: 3800 bytes.
...

@zhlhahaha

I also tried the original AAVMF_CODE.fd and got the following output:

# docker run -it --rm --privileged --entrypoint=/bin/bash quay.io/kubevirt/virt-launcher:v1.3.0
bash-5.1# /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio
UEFI firmware (version edk2-20231122-6.el9 built at 00:00:00 on Feb 22 2024)
Tpm2SubmitCommand - Tcg2 - Not Found
Tpm2GetCapabilityPcrs fail!
Tpm2SubmitCommand - Tcg2 - Not Found

>>Start PXE over IPv4.
  PXE-E16: No valid offer received.
BdsDxe: failed to load Boot0001 "UEFI PXEv4 (MAC:525400123456)" from PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0): Not Found

>>Start PXE over IPv6.

If there is still no output, you can try replacing AAVMF_CODE.fd with the host one.

@jaredcash
Author

@zhlhahaha I am not getting any output using either AAVMF_CODE.verbose.fd or AAVMF_CODE.fd. For reference:

$ docker run -it --rm --privileged --entrypoint=/bin/bash quay.io/kubevirt/virt-launcher:v1.3.0
Unable to find image 'quay.io/kubevirt/virt-launcher:v1.3.0' locally
v1.3.0: Pulling from kubevirt/virt-launcher
83be2bc98eb9: Pull complete
b2d2d903ddd0: Pull complete
Digest: sha256:25a0332c3873bd59ae6adc80dfde314101a469910b9a60499b80f48f5d8bce02
Status: Downloaded newer image for quay.io/kubevirt/virt-launcher:v1.3.0
bash-5.1# /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.verbose.fd,readonly=on -display none -serial stdio
qemu-kvm: terminating on signal 2
bash-5.1#
bash-5.1# /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio
qemu-kvm: terminating on signal 2
bash-5.1#

I will try to replace the AAVMF_CODE.fd file with the host one. Do you have a way this could be done via a sidecar hook? I've been attempting a few ways but have been unsuccessful.

@zhlhahaha

I will try to replace the AAVMF_CODE.fd file with the host one. Do you have a way this could be done via a sidecar hook? I've been attempting a few ways but have been unsuccessful.

You can do it easily in this docker environment

# keep the virt-launcher:v1.3.0 container running, and open another terminal
# use docker ps to get the container ID
docker cp /your_host_path/AAVMF_CODE.fd container_id:/usr/share/AAVMF/AAVMF_CODE.fd

Then see if there is any output.

@zhlhahaha

zhlhahaha commented Nov 14, 2024

BTW, @jaredcash,

  1. Would you mind sharing the qemu command line of the cirros VM that booted successfully via pure virsh on the host?
    You can get the command by running ps aux|grep qemu after the VM starts.
  2. Can you try running the following command on your host to see if there is any output?
qemu-system-aarch64 -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio

@jaredcash
Author

@zhlhahaha apologies, that was my misunderstanding. I copied the host's local UEFI boot firmware to the docker container and I am getting the same output from the container as I do when running the qemu-system-aarch64 command on the host. It takes me to the UEFI shell CLI:

$ qemu-system-aarch64 -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio
BdsDxe: failed to load Boot0001 "UEFI Misc Device" from VenHw(93E34C7E-B50E-11DF-9223-2443DFD72085,00): Not Found

>>Start PXE over IPv4.
  PXE-E16: No valid offer received.
BdsDxe: failed to load Boot0002 "UEFI PXEv4 (MAC:525400123456)" from PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0): Not Found

>>Start PXE over IPv6.
  PXE-E16: No valid offer received.
BdsDxe: failed to load Boot0003 "UEFI PXEv6 (MAC:525400123456)" from PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000): Not Found

>>Start HTTP Boot over IPv4.....
  Error: Could not retrieve NBP file size from HTTP server.

  Error: Server response timeout.
BdsDxe: failed to load Boot0004 "UEFI HTTPv4 (MAC:525400123456)" from PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri(): Not Found

>>Start HTTP Boot over IPv6.
  Error: Could not retrieve NBP file size from HTTP server.

  Error: Unexpected network error.
BdsDxe: failed to load Boot0005 "UEFI HTTPv6 (MAC:525400123456)" from PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)/Uri(): Not Found
BdsDxe: loading Boot0006 "EFI Internal Shell" from Fv(64074AFE-340A-4BE6-94BA-91B5B4D0F71E)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
BdsDxe: starting Boot0006 "EFI Internal Shell" from Fv(64074AFE-340A-4BE6-94BA-91B5B4D0F71E)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (EDK II, 0x00010000)
Mapping table
     BLK0: Alias(s):
          VenHw(93E34C7E-B50E-11DF-9223-2443DFD72085,00)
Press ESC in 1 seconds to skip startup.nsh or any other key to continue.
Shell>

Also, here is the command used to start the VM using pure Virsh:

/usr/bin/qemu-system-aarch64 -name guest=cirros2,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-2-cirros2/master-key.aes"} -blockdev {"driver":"file","filename":"/usr/share/AAVMF/AAVMF_CODE.ms.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} -blockdev {"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/cirros_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} -machine virt-6.2,usb=off,dump-guest-core=off,gic-version=3,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,memory-backend=mach-virt.ram -accel kvm -cpu host -m 512 -object {"qom-type":"memory-backend-ram","id":"mach-virt.ram","size":536870912} -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 4fa8b35a-1d8b-466f-bd5d-17cfe5cdc3ad -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=32,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 -device pcie-root-port,port=9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 -device pcie-root-port,port=10,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 -device pcie-root-port,port=11,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 -device pcie-root-port,port=12,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 -device pcie-root-port,port=13,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x5 -device pcie-root-port,port=14,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x6 -device pcie-root-port,port=15,chassis=8,id=pci.8,bus=pcie.0,addr=0x1.0x7 -device pcie-root-port,port=16,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=17,chassis=10,id=pci.10,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=18,chassis=11,id=pci.11,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=19,chassis=12,id=pci.12,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=20,chassis=13,id=pci.13,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=21,chassis=14,id=pci.14,bus=pcie.0,addr=0x2.0x5 -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 -blockdev {"driver":"file","filename":"/tmp/cirros-0.5.2-aarch64-disk.img","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null} -device virtio-blk-pci,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1 -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=35 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:61:fc:fe,bus=pci.1,addr=0x0 -chardev pty,id=charserial0 -serial chardev:charserial0 -chardev socket,id=charchannel0,fd=31,server=on,wait=off -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev socket,id=chrtpm,path=/run/libvirt/qemu/swtpm/2-cirros2-swtpm.sock -tpmdev emulator,id=tpm-tpm0,chardev=chrtpm -device tpm-tis-device,tpmdev=tpm-tpm0,id=tpm0 -audiodev {"id":"audio1","driver":"none"} -device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 -object {"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"} -device 
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

@zhlhahaha

@jaredcash You can get output when using the host AAVMF_CODE.fd, so the issue may be caused by a UEFI firmware problem.
As a next step, you can try using docker commit to create a new virt-launcher image and push it to your Docker Hub repository, then use that new virt-launcher image in the KubeVirt environment to see if it works.

# 1. create the new virt-launcher image
# use docker ps to get the virt-launcher:v1.3.0 container ID; make sure the AAVMF_CODE.fd in the container has been replaced with the host one.
# then use docker commit to generate a new virt-launcher image
# e.g.
docker commit container_id your_image_repository/virt-launcher:myversion
docker push your_image_repository/virt-launcher:myversion

# 2. use the new virt-launcher to start the vm
# I am not sure if it can be modified in the kubevirt config, I need to check

@zhlhahaha

zhlhahaha commented Nov 15, 2024

I found only one place where an overall image registry can be configured: kubevirt/types.go#L1980.

However, it does not support setting the repository specifically for the virt-launcher image.

To use a new virt-launcher image, the process is somewhat tricky:

  1. Push all required images to your private repository with the same tag (e.g., Docker Hub).
  2. Edit the kubevirt-operator.yaml file to set the repository and tag of the virt-operator to your private repository. Note that other images will automatically use the same registry as the one from which the operator's container image is pulled.
  3. Push the new virt-launcher image to the same repository.
  4. Start the cluster.
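
A sketch of steps 1 and 3, assuming a hypothetical repository docker.io/myrepo and the v1.2.1 tag used in this cluster (your deployment may reference additional images):

# pull, retag and push the stock component images
for img in virt-operator virt-api virt-controller virt-handler; do
  docker pull quay.io/kubevirt/$img:v1.2.1
  docker tag quay.io/kubevirt/$img:v1.2.1 docker.io/myrepo/$img:v1.2.1
  docker push docker.io/myrepo/$img:v1.2.1
done
# push the modified virt-launcher produced by docker commit above under the same tag
docker tag <committed-virt-launcher-image> docker.io/myrepo/virt-launcher:v1.2.1
docker push docker.io/myrepo/virt-launcher:v1.2.1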

@jaredcash
Author

jaredcash commented Nov 19, 2024

@zhlhahaha, it looks like UEFI firmware was the issue. After applying this workaround, the VM was accessible. As a note, my cluster is using Kubevirt v1.2.1, so those were the images I pushed to my repo.

//from virt-launcher pod:
$ kubectl exec virt-launcher-testvm-zqmzg -- cksum /usr/share/AAVMF/AAVMF_CODE.fd
2724878797 67108864 /usr/share/AAVMF/AAVMF_CODE.fd
//from node4
node4:~$ cksum /usr/share/AAVMF/AAVMF_CODE.fd
2724878797 67108864 /usr/share/AAVMF/AAVMF_CODE.fd

Console/SSH verification:

//console test
$ virtctl console testvm
Successfully connected to testvm console. The escape sequence is ^]

login as 'cirros' user. default password: 'gocubsgo'. use 'sudo' for root.
testvm login: cirros
Password:
$
$ hostname
testvm
//ssh test
$ virtctl ssh cirros@testvm
cirros@vmi/testvm.default's password:
$
$ hostname
testvm
$

@andreabolognani

KubeVirt 1.2.1 comes with edk2 2023.05 while the recently released 1.4.0 comes with edk2 2024.05.

It would be interesting to know whether the latest release works out of the box on your machine, which means we can chalk it down to some edk2 issue that's been addressed in the meantime, or not.

Do I understand correctly that you've copied the host's edk build into the container? That's very interesting. Ubuntu 22.04 comes with a version of edk2 that's even older (2022.02) so it's somewhat surprising to me that it apparently works better.

@zhlhahaha

Based on @andreabolognani's suggestion, you can give it a quick check by running:

# docker run -it --rm --privileged --entrypoint=/bin/bash quay.io/kubevirt/virt-launcher:v1.4.0
bash-5.1# /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio

and see if there is any output.

@jaredcash
Author

hey @andreabolognani

Do I understand correctly that you've copied the host's edk build into the container?

Yep, that is correct (specifically with the virt-launcher container).

@zhlhahaha I tried your suggestion of running a container with virt-launcher v1.4.0 but unfortunately, I ran into the same situation of getting no output:

node4:~$ sudo docker run -it --rm --privileged --entrypoint=/bin/bash quay.io/kubevirt/virt-launcher:v1.4.0
bash-5.1#
bash-5.1# /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio
qemu-kvm: terminating on signal 2
bash-5.1#

@andreabolognani

@jaredcash that's very interesting!

The fact that a firmware image consistently either works or doesn't work across two very different QEMU builds (Ubuntu 22.04 vs recent Fedora releases) certainly seems to point in the direction of edk2 being the root cause.

I'd like to narrow things down further. This will be a bit of work for you, hopefully you don't mind too much. You've been extremely cooperative so far, which I appreciate a lot :)

The idea is to try an old Fedora build, roughly matching the upstream version of the one in Ubuntu 22.04. If that works, then the issue was likely introduced upstream; if it doesn't, it might be downstream-specific.

I think it's fine to run the tests using the host's QEMU, since as we've seen from earlier tries that doesn't seem to be the determining factor.

So please download edk2-aarch64-20220221gitb24306f15daa-4.fc36.noarch.rpm and unpack it using

$ rpm2cpio edk2-aarch64-20220221gitb24306f15daa-4.fc36.noarch.rpm | cpio -idD ./fedora-202202/

You'll need to install the rpm2cpio and cpio packages on Ubuntu, luckily they're not too big.

This is what the contents of the newly-created directory should look like:

$ find ./fedora-202202/ -ls
 25959909      4 drwxr-xr-x   3 root     root         4096 Nov 21 10:19 ./fedora-202202/
 25959910      4 drwxr-xr-x   3 root     root         4096 Nov 21 10:19 ./fedora-202202/usr
 25959911      4 drwxr-xr-x   6 root     root         4096 Nov 21 10:19 ./fedora-202202/usr/share
 25959930      4 drwxr-xr-x   3 root     root         4096 Nov 21 10:19 ./fedora-202202/usr/share/qemu
 25959931      4 drwxr-xr-x   2 root     root         4096 Nov 21 10:19 ./fedora-202202/usr/share/qemu/firmware
 25959933      4 -rw-r--r--   1 root     root          674 Nov 21 10:19 ./fedora-202202/usr/share/qemu/firmware/70-edk2-aarch64-verbose.json
 25959932      4 -rw-r--r--   1 root     root          643 Nov 21 10:19 ./fedora-202202/usr/share/qemu/firmware/60-edk2-aarch64.json
 25959912      4 drwxr-xr-x   2 root     root         4096 Nov 21 10:19 ./fedora-202202/usr/share/AAVMF
 25959914      0 lrwxrwxrwx   1 root     root           35 Nov 21 10:19 ./fedora-202202/usr/share/AAVMF/AAVMF_CODE.verbose.fd -> ../edk2/aarch64/QEMU_EFI-pflash.raw
 25959913      0 lrwxrwxrwx   1 root     root           42 Nov 21 10:19 ./fedora-202202/usr/share/AAVMF/AAVMF_CODE.fd -> ../edk2/aarch64/QEMU_EFI-silent-pflash.raw
 25959915      0 lrwxrwxrwx   1 root     root           40 Nov 21 10:19 ./fedora-202202/usr/share/AAVMF/AAVMF_VARS.fd -> ../edk2/aarch64/vars-template-pflash.raw
 25959924      4 drwxr-xr-x   3 root     root         4096 Nov 21 10:19 ./fedora-202202/usr/share/licenses
 25959925      4 drwxr-xr-x   2 root     root         4096 Nov 21 10:19 ./fedora-202202/usr/share/licenses/edk2-aarch64
 25959928      4 -rw-r--r--   1 root     root         4080 Nov 21 10:19 ./fedora-202202/usr/share/licenses/edk2-aarch64/License.OvmfPkg.txt
 25959929      4 -rw-r--r--   1 root     root         2732 Nov 21 10:19 ./fedora-202202/usr/share/licenses/edk2-aarch64/License.txt
 25959927     32 -rw-r--r--   1 root     root        28674 Nov 21 10:19 ./fedora-202202/usr/share/licenses/edk2-aarch64/License-History.txt
 25959926      8 -rw-r--r--   1 root     root         6121 Nov 21 10:19 ./fedora-202202/usr/share/licenses/edk2-aarch64/LICENSE.openssl
 25959916      4 drwxr-xr-x   3 root     root         4096 Nov 21 10:19 ./fedora-202202/usr/share/edk2
 25959917      4 drwxr-xr-x   2 root     root         4096 Nov 21 10:19 ./fedora-202202/usr/share/edk2/aarch64
 25959919  65540 -rw-r--r--   1 root     root     67108864 Nov 21 10:19 ./fedora-202202/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw
 25959923  65540 -rw-r--r--   1 root     root     67108864 Nov 21 10:19 ./fedora-202202/usr/share/edk2/aarch64/vars-template-pflash.raw
 25959918  65540 -rw-r--r--   1 root     root     67108864 Nov 21 10:19 ./fedora-202202/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw
 25959920   2048 -rw-r--r--   1 root     root      2097152 Nov 21 10:19 ./fedora-202202/usr/share/edk2/aarch64/QEMU_EFI.fd
 25959921   2048 -rw-r--r--   1 root     root      2097152 Nov 21 10:19 ./fedora-202202/usr/share/edk2/aarch64/QEMU_EFI.silent.fd
 25959922    768 -rw-r--r--   1 root     root       786432 Nov 21 10:19 ./fedora-202202/usr/share/edk2/aarch64/QEMU_VARS.fd

If everything looks good, run the test again, this time pointing to the firmware image you've just extracted from the package:

$ qemu-system-aarch64 -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=./fedora-202202/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio

Similarly, it would be interesting to test the latest Ubuntu build. The process is similar: download qemu-efi-aarch64_2024.05-2ubuntu0.1_all.deb and unpack it with

$ dpkg-deb -x qemu-efi-aarch64_2024.05-2ubuntu0.1_all.deb ./ubuntu-202405/

This is what the contents should look like this time around:

$ find ./ubuntu-202405/ -ls
 25959934      4 drwxr-xr-x   3 root     root         4096 Oct  6 18:39 ./ubuntu-202405/
 25959935      4 drwxr-xr-x   3 root     root         4096 Oct  6 18:39 ./ubuntu-202405/usr
 25959936      4 drwxr-xr-x   6 root     root         4096 Oct  6 18:39 ./ubuntu-202405/usr/share
 25959948      4 drwxr-xr-x   3 root     root         4096 Oct  6 18:39 ./ubuntu-202405/usr/share/qemu
 25959949      4 drwxr-xr-x   2 root     root         4096 Oct  6 18:39 ./ubuntu-202405/usr/share/qemu/firmware
 25959952      4 -rw-r--r--   1 root     root          625 Oct  6 18:39 ./ubuntu-202405/usr/share/qemu/firmware/60-edk2-aarch64.json
 25959950      4 -rw-r--r--   1 root     root          717 Oct  6 18:39 ./ubuntu-202405/usr/share/qemu/firmware/40-edk2-aarch64-secure-enrolled.json
 25959951      4 -rw-r--r--   1 root     root          691 Oct  6 18:39 ./ubuntu-202405/usr/share/qemu/firmware/50-edk2-aarch64-secure.json
 25959937      4 drwxr-xr-x   2 root     root         4096 Nov 21 10:19 ./ubuntu-202405/usr/share/AAVMF
 25959958      0 lrwxrwxrwx   1 root     root           21 Oct  6 18:39 ./ubuntu-202405/usr/share/AAVMF/AAVMF_CODE.ms.fd -> AAVMF_CODE.secboot.fd
 25959941  65540 -rw-r--r--   1 root     root     67108864 Oct  6 18:39 ./ubuntu-202405/usr/share/AAVMF/AAVMF_VARS.ms.fd
 25959942  65540 -rw-r--r--   1 root     root     67108864 Oct  6 18:39 ./ubuntu-202405/usr/share/AAVMF/AAVMF_VARS.snakeoil.fd
 25959957      0 lrwxrwxrwx   1 root     root           24 Oct  6 18:39 ./ubuntu-202405/usr/share/AAVMF/AAVMF_CODE.fd -> AAVMF_CODE.no-secboot.fd
 25959940  65540 -rw-r--r--   1 root     root     67108864 Oct  6 18:39 ./ubuntu-202405/usr/share/AAVMF/AAVMF_VARS.fd
 25959939  65540 -rw-r--r--   1 root     root     67108864 Oct  6 18:39 ./ubuntu-202405/usr/share/AAVMF/AAVMF_CODE.secboot.fd
 25959959      0 lrwxrwxrwx   1 root     root           21 Oct  6 18:39 ./ubuntu-202405/usr/share/AAVMF/AAVMF_CODE.snakeoil.fd -> AAVMF_CODE.secboot.fd
 25959938  65540 -rw-r--r--   1 root     root     67108864 Oct  6 18:39 ./ubuntu-202405/usr/share/AAVMF/AAVMF_CODE.no-secboot.fd
 25959953      4 drwxr-xr-x   2 root     root         4096 Oct  6 18:39 ./ubuntu-202405/usr/share/qemu-efi-aarch64
 25959955      4 -rw-r--r--   1 root     root         1391 Oct  6 18:39 ./ubuntu-202405/usr/share/qemu-efi-aarch64/PkKek-1-snakeoil.pem
 25959956   2048 -rw-r--r--   1 root     root      2097152 Oct  6 18:39 ./ubuntu-202405/usr/share/qemu-efi-aarch64/QEMU_EFI.fd
 25959954      4 -rw-r--r--   1 root     root         1854 Oct  6 18:39 ./ubuntu-202405/usr/share/qemu-efi-aarch64/PkKek-1-snakeoil.key
 25959943      4 drwxr-xr-x   3 root     root         4096 Oct  6 18:39 ./ubuntu-202405/usr/share/doc
 25959944      4 drwxr-xr-x   2 root     root         4096 Oct  6 18:39 ./ubuntu-202405/usr/share/doc/qemu-efi-aarch64
 25959947     20 -rw-r--r--   1 root     root        18202 Oct  6 18:39 ./ubuntu-202405/usr/share/doc/qemu-efi-aarch64/copyright
 25959945      4 -rw-r--r--   1 root     root         1905 Oct  6 18:39 ./ubuntu-202405/usr/share/doc/qemu-efi-aarch64/README.Debian
 25959946     12 -rw-r--r--   1 root     root         9660 Oct  6 18:39 ./ubuntu-202405/usr/share/doc/qemu-efi-aarch64/changelog.Debian.gz

Run the test one last time:

$ qemu-system-aarch64 -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=./ubuntu-202405/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio

Phew! Thanks in advance :)

@jaredcash

@andreabolognani no problem! I also appreciate all the help both you and @zhlhahaha have provided!

Testing your theory, the Fedora build that roughly matches Ubuntu 22.04 worked:

# Fedora package:
$ qemu-system-aarch64 -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=./fedora-202202/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio
UEFI firmware starting.
Tpm2SubmitCommand - Tcg2 - Not Found
Tpm2GetCapabilityPcrs fail!
Tpm2SubmitCommand - Tcg2 - Not Found
Image type X64 can't be loaded on AARCH64 UEFI system.
BdsDxe: failed to load Boot0001 "UEFI Misc Device" from VenHw(93E34C7E-B50E-11DF-9223-2443DFD72085,00): Not Found

>>Start PXE over IPv4.
  PXE-E16: No valid offer received.
BdsDxe: failed to load Boot0002 "UEFI PXEv4 (MAC:525400123456)" from PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0): Not Found

>>Start PXE over IPv6.
  PXE-E16: No valid offer received.
BdsDxe: failed to load Boot0003 "UEFI PXEv6 (MAC:525400123456)" from PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000): Not Found

>>Start HTTP Boot over IPv4.....
  Error: Could not retrieve NBP file size from HTTP server.

  Error: Server response timeout.
BdsDxe: failed to load Boot0004 "UEFI HTTPv4 (MAC:525400123456)" from PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri(): Not Found

>>Start HTTP Boot over IPv6.
  Error: Could not retrieve NBP file size from HTTP server.

  Error: Unexpected network error.
BdsDxe: failed to load Boot0005 "UEFI HTTPv6 (MAC:525400123456)" from PciRoot(0x0)/Pci(0x1,0x0)/MAC(525400123456,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)/Uri(): Not Found
BdsDxe: loading Boot0006 "EFI Internal Shell" from Fv(64074AFE-340A-4BE6-94BA-91B5B4D0F71E)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
BdsDxe: starting Boot0006 "EFI Internal Shell" from Fv(64074AFE-340A-4BE6-94BA-91B5B4D0F71E)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (EDK II, 0x00010000)
Mapping table
     BLK0: Alias(s):
          VenHw(93E34C7E-B50E-11DF-9223-2443DFD72085,00)
Press ESC in 1 seconds to skip startup.nsh or any other key to continue.
Shell>

Interestingly enough, the latest Ubuntu build did not work:

# Ubuntu package:
$ qemu-system-aarch64 -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=./ubuntu-202405/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio
qemu-system-aarch64: terminating on signal 2

@andreabolognani

@jaredcash thanks. It's increasingly looking like an upstream issue indeed!

I think the first thing to try at this point would be a more recent package, specifically:

There's a small chance that the issue has already been found and fixed upstream. Maybe we're just lucky like that.

Assuming we're not, it would be extremely helpful to narrow things down further. These are all the builds made between edk2 2022.02, which, based on the test above, we know works, and edk2 2023.05, which is the version included in KubeVirt 1.21 and which we know doesn't:

If you could take them all for a spin and report back the results, we'd then be able to pass the information on to the edk2 maintainer for further analysis.
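
In case it helps with churning through that many packages, here's a minimal sketch of one way to automate the test loop. It assumes the RPMs sit in the current directory, that rpm2cpio/cpio is used to unpack them (adjust if the earlier packages were extracted differently), and that each build ships AAVMF_CODE.fd under usr/share/AAVMF/ like the ones tested above; the glob and the 30-second cap are arbitrary:

$ for rpm in edk2-aarch64-*.rpm; do
      dir="${rpm%.rpm}"
      mkdir -p "$dir" && ( cd "$dir" && rpm2cpio "../$rpm" | cpio -idm )
      echo "=== $rpm ==="
      # a broken build stays completely silent, so cap each run
      timeout 30 qemu-system-aarch64 -accel kvm -M virt,gic-version=3 -cpu host -m 1024 \
          -drive if=pflash,format=raw,file="$dir/usr/share/AAVMF/AAVMF_CODE.fd",readonly=on \
          -display none -serial stdio
  done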

@andreabolognani

@jaredcash thanks a lot, that's very useful.

@kraxel can you please take a look at this?

The tl;dr is that the reporter is having trouble running VMs on their machine and we've tracked the problem down to edk2, as changing just that component while leaving everything else untouched makes all the difference between a successful run and an unsuccessful one.

We have further narrowed it down to an upstream issue rather than a downstream one since Fedora and Ubuntu builds of the same edk2 release present identical behavior.

Last working version: edk2 2022.08
First broken version: edk2 2022.11

Hardware: Turing RK1 compute module (aarch64)
Host OS: Ubuntu 22.04

Reproducer: /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/path/to/AAVMF_CODE.verbose.fd,readonly=on -display none -serial stdio
Failure mode: no output whatsoever is generated

Full details above, of course :)

@kraxel

kraxel commented Dec 3, 2024

> /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/path/to/AAVMF_CODE.verbose.fd,readonly=on -display none -serial stdio

Where is the variable store flash?

@andreabolognani

> Where is the variable store flash?

@kraxel we didn't set up one for the (simplified) reproducer. Do you think that could make a difference?

Note that the failure was originally reported against KubeVirt, which will take care of setting up both pflash devices, so the absence of the NVRAM part can't be the determining factor.

@kraxel

kraxel commented Dec 3, 2024

> > Where is the variable store flash?
>
> @kraxel we didn't set up one for the (simplified) reproducer. Do you think that could make a difference?

It's an invalid configuration and will most likely not boot up. It'll probably be a different failure mode though, i.e. fail late enough that at least some messages show up on the serial line.

Any results with the latest fedora build?

One change in late 2022 was that the armvirt builds turn on paging very early, which makes it possible to properly set up memory attributes etc. But some buggy aarch64 cores tripped over that, so edk2 got the CAVIUM_ERRATUM_27456 config option (which enables a workaround). IIRC that landed in the first 2023 release, and the fedora builds have that turned on.

Possibly there are more errata dragons lurking though.

The "Turing RK1 compute module" is the only hardware affected it seems, is that correct?
Can't reproduce this on my raspberry pi 4, and the apple m2 works fine too.
So it is quite likely that this is somehow CPU related ...

@ardbiesheuvel ^^^

@ardbiesheuvel

> > > Where is the variable store flash?
> >
> > @kraxel we didn't set up one for the (simplified) reproducer. Do you think that could make a difference?
>
> It's an invalid configuration and will most likely not boot up. It'll probably be a different failure mode though, i.e. fail late enough that at least some messages show up on the serial line.
>
> Any results with the latest fedora build?
>
> One change in late 2022 was that the armvirt builds turn on paging very early, which makes it possible to properly set up memory attributes etc. But some buggy aarch64 cores tripped over that, so edk2 got the CAVIUM_ERRATUM_27456 config option (which enables a workaround). IIRC that landed in the first 2023 release, and the fedora builds have that turned on.
>
> Possibly there are more errata dragons lurking though.

Interestingly, that change did arrive between 22.08 and 22.11 so it might indeed be implicated here

> The "Turing RK1 compute module" is the only hardware affected it seems, is that correct? Can't reproduce this on my raspberry pi 4, and the apple m2 works fine too. So it is quite likely that this is somehow CPU related ...

Which kernel version is the host using?

One thing that would be instructive is to check whether single stepping through the first 50 instructions or so is sufficient to get things running.

If you run qemu with -s -S, you can run gdb in a different terminal and run

target remote :1234

to connect, and then use si to step through the startup code.
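
For anyone following along, a minimal sketch of the full workflow (the firmware path is a placeholder, as in the reproducer above; plain gdb is enough here since host and guest are both aarch64):

# terminal 1: start QEMU halted (-S) with the gdbstub listening on :1234 (-s)
$ qemu-system-aarch64 -accel kvm -M virt,gic-version=3 -cpu host -m 1024 \
    -drive if=pflash,format=raw,file=/path/to/AAVMF_CODE.fd,readonly=on \
    -display none -serial stdio -s -S

# terminal 2: attach and single-step through the first ~50 instructions
$ gdb
(gdb) target remote :1234
(gdb) stepi 50        # or repeat si by hand and watch whether the PC keeps advancing
(gdb) info registers pc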

@andreabolognani

> It's an invalid configuration and will most likely not boot up. It'll probably be a different failure mode though, i.e. fail late enough that at least some messages show up on the serial line.

Yeah, that was the idea: working builds produce at least some output, while not working ones are completely silent. We could probably try again with NVRAM if we think it could realistically make a difference.
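
A rough sketch of what that would look like, reusing the Fedora 2022.02 package extracted earlier (the vars template is copied first so the guest gets a writable variable store; paths are illustrative):

$ cp ./fedora-202202/usr/share/edk2/aarch64/vars-template-pflash.raw ./test-vars.raw
$ qemu-system-aarch64 -accel kvm -M virt,gic-version=3 -cpu host -m 1024 \
    -drive if=pflash,format=raw,file=./fedora-202202/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on \
    -drive if=pflash,format=raw,file=./test-vars.raw \
    -display none -serial stdio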

> Any results with the latest fedora build?

The most recent that was tested so far is edk2-aarch64-20240813-2.fc41.noarch.rpm.

@jaredcash can you please try again with the very latest build, edk2-aarch64-20241117-5.fc41.noarch.rpm?

> The "Turing RK1 compute module" is the only hardware affected it seems, is that correct?

It's the only one we know about, yes.

> Interestingly, that change did arrive between 22.08 and 22.11 so it might indeed be implicated here

Very interesting indeed :)

@jaredcash

hey @andreabolognani the latest Fedora build is also not working, for reference:

$ qemu-system-aarch64 -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=./fedora-20241117/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio
qemu-system-aarch64: terminating on signal 2
$

As an additional note, the following is the kernel version of all the physical nodes in the environment:

$ uname -a
Linux node4 5.10.160-rockchip #28 SMP Thu Dec 28 14:57:13 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

@ardbiesheuvel

> hey @andreabolognani the latest Fedora build is also not working, for reference:
>
> $ qemu-system-aarch64 -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=./fedora-20241117/usr/share/AAVMF/AAVMF_CODE.fd,readonly=on -display none -serial stdio
> qemu-system-aarch64: terminating on signal 2
> $
>
> As an additional note, the following is the kernel version of all the physical nodes in the environment:
>
> $ uname -a
> Linux node4 5.10.160-rockchip #28 SMP Thu Dec 28 14:57:13 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

This is a known KVM issue that was fixed in commit torvalds/linux@406504c. This fix was backported to v5.10.164
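
That lines up with the uname output quoted above: a quick sanity check on any node is simply comparing the running kernel against 5.10.164 (a rough sketch; how the vendor rockchip kernel actually gets upgraded is out of scope here):

$ uname -r
5.10.160-rockchip
# 5.10.160 < 5.10.164, so the backported KVM fix (torvalds/linux@406504c) is
# missing; the host needs to move to >= 5.10.164 or to any newer kernel that
# already contains the fix.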
