Guest Freeze #9

Soft · 2017-08-07T12:34:40Z

KVM patches can cause monitored VMs to freeze.

Steps to reproduce

Unfortunately, the steps to reproduce this problem are a little convoluted, since they rely on my multiple-backends fork of Nitro. I am sure a more minimal test case can be constructed, but I have yet to do so. However, assuming the user has set up Nitro:

1. Start executing any of the Linux tests: nose2 --log-level=debug -v test_linux.TestLinux.test_write
2. After Nitro has attached to the VM, terminate the test prematurely by pressing CTRL-C.
3. Notice the extremely high CPU usage of the VM; the VM is now unresponsive.
4. After a while, the VM's kernel will report the situation.

Expected Behavior

After terminating the test, the VM should continue execution normally.

I haven't really looked into the possible causes of this, but the problem clearly seems to be in the kernel. I am not sure whether the problem is present with Windows guests. This problem has been previously discussed in the Multiple Backends issue.


aghamir · 2017-08-08T10:22:39Z

@Soft Thank you so much for reporting this issue. I've reduced nitro to a minimal version that only sends the KVM_NITRO_ATTACH_VM and KVM_NITRO_ATTACH_VCPUS ioctls. The guest still freezes in this situation.
@Wenzel Do you have any ideas about it?

Wenzel · 2017-08-08T10:48:54Z

Hi,

From what I read, if you terminate right after Nitro has attached and is listening, Nitro should detach itself,
which implies disabling the traps.

If you stop nitro too soon, the traps will still be configured, and the VM will be frozen at the next event to be reported.

Nitro is supposed to deal with this (https://github.com/KVM-VMI/nitro/blob/master/nitro/nitro.py#L77)

        self.stop_listen()
        self.kvm_io.close()

Can you be more specific about when you send the CTRL-C to Nitro?
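The teardown described above only protects the guest if it runs on every exit path: normal completion, breaking out of the event loop early, and CTRL-C. A minimal Python sketch of that pattern follows, assuming a context-manager style wrapper. Only `stop_listen()` and `kvm_io.close()` come from the snippet quoted above; `TracingSession`, `FakeKVMIO`, `start_listen()`, `listen()` and `trace()` are hypothetical stand-ins, not Nitro's actual API, and nothing here touches a real VM.

```python
class FakeKVMIO:
    """Hypothetical stand-in for the object stored in Nitro's kvm_io attribute."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class TracingSession:
    """Hypothetical session whose teardown mirrors the two calls quoted above."""

    def __init__(self, kvm_io):
        self.kvm_io = kvm_io
        self.traps_enabled = False

    def start_listen(self):
        # Stand-in for configuring the traps in the guest.
        self.traps_enabled = True

    def stop_listen(self):
        # Stand-in for disabling the traps so the guest can run freely again.
        self.traps_enabled = False

    def listen(self, events):
        self.start_listen()
        yield from events

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal completion, on `break`, and when KeyboardInterrupt
        # (CTRL-C) propagates out of the with-block.
        self.stop_listen()
        self.kvm_io.close()
        return False  # re-raise any exception, including KeyboardInterrupt


def trace(events, stop_after=None):
    """Consume events, optionally stopping early like the 10-event test."""
    kvm_io = FakeKVMIO()
    with TracingSession(kvm_io) as session:
        for count, _event in enumerate(session.listen(events), start=1):
            if stop_after is not None and count >= stop_after:
                break
    return session, kvm_io
```

Calling `trace(range(100), stop_after=10)` returns a session whose traps are disabled and whose `kvm_io` is closed. This kind of cleanup only runs if the Python process gets the chance to unwind: a process killed with SIGKILL never reaches `__exit__`, so the traps would stay configured.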

aghamir · 2017-08-08T11:19:07Z

I've tested CTRL+C while events are being reported in userspace:

After pressing CTRL+C, I cannot use the GVM anymore.
I've also tested this without CTRL+C, by breaking out of the for loop after 10 events have been reported. The guest freezes in that situation too.
After that, I changed nitro to send only the attach ioctls (without setting traps or getting events). In that situation we cannot use the GVM either.

Wenzel · 2017-08-08T12:00:06Z

hi @aghamir ,

I guess you run ./main.py --stdout <linux_vm_name>?

That is different from running the tests, but anyway.

When you send a CTRL-C, the VM shouldn't stay frozen.
Nitro exits nicely by disabling the traps and continuing the VM execution.

I just tested this on the master branch with a Windows 7 VM, and it works.

-> You are on the Linux branch developed by @Soft, and I haven't verified all those changes yet.
Could you reproduce this on master (switching to a Windows VM)?

aghamir · 2017-08-08T19:46:52Z

I tested @Soft's code on Win7 x64. It works, and the freeze does not occur on Windows. However, KVM_NITRO_ATTACH_VM freezes the Linux GVM. Do you have any idea why this happens with Linux?

aghamir · 2017-08-09T07:15:06Z

Hi @Wenzel ,
I've noticed that anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC);
causes the freeze in the Linux GVM:
https://github.com/KVM-VMI/kvm/blob/master/virt/kvm/kvm_main.c#L3356
Could you give me a hint on how to debug it?

Wenzel · 2017-10-09T15:20:52Z

hi @aghamir

sorry for the late response.

Do you still have those freezes ?

aghamir · 2017-10-09T16:41:31Z

Hi @Wenzel ,
I am not on upstream. However, I will test it on upstream ASAP.

aghamir · 2017-10-18T10:47:20Z

I tested it, and everything is OK now. I didn't find out what the problem was, but your nitro is working perfectly now on Ubuntu.

Wenzel · 2017-10-18T12:59:34Z

Great, thanks for the update !
I'm closing this issue.

If a cell has 'nbits' equal to a multiple of BITS_PER_BYTE, the logic

        *p &= GENMASK((cell->nbits%BITS_PER_BYTE) - 1, 0);

will become undefined behavior because nbits modulo BITS_PER_BYTE is 0, and we subtract one from that, making a large number that is then shifted more than the number of bits that fit into an unsigned long.

UBSAN reports this problem:

        UBSAN: shift-out-of-bounds in drivers/nvmem/core.c:1386:8
        shift exponent 64 is too large for 64-bit type 'unsigned long'
        CPU: 6 PID: 7 Comm: kworker/u16:0 Not tainted 5.15.0-rc3+ #9
        Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
        Workqueue: events_unbound deferred_probe_work_func
        Call trace:
         dump_backtrace+0x0/0x170
         show_stack+0x24/0x30
         dump_stack_lvl+0x64/0x7c
         dump_stack+0x18/0x38
         ubsan_epilogue+0x10/0x54
         __ubsan_handle_shift_out_of_bounds+0x180/0x194
         __nvmem_cell_read+0x1ec/0x21c
         nvmem_cell_read+0x58/0x94
         nvmem_cell_read_variable_common+0x4c/0xb0
         nvmem_cell_read_variable_le_u32+0x40/0x100
         a6xx_gpu_init+0x170/0x2f4
         adreno_bind+0x174/0x284
         component_bind_all+0xf0/0x264
         msm_drm_bind+0x1d8/0x7a0
         try_to_bring_up_master+0x164/0x1ac
         __component_add+0xbc/0x13c
         component_add+0x20/0x2c
         dp_display_probe+0x340/0x384
         platform_probe+0xc0/0x100
         really_probe+0x110/0x304
         __driver_probe_device+0xb8/0x120
         driver_probe_device+0x4c/0xfc
         __device_attach_driver+0xb0/0x128
         bus_for_each_drv+0x90/0xdc
         __device_attach+0xc8/0x174
         device_initial_probe+0x20/0x2c
         bus_probe_device+0x40/0xa4
         deferred_probe_work_func+0x7c/0xb8
         process_one_work+0x128/0x21c
         process_scheduled_works+0x40/0x54
         worker_thread+0x1ec/0x2a8
         kthread+0x138/0x158
         ret_from_fork+0x10/0x20

Fix it by making sure there are any bits to mask out.

Fixes: 69aba79 ("nvmem: Add a simple NVMEM framework for consumers")
Cc: Douglas Anderson <[email protected]>
Cc: [email protected]
Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
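The shift problem described in this commit can be illustrated outside the kernel. The Python sketch below emulates 64-bit `unsigned long` arithmetic with a mask modeled on the kernel's `GENMASK` (`(~0UL - (1UL << l) + 1) & (~0UL >> (BITS_PER_LONG - 1 - h))`). Because Python has no undefined behavior, the sketch raises `ValueError` wherever C would perform an out-of-range shift, as a stand-in for the UBSAN report. The helper names `genmask`, `clear_leftover_bits_buggy` and `clear_leftover_bits_fixed` are hypothetical. The guard in the fixed version follows the commit's description of masking only when there are bits to mask out.

```python
BITS_PER_BYTE = 8
BITS_PER_LONG = 64
ULONG_MAX = (1 << BITS_PER_LONG) - 1


def _check_shift(shift):
    # In C, shifting by a negative amount or by >= the type width is
    # undefined behavior; UBSAN reports it as shift-out-of-bounds.
    if not 0 <= shift < BITS_PER_LONG:
        raise ValueError(
            f"shift exponent {shift} is out of bounds for {BITS_PER_LONG}-bit type"
        )
    return shift


def genmask(h, l):
    """Bits l..h set, modeled on the kernel's GENMASK for unsigned long."""
    low = (ULONG_MAX - (1 << _check_shift(l)) + 1) & ULONG_MAX
    high = ULONG_MAX >> _check_shift(BITS_PER_LONG - 1 - h)
    return low & high


def clear_leftover_bits_buggy(last_byte, nbits):
    # Original logic: always masks, even when nbits is a multiple of 8,
    # which makes h == -1 and the shift exponent 64.
    return last_byte & genmask((nbits % BITS_PER_BYTE) - 1, 0)


def clear_leftover_bits_fixed(last_byte, nbits):
    # Fixed logic: only mask when the last byte is partially used.
    leftover = nbits % BITS_PER_BYTE
    if leftover:
        last_byte &= genmask(leftover - 1, 0)
    return last_byte
```

For a 12-bit cell, both versions reduce a last byte of `0xFF` to `0x0F`. For an 8-bit cell, the fixed version leaves `0xFF` untouched, while the buggy version hits the out-of-range shift of 64.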

Wenzel closed this as completed Oct 18, 2017
