Guest Freeze #9

Closed
Soft opened this issue Aug 7, 2017 · 10 comments


Soft commented Aug 7, 2017

KVM patches can cause monitored VMs to freeze.

Steps to reproduce

Unfortunately, the steps to reproduce this problem are a little convoluted, since they rely on my multiple-backends fork of Nitro. I am sure a more minimal test case can be constructed, but I have yet to do so. However, assuming the user has set up Nitro:

  1. Start executing any of the Linux tests: nose2 --log-level=debug -v test_linux.TestLinux.test_write
  2. After Nitro has attached to the VM, prematurely terminate the test by pressing CTRL-C
  3. Notice the extremely high CPU usage of the VM - the VM is now unresponsive
  4. After a while, the VM's kernel will report the situation.

Expected Behavior

After terminating the test, the VM should continue execution normally.


I haven't really looked into the possible causes of this, but it seems clear that the problem is in the kernel. I am not sure whether the problem is also present with Windows guests. This problem has previously been discussed in the Multiple Backends issue.


aghamir commented Aug 8, 2017

@Soft Thank you so much for reporting this issue. I've reduced Nitro to a minimal version that only sends the KVM_NITRO_ATTACH_VM and KVM_NITRO_ATTACH_VCPUS ioctls. The guest still freezes in this situation.
@Wenzel Do you have any ideas about it?

Member

Wenzel commented Aug 8, 2017

Hi,

From what I read, if you terminate right after Nitro has attached and is listening, Nitro should detach itself,
which implies disabling the traps.

If you stop Nitro too soon, the traps will still be configured, and the VM will freeze at the next event to be reported.

Nitro is supposed to deal with this (https://github.com/KVM-VMI/nitro/blob/master/nitro/nitro.py#L77)

        self.stop_listen()
        self.kvm_io.close()

Can you be more specific about when you send the CTRL-C to Nitro?
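
For illustration, the detach-on-exit behaviour described in this comment can be sketched as a context manager whose cleanup runs even when CTRL-C arrives in the middle of the event loop. This is a minimal sketch, not Nitro's actual code: `FakeBackend`, `Monitor`, `set_traps`, `run` and `interrupted_events` are hypothetical names, and the backend only records calls instead of issuing real KVM ioctls.

```python
class FakeBackend:
    """Hypothetical stand-in for a KVM handle: records state, issues no ioctls."""

    def __init__(self):
        self.traps_enabled = False
        self.closed = False

    def set_traps(self, enabled):
        self.traps_enabled = enabled

    def close(self):
        self.closed = True


class Monitor:
    """Enables traps on entry and always disables them on exit."""

    def __init__(self, backend):
        self.backend = backend

    def __enter__(self):
        self.backend.set_traps(True)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and on KeyboardInterrupt alike.
        try:
            self.backend.set_traps(False)
        finally:
            self.backend.close()
        return False  # do not swallow the exception


def run(backend, events):
    """Consume events until exhausted or interrupted; return the events seen."""
    seen = []
    try:
        with Monitor(backend):
            for event in events:
                seen.append(event)
    except KeyboardInterrupt:
        pass  # cleanup already happened in Monitor.__exit__
    return seen


def interrupted_events():
    """Hypothetical event source that simulates CTRL-C after one event."""
    yield "syscall-1"
    raise KeyboardInterrupt
```

With this shape, `run(backend, interrupted_events())` leaves `backend.traps_enabled` false and `backend.closed` true even though the loop was interrupted after the first event.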


aghamir commented Aug 8, 2017

I tested CTRL+C while events were being reported in userspace:
[image]
After pressing CTRL+C, I can no longer use the guest VM.
I also tested without CTRL+C, by breaking out of the for loop after 10 events had been reported. The guest freezes in that situation as well.
After that, I changed Nitro to send only the attach ioctls (without setting traps or getting events). In that situation the guest VM is unusable too.

Member

Wenzel commented Aug 8, 2017

Hi @aghamir,

I guess you are running ./main.py --stdout <linux_vm_name>?

That is different from running the tests, but anyway.

When you send a CTRL-C, the VM shouldn't stay frozen.
Nitro exits cleanly by disabling the traps and resuming the VM's execution.

I just tested this on the master branch with a Windows 7 VM, and it works.

-> You are on the linux branch developed by @Soft, and I haven't verified all of those changes yet.
Could you try to reproduce this on master, and also with a Windows VM?


aghamir commented Aug 8, 2017

I tested @Soft's code on Win7 x64. It works, and the freeze does not occur on Windows. However, KVM_NITRO_ATTACH_VM freezes a Linux guest VM. Do you have any idea why this happens on Linux?


aghamir commented Aug 9, 2017

Hi @Wenzel,
I've noticed that the call anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC);
causes the freeze in the Linux guest VM.
https://github.com/KVM-VMI/kvm/blob/master/virt/kvm/kvm_main.c#L3356
Can you give me a hint on how to debug it?

Member

Wenzel commented Oct 9, 2017

hi @aghamir

sorry for the late response.

Do you still have those freezes ?


aghamir commented Oct 9, 2017

Hi @Wenzel ,
I am not on upstream at the moment. However, I will test it on upstream ASAP.


aghamir commented Oct 18, 2017

I tested it. Everything is OK now. I didn't find out what the problem was. However, your Nitro is now working perfectly on Ubuntu.

Member

Wenzel commented Oct 18, 2017

Great, thanks for the update !
I'm closing this issue.

Wenzel closed this as completed Oct 18, 2017
Wenzel pushed a commit that referenced this issue Nov 3, 2021
If a cell has 'nbits' equal to a multiple of BITS_PER_BYTE the logic

 *p &= GENMASK((cell->nbits%BITS_PER_BYTE) - 1, 0);

will become undefined behavior because nbits modulo BITS_PER_BYTE is 0, and we
subtract one from that making a large number that is then shifted more than the
number of bits that fit into an unsigned long.

UBSAN reports this problem:

 UBSAN: shift-out-of-bounds in drivers/nvmem/core.c:1386:8
 shift exponent 64 is too large for 64-bit type 'unsigned long'
 CPU: 6 PID: 7 Comm: kworker/u16:0 Not tainted 5.15.0-rc3+ #9
 Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
 Workqueue: events_unbound deferred_probe_work_func
 Call trace:
  dump_backtrace+0x0/0x170
  show_stack+0x24/0x30
  dump_stack_lvl+0x64/0x7c
  dump_stack+0x18/0x38
  ubsan_epilogue+0x10/0x54
  __ubsan_handle_shift_out_of_bounds+0x180/0x194
  __nvmem_cell_read+0x1ec/0x21c
  nvmem_cell_read+0x58/0x94
  nvmem_cell_read_variable_common+0x4c/0xb0
  nvmem_cell_read_variable_le_u32+0x40/0x100
  a6xx_gpu_init+0x170/0x2f4
  adreno_bind+0x174/0x284
  component_bind_all+0xf0/0x264
  msm_drm_bind+0x1d8/0x7a0
  try_to_bring_up_master+0x164/0x1ac
  __component_add+0xbc/0x13c
  component_add+0x20/0x2c
  dp_display_probe+0x340/0x384
  platform_probe+0xc0/0x100
  really_probe+0x110/0x304
  __driver_probe_device+0xb8/0x120
  driver_probe_device+0x4c/0xfc
  __device_attach_driver+0xb0/0x128
  bus_for_each_drv+0x90/0xdc
  __device_attach+0xc8/0x174
  device_initial_probe+0x20/0x2c
  bus_probe_device+0x40/0xa4
  deferred_probe_work_func+0x7c/0xb8
  process_one_work+0x128/0x21c
  process_scheduled_works+0x40/0x54
  worker_thread+0x1ec/0x2a8
  kthread+0x138/0x158
  ret_from_fork+0x10/0x20

Fix it by making sure there are any bits to mask out.
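
The guard described in this commit message can be modelled outside the kernel. The following is a Python sketch under stated assumptions, not the kernel patch itself: `genmask` mimics a 64-bit `GENMASK(h, l)` and raises `ValueError` where C would perform an out-of-range shift, and `mask_last_byte` is a hypothetical helper standing in for the `*p &= ...` statement.

```python
BITS_PER_BYTE = 8
BITS_PER_LONG = 64  # assumption: 64-bit unsigned long, as in the UBSAN report
ULONG_MAX = (1 << BITS_PER_LONG) - 1


def genmask(h, l):
    """Model of GENMASK(h, l): bits l..h set, within a 64-bit word."""
    shift = BITS_PER_LONG - 1 - h
    # In C, shifting by a count outside 0..63 is undefined; model that as an error.
    if not 0 <= shift < BITS_PER_LONG or not 0 <= l < BITS_PER_LONG:
        raise ValueError("shift exponent out of range for a 64-bit type")
    low = (ULONG_MAX - (1 << l) + 1) & ULONG_MAX
    high = ULONG_MAX >> shift
    return low & high


def mask_last_byte(byte, nbits):
    """Clear leftover high bits in the last byte, only if there are any."""
    rem = nbits % BITS_PER_BYTE
    if rem:  # the guard: skip masking when nbits is a multiple of 8
        byte &= genmask(rem - 1, 0)
    return byte
```

In this model, `mask_last_byte(0xFF, 12)` keeps only the low four bits, while `mask_last_byte(0xFF, 16)` returns the byte unchanged instead of calling `genmask(-1, 0)`, which would need a shift by 64.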

Fixes: 69aba79 ("nvmem: Add a simple NVMEM framework for consumers")
Cc: Douglas Anderson <[email protected]>
Cc: [email protected]
Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>