-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
maybe misunderstanding doc #4
Comments
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
commit 2d5a2f9 upstream. I see the following lockdep splat in the qcom pinctrl driver when attempting to suspend the device. WARNING: possible recursive locking detected 5.4.11 #3 Tainted: G W -------------------------------------------- cat/3074 is trying to acquire lock: ffffff81f49804c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 but task is already holding lock: ffffff81f1cc10c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 6 locks held by cat/3074: #0: ffffff81f01d9420 (sb_writers#7){.+.+}, at: vfs_write+0xd0/0x1a4 #1: ffffff81bd7d2080 (&of->mutex){+.+.}, at: kernfs_fop_write+0x12c/0x1fc #2: ffffff81f4c322f0 (kn->count#337){.+.+}, at: kernfs_fop_write+0x134/0x1fc #3: ffffffe411a41d60 (system_transition_mutex){+.+.}, at: pm_suspend+0x108/0x348 #4: ffffff81f1c5e970 (&dev->mutex){....}, at: __device_suspend+0x168/0x41c #5: ffffff81f1cc10c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 stack backtrace: CPU: 5 PID: 3074 Comm: cat Tainted: G W 5.4.11 #3 Hardware name: Google Cheza (rev3+) (DT) Call trace: dump_backtrace+0x0/0x174 show_stack+0x20/0x2c dump_stack+0xc8/0x124 __lock_acquire+0x460/0x2388 lock_acquire+0x1cc/0x210 _raw_spin_lock_irqsave+0x64/0x80 __irq_get_desc_lock+0x64/0x94 irq_set_irq_wake+0x40/0x144 qpnpint_irq_set_wake+0x28/0x34 set_irq_wake_real+0x40/0x5c irq_set_irq_wake+0x70/0x144 pm8941_pwrkey_suspend+0x34/0x44 platform_pm_suspend+0x34/0x60 dpm_run_callback+0x64/0xcc __device_suspend+0x310/0x41c dpm_suspend+0xf8/0x298 dpm_suspend_start+0x84/0xb4 suspend_devices_and_enter+0xbc/0x620 pm_suspend+0x210/0x348 state_store+0xb0/0x108 kobj_attr_store+0x14/0x24 sysfs_kf_write+0x4c/0x64 kernfs_fop_write+0x15c/0x1fc __vfs_write+0x54/0x18c vfs_write+0xe4/0x1a4 ksys_write+0x7c/0xe4 __arm64_sys_write+0x20/0x2c el0_svc_common+0xa8/0x160 el0_svc_handler+0x7c/0x98 el0_svc+0x8/0xc Set a lockdep class when we map the irq so that irq_set_wake() doesn't warn about a lockdep bug that doesn't exist. Fixes: 12a9eea ("spmi: pmic-arb: convert to v2 irq interfaces to support hierarchical IRQ chips") Cc: Douglas Anderson <[email protected]> Cc: Brian Masney <[email protected]> Cc: Lina Iyer <[email protected]> Cc: Maulik Shah <[email protected]> Cc: Bjorn Andersson <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
[ Upstream commit 1b710b1 ] Sergey didn't like the locking order, uart_port->lock -> tty_port->lock uart_write (uart_port->lock) __uart_start pl011_start_tx pl011_tx_chars uart_write_wakeup tty_port_tty_wakeup tty_port_default tty_port_tty_get (tty_port->lock) but those code is so old, and I have no clue how to de-couple it after checking other locks in the splat. There is an onging effort to make all printk() as deferred, so until that happens, workaround it for now as a short-term fix. LTP: starting iogen01 (export LTPROOT; rwtest -N iogen01 -i 120s -s read,write -Da -Dv -n 2 500b:$TMPDIR/doio.f1.$$ 1000b:$TMPDIR/doio.f2.$$) WARNING: possible circular locking dependency detected ------------------------------------------------------ doio/49441 is trying to acquire lock: ffff008b7cff7290 (&(&zone->lock)->rlock){..-.}, at: rmqueue+0x138/0x2050 but task is already holding lock: 60ff000822352818 (&pool->lock/1){-.-.}, at: start_flush_work+0xd8/0x3f0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&pool->lock/1){-.-.}: lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 __queue_work+0x4b4/0xa10 queue_work_on+0xac/0x11c tty_schedule_flip+0x84/0xbc tty_flip_buffer_push+0x1c/0x28 pty_write+0x98/0xd0 n_tty_write+0x450/0x60c tty_write+0x338/0x474 __vfs_write+0x88/0x214 vfs_write+0x12c/0x1a4 redirected_tty_write+0x90/0xdc do_loop_readv_writev+0x140/0x180 do_iter_write+0xe0/0x10c vfs_writev+0x134/0x1cc do_writev+0xbc/0x130 __arm64_sys_writev+0x58/0x8c el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 -> #3 (&(&port->lock)->rlock){-.-.}: lock_acquire+0x320/0x360 _raw_spin_lock_irqsave+0x7c/0x9c tty_port_tty_get+0x24/0x60 tty_port_default_wakeup+0x1c/0x3c tty_port_tty_wakeup+0x34/0x40 uart_write_wakeup+0x28/0x44 pl011_tx_chars+0x1b8/0x270 pl011_start_tx+0x24/0x70 __uart_start+0x5c/0x68 uart_write+0x164/0x1c8 do_output_char+0x33c/0x348 n_tty_write+0x4bc/0x60c tty_write+0x338/0x474 redirected_tty_write+0xc0/0xdc do_loop_readv_writev+0x140/0x180 do_iter_write+0xe0/0x10c vfs_writev+0x134/0x1cc do_writev+0xbc/0x130 __arm64_sys_writev+0x58/0x8c el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 -> #2 (&port_lock_key){-.-.}: lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 pl011_console_write+0xec/0x2cc console_unlock+0x794/0x96c vprintk_emit+0x260/0x31c vprintk_default+0x54/0x7c vprintk_func+0x218/0x254 printk+0x7c/0xa4 register_console+0x734/0x7b0 uart_add_one_port+0x734/0x834 pl011_register_port+0x6c/0xac sbsa_uart_probe+0x234/0x2ec platform_drv_probe+0xd4/0x124 really_probe+0x250/0x71c driver_probe_device+0xb4/0x200 __device_attach_driver+0xd8/0x188 bus_for_each_drv+0xbc/0x110 __device_attach+0x120/0x220 device_initial_probe+0x20/0x2c bus_probe_device+0x54/0x100 device_add+0xae8/0xc2c platform_device_add+0x278/0x3b8 platform_device_register_full+0x238/0x2ac acpi_create_platform_device+0x2dc/0x3a8 acpi_bus_attach+0x390/0x3cc acpi_bus_attach+0x108/0x3cc acpi_bus_attach+0x108/0x3cc acpi_bus_attach+0x108/0x3cc acpi_bus_scan+0x7c/0xb0 acpi_scan_init+0xe4/0x304 acpi_init+0x100/0x114 do_one_initcall+0x348/0x6a0 do_initcall_level+0x190/0x1fc do_basic_setup+0x34/0x4c kernel_init_freeable+0x19c/0x260 kernel_init+0x18/0x338 ret_from_fork+0x10/0x18 -> #1 (console_owner){-...}: lock_acquire+0x320/0x360 console_lock_spinning_enable+0x6c/0x7c console_unlock+0x4f8/0x96c vprintk_emit+0x260/0x31c vprintk_default+0x54/0x7c vprintk_func+0x218/0x254 printk+0x7c/0xa4 get_random_u64+0x1c4/0x1dc shuffle_pick_tail+0x40/0xac __free_one_page+0x424/0x710 free_one_page+0x70/0x120 __free_pages_ok+0x61c/0xa94 __free_pages_core+0x1bc/0x294 memblock_free_pages+0x38/0x48 __free_pages_memory+0xcc/0xfc __free_memory_core+0x70/0x78 free_low_memory_core_early+0x148/0x18c memblock_free_all+0x18/0x54 mem_init+0xb4/0x17c mm_init+0x14/0x38 start_kernel+0x19c/0x530 -> #0 (&(&zone->lock)->rlock){..-.}: validate_chain+0xf6c/0x2e2c __lock_acquire+0x868/0xc2c lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 rmqueue+0x138/0x2050 get_page_from_freelist+0x474/0x688 __alloc_pages_nodemask+0x3b4/0x18dc alloc_pages_current+0xd0/0xe0 alloc_slab_page+0x2b4/0x5e0 new_slab+0xc8/0x6bc ___slab_alloc+0x3b8/0x640 kmem_cache_alloc+0x4b4/0x588 __debug_object_init+0x778/0x8b4 debug_object_init_on_stack+0x40/0x50 start_flush_work+0x16c/0x3f0 __flush_work+0xb8/0x124 flush_work+0x20/0x30 xlog_cil_force_lsn+0x88/0x204 [xfs] xfs_log_force_lsn+0x128/0x1b8 [xfs] xfs_file_fsync+0x3c4/0x488 [xfs] vfs_fsync_range+0xb0/0xd0 generic_write_sync+0x80/0xa0 [xfs] xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs] xfs_file_write_iter+0x1a0/0x218 [xfs] __vfs_write+0x1cc/0x214 vfs_write+0x12c/0x1a4 ksys_write+0xb0/0x120 __arm64_sys_write+0x54/0x88 el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 other info that might help us debug this: Chain exists of: &(&zone->lock)->rlock --> &(&port->lock)->rlock --> &pool->lock/1 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&pool->lock/1); lock(&(&port->lock)->rlock); lock(&pool->lock/1); lock(&(&zone->lock)->rlock); *** DEADLOCK *** 4 locks held by doio/49441: #0: a0ff00886fc27408 (sb_writers#8){.+.+}, at: vfs_write+0x118/0x1a4 #1: 8fff00080810dfe0 (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0x2a8/0x300 [xfs] #2: ffff9000129f2390 (rcu_read_lock){....}, at: rcu_lock_acquire+0x8/0x38 #3: 60ff000822352818 (&pool->lock/1){-.-.}, at: start_flush_work+0xd8/0x3f0 stack backtrace: CPU: 48 PID: 49441 Comm: doio Tainted: G W Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019 Call trace: dump_backtrace+0x0/0x248 show_stack+0x20/0x2c dump_stack+0xe8/0x150 print_circular_bug+0x368/0x380 check_noncircular+0x28c/0x294 validate_chain+0xf6c/0x2e2c __lock_acquire+0x868/0xc2c lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 rmqueue+0x138/0x2050 get_page_from_freelist+0x474/0x688 __alloc_pages_nodemask+0x3b4/0x18dc alloc_pages_current+0xd0/0xe0 alloc_slab_page+0x2b4/0x5e0 new_slab+0xc8/0x6bc ___slab_alloc+0x3b8/0x640 kmem_cache_alloc+0x4b4/0x588 __debug_object_init+0x778/0x8b4 debug_object_init_on_stack+0x40/0x50 start_flush_work+0x16c/0x3f0 __flush_work+0xb8/0x124 flush_work+0x20/0x30 xlog_cil_force_lsn+0x88/0x204 [xfs] xfs_log_force_lsn+0x128/0x1b8 [xfs] xfs_file_fsync+0x3c4/0x488 [xfs] vfs_fsync_range+0xb0/0xd0 generic_write_sync+0x80/0xa0 [xfs] xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs] xfs_file_write_iter+0x1a0/0x218 [xfs] __vfs_write+0x1cc/0x214 vfs_write+0x12c/0x1a4 ksys_write+0xb0/0x120 __arm64_sys_write+0x54/0x88 el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 Reviewed-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Qian Cai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
[ Upstream commit c34f6dc ] I see the following lockdep splat in the qcom pinctrl driver when attempting to suspend the device. ============================================ WARNING: possible recursive locking detected 5.4.2 #2 Tainted: G S -------------------------------------------- cat/6536 is trying to acquire lock: ffffff814787ccc0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 but task is already holding lock: ffffff81436740c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by cat/6536: #0: ffffff8140e0c420 (sb_writers#7){.+.+}, at: vfs_write+0xc8/0x19c #1: ffffff8121eec480 (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x1f4 #2: ffffff8147cad668 (kn->count#263){.+.+}, at: kernfs_fop_write+0x130/0x1f4 #3: ffffffd011446000 (system_transition_mutex){+.+.}, at: pm_suspend+0x108/0x354 #4: ffffff814302b970 (&dev->mutex){....}, at: __device_suspend+0x16c/0x420 #5: ffffff81436740c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 #6: ffffff81479b8c10 (&pctrl->lock){....}, at: msm_gpio_irq_set_wake+0x48/0x7c stack backtrace: CPU: 4 PID: 6536 Comm: cat Tainted: G S 5.4.2 #2 Call trace: dump_backtrace+0x0/0x174 show_stack+0x20/0x2c dump_stack+0xdc/0x144 __lock_acquire+0x52c/0x2268 lock_acquire+0x1dc/0x220 _raw_spin_lock_irqsave+0x64/0x80 __irq_get_desc_lock+0x64/0x94 irq_set_irq_wake+0x40/0x144 msm_gpio_irq_set_wake+0x5c/0x7c set_irq_wake_real+0x40/0x5c irq_set_irq_wake+0x70/0x144 cros_ec_rtc_suspend+0x38/0x4c platform_pm_suspend+0x34/0x60 dpm_run_callback+0x64/0xcc __device_suspend+0x314/0x420 dpm_suspend+0xf8/0x298 dpm_suspend_start+0x84/0xb4 suspend_devices_and_enter+0xbc/0x628 pm_suspend+0x214/0x354 state_store+0xb0/0x108 kobj_attr_store+0x14/0x24 sysfs_kf_write+0x4c/0x64 kernfs_fop_write+0x158/0x1f4 __vfs_write+0x54/0x18c vfs_write+0xdc/0x19c ksys_write+0x7c/0xe4 __arm64_sys_write+0x20/0x2c el0_svc_common+0xa8/0x160 el0_svc_compat_handler+0x2c/0x38 el0_svc_compat+0x8/0x10 This is because the msm_gpio_irq_set_wake() function calls irq_set_irq_wake() as a backup in case the irq comes in during the path to idle. Given that we're calling irqchip functions from within an irqchip we need to set the lockdep class to be different for this child controller vs. the default one that the parent irqchip gets. This used to be done before this driver was converted to hierarchical irq domains in commit e35a6ae ("pinctrl/msm: Setup GPIO chip in hierarchy") via the gpiochip_irq_map() function. With hierarchical irq domains this function has been replaced by gpiochip_hierarchy_irq_domain_alloc(). Therefore, set the lockdep class like was done previously in the irq domain path so we can avoid this lockdep warning. Fixes: fdd61a0 ("gpio: Add support for hierarchical IRQ domains") Cc: Thierry Reding <[email protected]> Cc: Brian Masney <[email protected]> Cc: Lina Iyer <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Maulik Shah <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
commit 52e29e3 upstream. The only time we actually leave the path spinning is if we're truncating a small amount and don't actually free an extent, which is not a common occurrence. We have to set the path blocking in order to add the delayed ref anyway, so the first extent we find we set the path to blocking and stay blocking for the duration of the operation. With the upcoming file extent map stuff there will be another case that we have to have the path blocking, so just swap to blocking always. Note: this patch also fixes a warning after 28553fa ("Btrfs: fix race between shrinking truncate and fiemap") got merged that inserts extent locks around truncation so the path must not leave spinning locks after btrfs_search_slot. [70.794783] BUG: sleeping function called from invalid context at mm/slab.h:565 [70.794834] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1141, name: rsync [70.794863] 5 locks held by rsync/1141: [70.794876] #0: ffff888417b9c408 (sb_writers#17){.+.+}, at: mnt_want_write+0x20/0x50 [70.795030] #1: ffff888428de28e8 (&type->i_mutex_dir_key#13/1){+.+.}, at: lock_rename+0xf1/0x100 [70.795051] #2: ffff888417b9c608 (sb_internal#2){.+.+}, at: start_transaction+0x394/0x560 [70.795124] #3: ffff888403081768 (btrfs-fs-01){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160 [70.795203] #4: ffff888403086568 (btrfs-fs-00){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160 [70.795222] CPU: 5 PID: 1141 Comm: rsync Not tainted 5.6.0-rc2-backup+ #2 [70.795362] Call Trace: [70.795374] dump_stack+0x71/0xa0 [70.795445] ___might_sleep.part.96.cold.106+0xa6/0xb6 [70.795459] kmem_cache_alloc+0x1d3/0x290 [70.795471] alloc_extent_state+0x22/0x1c0 [70.795544] __clear_extent_bit+0x3ba/0x580 [70.795557] ? _raw_spin_unlock_irq+0x24/0x30 [70.795569] btrfs_truncate_inode_items+0x339/0xe50 [70.795647] btrfs_evict_inode+0x269/0x540 [70.795659] ? dput.part.38+0x29/0x460 [70.795671] evict+0xcd/0x190 [70.795682] __dentry_kill+0xd6/0x180 [70.795754] dput.part.38+0x2ad/0x460 [70.795765] do_renameat2+0x3cb/0x540 [70.795777] __x64_sys_rename+0x1c/0x20 Reported-by: Dave Jones <[email protected]> Fixes: 28553fa ("Btrfs: fix race between shrinking truncate and fiemap") CC: [email protected] # 4.4+ Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> [ add note ] Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
[ Upstream commit 064ff66 ] After bond_release(), netdev_update_lockdep_key() should be called. But both ioctl path and attribute path don't call netdev_update_lockdep_key(). This patch adds missing netdev_update_lockdep_key(). Test commands: ip link add bond0 type bond ip link add bond1 type bond ifenslave bond0 bond1 ifenslave -d bond0 bond1 ifenslave bond1 bond0 Splat looks like: [ 29.501182][ T1046] WARNING: possible circular locking dependency detected [ 29.501945][ T1039] hardirqs last disabled at (1962): [<ffffffffac6c807f>] handle_mm_fault+0x13f/0x700 [ 29.503442][ T1046] 5.5.0+ #322 Not tainted [ 29.503447][ T1046] ------------------------------------------------------ [ 29.504277][ T1039] softirqs last enabled at (1180): [<ffffffffade00678>] __do_softirq+0x678/0x981 [ 29.505443][ T1046] ifenslave/1046 is trying to acquire lock: [ 29.505886][ T1039] softirqs last disabled at (1169): [<ffffffffac19c18a>] irq_exit+0x17a/0x1a0 [ 29.509997][ T1046] ffff88805d5da280 (&dev->addr_list_lock_key#3){+...}, at: dev_mc_sync_multiple+0x95/0x120 [ 29.511243][ T1046] [ 29.511243][ T1046] but task is already holding lock: [ 29.512192][ T1046] ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding] [ 29.514124][ T1046] [ 29.514124][ T1046] which lock already depends on the new lock. [ 29.514124][ T1046] [ 29.517297][ T1046] [ 29.517297][ T1046] the existing dependency chain (in reverse order) is: [ 29.518231][ T1046] [ 29.518231][ T1046] -> #1 (&dev->addr_list_lock_key#4){+...}: [ 29.519076][ T1046] _raw_spin_lock+0x30/0x70 [ 29.519588][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.520208][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.520862][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ 29.521640][ T1046] __bond_opt_set+0x1ff/0xbb0 [bonding] [ 29.522438][ T1046] __bond_opt_set_notify+0x2b/0xf0 [bonding] [ 29.523251][ T1046] bond_opt_tryset_rtnl+0x92/0xf0 [bonding] [ 29.524082][ T1046] bonding_sysfs_store_option+0x8a/0xf0 [bonding] [ 29.524959][ T1046] kernfs_fop_write+0x276/0x410 [ 29.525620][ T1046] vfs_write+0x197/0x4a0 [ 29.526218][ T1046] ksys_write+0x141/0x1d0 [ 29.526818][ T1046] do_syscall_64+0x99/0x4f0 [ 29.527430][ T1046] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 29.528265][ T1046] [ 29.528265][ T1046] -> #0 (&dev->addr_list_lock_key#3){+...}: [ 29.529272][ T1046] __lock_acquire+0x2d8d/0x3de0 [ 29.529935][ T1046] lock_acquire+0x164/0x3b0 [ 29.530638][ T1046] _raw_spin_lock+0x30/0x70 [ 29.531187][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.531790][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.532451][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ 29.533163][ T1046] __bond_opt_set+0x1ff/0xbb0 [bonding] [ 29.533789][ T1046] __bond_opt_set_notify+0x2b/0xf0 [bonding] [ 29.534595][ T1046] bond_opt_tryset_rtnl+0x92/0xf0 [bonding] [ 29.535500][ T1046] bonding_sysfs_store_option+0x8a/0xf0 [bonding] [ 29.536379][ T1046] kernfs_fop_write+0x276/0x410 [ 29.537057][ T1046] vfs_write+0x197/0x4a0 [ 29.537640][ T1046] ksys_write+0x141/0x1d0 [ 29.538251][ T1046] do_syscall_64+0x99/0x4f0 [ 29.538870][ T1046] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 29.539659][ T1046] [ 29.539659][ T1046] other info that might help us debug this: [ 29.539659][ T1046] [ 29.540953][ T1046] Possible unsafe locking scenario: [ 29.540953][ T1046] [ 29.541883][ T1046] CPU0 CPU1 [ 29.542540][ T1046] ---- ---- [ 29.543209][ T1046] lock(&dev->addr_list_lock_key#4); [ 29.543880][ T1046] lock(&dev->addr_list_lock_key#3); [ 29.544873][ T1046] lock(&dev->addr_list_lock_key#4); [ 29.545863][ T1046] lock(&dev->addr_list_lock_key#3); [ 29.546525][ T1046] [ 29.546525][ T1046] *** DEADLOCK *** [ 29.546525][ T1046] [ 29.547542][ T1046] 5 locks held by ifenslave/1046: [ 29.548196][ T1046] #0: ffff88806044c478 (sb_writers#5){.+.+}, at: vfs_write+0x3bb/0x4a0 [ 29.549248][ T1046] #1: ffff88805af00890 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cf/0x410 [ 29.550343][ T1046] #2: ffff88805b8b54b0 (kn->count#157){.+.+}, at: kernfs_fop_write+0x1f2/0x410 [ 29.551575][ T1046] #3: ffffffffaecf4cf0 (rtnl_mutex){+.+.}, at: bond_opt_tryset_rtnl+0x5f/0xf0 [bonding] [ 29.552819][ T1046] #4: ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding] [ 29.554175][ T1046] [ 29.554175][ T1046] stack backtrace: [ 29.554907][ T1046] CPU: 0 PID: 1046 Comm: ifenslave Not tainted 5.5.0+ #322 [ 29.555854][ T1046] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 29.557064][ T1046] Call Trace: [ 29.557504][ T1046] dump_stack+0x96/0xdb [ 29.558054][ T1046] check_noncircular+0x371/0x450 [ 29.558723][ T1046] ? print_circular_bug.isra.35+0x310/0x310 [ 29.559486][ T1046] ? hlock_class+0x130/0x130 [ 29.560100][ T1046] ? __lock_acquire+0x2d8d/0x3de0 [ 29.560761][ T1046] __lock_acquire+0x2d8d/0x3de0 [ 29.561366][ T1046] ? register_lock_class+0x14d0/0x14d0 [ 29.562045][ T1046] ? find_held_lock+0x39/0x1d0 [ 29.562641][ T1046] lock_acquire+0x164/0x3b0 [ 29.563199][ T1046] ? dev_mc_sync_multiple+0x95/0x120 [ 29.563872][ T1046] _raw_spin_lock+0x30/0x70 [ 29.564464][ T1046] ? dev_mc_sync_multiple+0x95/0x120 [ 29.565146][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.565793][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.566487][ T1046] ? bond_update_slave_arr+0x940/0x940 [bonding] [ 29.567279][ T1046] ? bstr_printf+0xc20/0xc20 [ 29.567857][ T1046] ? stack_trace_consume_entry+0x160/0x160 [ 29.568614][ T1046] ? deactivate_slab.isra.77+0x2c5/0x800 [ 29.569320][ T1046] ? check_chain_key+0x236/0x5d0 [ 29.569939][ T1046] ? sscanf+0x93/0xc0 [ 29.570442][ T1046] ? vsscanf+0x1e20/0x1e20 [ 29.571003][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ ... ] Fixes: ab92d68 ("net: core: add generic lockdep keys") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
commit dad2aff upstream. If scatter-gather operation is allowed, a large USB request is split into multiple TRBs. For preparing TRBs for sg list, driver iterates over the list and creates TRB for each sg and mark the chain bit to false for the last sg. The current IOMMU driver is clubbing the list of sgs which shares a page boundary into one and giving it to USB driver. With this the number of sgs mapped it not equal to the the number of sgs passed. Because of this USB driver is not marking the chain bit to false since it couldn't iterate to the last sg. This patch addresses this issue by marking the chain bit to false if it is the last mapped sg. At a practical level, this patch resolves USB transfer stalls seen with adb on dwc3 based db845c, pixel3 and other qcom hardware after functionfs gadget added scatter-gather support around v4.20. Credit also to Anurag Kumar Vulisha <[email protected]> who implemented a very similar fix to this issue. Cc: Felipe Balbi <[email protected]> Cc: Yang Fei <[email protected]> Cc: Thinh Nguyen <[email protected]> Cc: Tejas Joglekar <[email protected]> Cc: Andrzej Pietrasiewicz <[email protected]> Cc: Jack Pham <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Greg KH <[email protected]> Cc: Linux USB List <[email protected]> Cc: stable <[email protected]> #4.20+ Signed-off-by: Pratham Pratap <[email protected]> [jstultz: Slight tweak to remove sg_is_last() usage, reworked commit message, minor comment tweak] Signed-off-by: John Stultz <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
[ Upstream commit 102210f ] rmnet_get_port() internally calls rcu_dereference_rtnl(), which checks RTNL. But rmnet_get_port() could be called by packet path. The packet path is not protected by RTNL. So, the suspicious RCU usage problem occurs. Test commands: modprobe rmnet ip netns add nst ip link add veth0 type veth peer name veth1 ip link set veth1 netns nst ip link add rmnet0 link veth0 type rmnet mux_id 1 ip netns exec nst ip link add rmnet1 link veth1 type rmnet mux_id 1 ip netns exec nst ip link set veth1 up ip netns exec nst ip link set rmnet1 up ip netns exec nst ip a a 192.168.100.2/24 dev rmnet1 ip link set veth0 up ip link set rmnet0 up ip a a 192.168.100.1/24 dev rmnet0 ping 192.168.100.2 Splat looks like: [ 146.630958][ T1174] WARNING: suspicious RCU usage [ 146.631735][ T1174] 5.6.0-rc1+ #447 Not tainted [ 146.632387][ T1174] ----------------------------- [ 146.633151][ T1174] drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c:386 suspicious rcu_dereference_check() ! [ 146.634742][ T1174] [ 146.634742][ T1174] other info that might help us debug this: [ 146.634742][ T1174] [ 146.645992][ T1174] [ 146.645992][ T1174] rcu_scheduler_active = 2, debug_locks = 1 [ 146.646937][ T1174] 5 locks held by ping/1174: [ 146.647609][ T1174] #0: ffff8880c31dea70 (sk_lock-AF_INET){+.+.}, at: raw_sendmsg+0xab8/0x2980 [ 146.662463][ T1174] #1: ffffffff93925660 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x243/0x2150 [ 146.671696][ T1174] #2: ffffffff93925660 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x213/0x2940 [ 146.673064][ T1174] #3: ffff8880c19ecd58 (&dev->qdisc_running_key#7){+...}, at: ip_finish_output2+0x714/0x2150 [ 146.690358][ T1174] #4: ffff8880c5796898 (&dev->qdisc_xmit_lock_key#3){+.-.}, at: sch_direct_xmit+0x1e2/0x1020 [ 146.699875][ T1174] [ 146.699875][ T1174] stack backtrace: [ 146.701091][ T1174] CPU: 0 PID: 1174 Comm: ping Not tainted 5.6.0-rc1+ #447 [ 146.705215][ T1174] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 146.706565][ T1174] Call Trace: [ 146.707102][ T1174] dump_stack+0x96/0xdb [ 146.708007][ T1174] rmnet_get_port.part.9+0x76/0x80 [rmnet] [ 146.709233][ T1174] rmnet_egress_handler+0x107/0x420 [rmnet] [ 146.710492][ T1174] ? sch_direct_xmit+0x1e2/0x1020 [ 146.716193][ T1174] rmnet_vnd_start_xmit+0x3d/0xa0 [rmnet] [ 146.717012][ T1174] dev_hard_start_xmit+0x160/0x740 [ 146.717854][ T1174] sch_direct_xmit+0x265/0x1020 [ 146.718577][ T1174] ? register_lock_class+0x14d0/0x14d0 [ 146.719429][ T1174] ? dev_watchdog+0xac0/0xac0 [ 146.723738][ T1174] ? __dev_queue_xmit+0x15fd/0x2940 [ 146.724469][ T1174] ? lock_acquire+0x164/0x3b0 [ 146.725172][ T1174] __dev_queue_xmit+0x20c7/0x2940 [ ... ] Fixes: ceed73a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
[ Upstream commit 6c5d911 ] journal_head::b_transaction and journal_head::b_next_transaction could be accessed concurrently as noticed by KCSAN, LTP: starting fsync04 /dev/zero: Can't open blockdev EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null) ================================================================== BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2] write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70: __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2] __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569 jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2] (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034 kjournald2+0x13b/0x450 [jbd2] kthread+0x1cd/0x1f0 ret_from_fork+0x27/0x50 read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68: jbd2_write_access_granted+0x1b2/0x250 [jbd2] jbd2_write_access_granted at fs/jbd2/transaction.c:1155 jbd2_journal_get_write_access+0x2c/0x60 [jbd2] __ext4_journal_get_write_access+0x50/0x90 [ext4] ext4_mb_mark_diskspace_used+0x158/0x620 [ext4] ext4_mb_new_blocks+0x54f/0xca0 [ext4] ext4_ind_map_blocks+0xc79/0x1b40 [ext4] ext4_map_blocks+0x3b4/0x950 [ext4] _ext4_get_block+0xfc/0x270 [ext4] ext4_get_block+0x3b/0x50 [ext4] __block_write_begin_int+0x22e/0xae0 __block_write_begin+0x39/0x50 ext4_write_begin+0x388/0xb50 [ext4] generic_perform_write+0x15d/0x290 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb05 entry_SYSCALL_64_after_hwframe+0x49/0xbe 5 locks held by fsync04/25724: #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260 #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4] #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2] #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4] #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2] irq event stamp: 1407125 hardirqs last enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790 hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790 softirqs last enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0 Reported by Kernel Concurrency Sanitizer on: CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 The plain reads are outside of jh->b_state_lock critical section which result in data races. Fix them by adding pairs of READ|WRITE_ONCE(). Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Qian Cai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
[ Upstream commit 1b710b1 ] Sergey didn't like the locking order, uart_port->lock -> tty_port->lock uart_write (uart_port->lock) __uart_start pl011_start_tx pl011_tx_chars uart_write_wakeup tty_port_tty_wakeup tty_port_default tty_port_tty_get (tty_port->lock) but those code is so old, and I have no clue how to de-couple it after checking other locks in the splat. There is an onging effort to make all printk() as deferred, so until that happens, workaround it for now as a short-term fix. LTP: starting iogen01 (export LTPROOT; rwtest -N iogen01 -i 120s -s read,write -Da -Dv -n 2 500b:$TMPDIR/doio.f1.$$ 1000b:$TMPDIR/doio.f2.$$) WARNING: possible circular locking dependency detected ------------------------------------------------------ doio/49441 is trying to acquire lock: ffff008b7cff7290 (&(&zone->lock)->rlock){..-.}, at: rmqueue+0x138/0x2050 but task is already holding lock: 60ff000822352818 (&pool->lock/1){-.-.}, at: start_flush_work+0xd8/0x3f0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&pool->lock/1){-.-.}: lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 __queue_work+0x4b4/0xa10 queue_work_on+0xac/0x11c tty_schedule_flip+0x84/0xbc tty_flip_buffer_push+0x1c/0x28 pty_write+0x98/0xd0 n_tty_write+0x450/0x60c tty_write+0x338/0x474 __vfs_write+0x88/0x214 vfs_write+0x12c/0x1a4 redirected_tty_write+0x90/0xdc do_loop_readv_writev+0x140/0x180 do_iter_write+0xe0/0x10c vfs_writev+0x134/0x1cc do_writev+0xbc/0x130 __arm64_sys_writev+0x58/0x8c el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 -> #3 (&(&port->lock)->rlock){-.-.}: lock_acquire+0x320/0x360 _raw_spin_lock_irqsave+0x7c/0x9c tty_port_tty_get+0x24/0x60 tty_port_default_wakeup+0x1c/0x3c tty_port_tty_wakeup+0x34/0x40 uart_write_wakeup+0x28/0x44 pl011_tx_chars+0x1b8/0x270 pl011_start_tx+0x24/0x70 __uart_start+0x5c/0x68 uart_write+0x164/0x1c8 do_output_char+0x33c/0x348 n_tty_write+0x4bc/0x60c tty_write+0x338/0x474 redirected_tty_write+0xc0/0xdc do_loop_readv_writev+0x140/0x180 do_iter_write+0xe0/0x10c vfs_writev+0x134/0x1cc do_writev+0xbc/0x130 __arm64_sys_writev+0x58/0x8c el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 -> #2 (&port_lock_key){-.-.}: lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 pl011_console_write+0xec/0x2cc console_unlock+0x794/0x96c vprintk_emit+0x260/0x31c vprintk_default+0x54/0x7c vprintk_func+0x218/0x254 printk+0x7c/0xa4 register_console+0x734/0x7b0 uart_add_one_port+0x734/0x834 pl011_register_port+0x6c/0xac sbsa_uart_probe+0x234/0x2ec platform_drv_probe+0xd4/0x124 really_probe+0x250/0x71c driver_probe_device+0xb4/0x200 __device_attach_driver+0xd8/0x188 bus_for_each_drv+0xbc/0x110 __device_attach+0x120/0x220 device_initial_probe+0x20/0x2c bus_probe_device+0x54/0x100 device_add+0xae8/0xc2c platform_device_add+0x278/0x3b8 platform_device_register_full+0x238/0x2ac acpi_create_platform_device+0x2dc/0x3a8 acpi_bus_attach+0x390/0x3cc acpi_bus_attach+0x108/0x3cc acpi_bus_attach+0x108/0x3cc acpi_bus_attach+0x108/0x3cc acpi_bus_scan+0x7c/0xb0 acpi_scan_init+0xe4/0x304 acpi_init+0x100/0x114 do_one_initcall+0x348/0x6a0 do_initcall_level+0x190/0x1fc do_basic_setup+0x34/0x4c kernel_init_freeable+0x19c/0x260 kernel_init+0x18/0x338 ret_from_fork+0x10/0x18 -> #1 (console_owner){-...}: lock_acquire+0x320/0x360 console_lock_spinning_enable+0x6c/0x7c console_unlock+0x4f8/0x96c vprintk_emit+0x260/0x31c vprintk_default+0x54/0x7c vprintk_func+0x218/0x254 printk+0x7c/0xa4 get_random_u64+0x1c4/0x1dc shuffle_pick_tail+0x40/0xac __free_one_page+0x424/0x710 free_one_page+0x70/0x120 __free_pages_ok+0x61c/0xa94 __free_pages_core+0x1bc/0x294 memblock_free_pages+0x38/0x48 __free_pages_memory+0xcc/0xfc __free_memory_core+0x70/0x78 free_low_memory_core_early+0x148/0x18c memblock_free_all+0x18/0x54 mem_init+0xb4/0x17c mm_init+0x14/0x38 start_kernel+0x19c/0x530 -> #0 (&(&zone->lock)->rlock){..-.}: validate_chain+0xf6c/0x2e2c __lock_acquire+0x868/0xc2c lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 rmqueue+0x138/0x2050 get_page_from_freelist+0x474/0x688 __alloc_pages_nodemask+0x3b4/0x18dc alloc_pages_current+0xd0/0xe0 alloc_slab_page+0x2b4/0x5e0 new_slab+0xc8/0x6bc ___slab_alloc+0x3b8/0x640 kmem_cache_alloc+0x4b4/0x588 __debug_object_init+0x778/0x8b4 debug_object_init_on_stack+0x40/0x50 start_flush_work+0x16c/0x3f0 __flush_work+0xb8/0x124 flush_work+0x20/0x30 xlog_cil_force_lsn+0x88/0x204 [xfs] xfs_log_force_lsn+0x128/0x1b8 [xfs] xfs_file_fsync+0x3c4/0x488 [xfs] vfs_fsync_range+0xb0/0xd0 generic_write_sync+0x80/0xa0 [xfs] xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs] xfs_file_write_iter+0x1a0/0x218 [xfs] __vfs_write+0x1cc/0x214 vfs_write+0x12c/0x1a4 ksys_write+0xb0/0x120 __arm64_sys_write+0x54/0x88 el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 other info that might help us debug this: Chain exists of: &(&zone->lock)->rlock --> &(&port->lock)->rlock --> &pool->lock/1 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&pool->lock/1); lock(&(&port->lock)->rlock); lock(&pool->lock/1); lock(&(&zone->lock)->rlock); *** DEADLOCK *** 4 locks held by doio/49441: #0: a0ff00886fc27408 (sb_writers#8){.+.+}, at: vfs_write+0x118/0x1a4 #1: 8fff00080810dfe0 (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0x2a8/0x300 [xfs] #2: ffff9000129f2390 (rcu_read_lock){....}, at: rcu_lock_acquire+0x8/0x38 #3: 60ff000822352818 (&pool->lock/1){-.-.}, at: start_flush_work+0xd8/0x3f0 stack backtrace: CPU: 48 PID: 49441 Comm: doio Tainted: G W Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019 Call trace: dump_backtrace+0x0/0x248 show_stack+0x20/0x2c dump_stack+0xe8/0x150 print_circular_bug+0x368/0x380 check_noncircular+0x28c/0x294 validate_chain+0xf6c/0x2e2c __lock_acquire+0x868/0xc2c lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 rmqueue+0x138/0x2050 get_page_from_freelist+0x474/0x688 __alloc_pages_nodemask+0x3b4/0x18dc alloc_pages_current+0xd0/0xe0 alloc_slab_page+0x2b4/0x5e0 new_slab+0xc8/0x6bc ___slab_alloc+0x3b8/0x640 kmem_cache_alloc+0x4b4/0x588 __debug_object_init+0x778/0x8b4 debug_object_init_on_stack+0x40/0x50 start_flush_work+0x16c/0x3f0 __flush_work+0xb8/0x124 flush_work+0x20/0x30 xlog_cil_force_lsn+0x88/0x204 [xfs] xfs_log_force_lsn+0x128/0x1b8 [xfs] xfs_file_fsync+0x3c4/0x488 [xfs] vfs_fsync_range+0xb0/0xd0 generic_write_sync+0x80/0xa0 [xfs] xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs] xfs_file_write_iter+0x1a0/0x218 [xfs] __vfs_write+0x1cc/0x214 vfs_write+0x12c/0x1a4 ksys_write+0xb0/0x120 __arm64_sys_write+0x54/0x88 el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 Reviewed-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Qian Cai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
commit dad2aff upstream. If scatter-gather operation is allowed, a large USB request is split into multiple TRBs. For preparing TRBs for sg list, driver iterates over the list and creates TRB for each sg and mark the chain bit to false for the last sg. The current IOMMU driver is clubbing the list of sgs which shares a page boundary into one and giving it to USB driver. With this the number of sgs mapped it not equal to the the number of sgs passed. Because of this USB driver is not marking the chain bit to false since it couldn't iterate to the last sg. This patch addresses this issue by marking the chain bit to false if it is the last mapped sg. At a practical level, this patch resolves USB transfer stalls seen with adb on dwc3 based db845c, pixel3 and other qcom hardware after functionfs gadget added scatter-gather support around v4.20. Credit also to Anurag Kumar Vulisha <[email protected]> who implemented a very similar fix to this issue. Cc: Felipe Balbi <[email protected]> Cc: Yang Fei <[email protected]> Cc: Thinh Nguyen <[email protected]> Cc: Tejas Joglekar <[email protected]> Cc: Andrzej Pietrasiewicz <[email protected]> Cc: Jack Pham <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Greg KH <[email protected]> Cc: Linux USB List <[email protected]> Cc: stable <[email protected]> #4.20+ Signed-off-by: Pratham Pratap <[email protected]> [jstultz: Slight tweak to remove sg_is_last() usage, reworked commit message, minor comment tweak] Signed-off-by: John Stultz <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
[ Upstream commit 102210f ] rmnet_get_port() internally calls rcu_dereference_rtnl(), which checks RTNL. But rmnet_get_port() could be called by packet path. The packet path is not protected by RTNL. So, the suspicious RCU usage problem occurs. Test commands: modprobe rmnet ip netns add nst ip link add veth0 type veth peer name veth1 ip link set veth1 netns nst ip link add rmnet0 link veth0 type rmnet mux_id 1 ip netns exec nst ip link add rmnet1 link veth1 type rmnet mux_id 1 ip netns exec nst ip link set veth1 up ip netns exec nst ip link set rmnet1 up ip netns exec nst ip a a 192.168.100.2/24 dev rmnet1 ip link set veth0 up ip link set rmnet0 up ip a a 192.168.100.1/24 dev rmnet0 ping 192.168.100.2 Splat looks like: [ 146.630958][ T1174] WARNING: suspicious RCU usage [ 146.631735][ T1174] 5.6.0-rc1+ #447 Not tainted [ 146.632387][ T1174] ----------------------------- [ 146.633151][ T1174] drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c:386 suspicious rcu_dereference_check() ! [ 146.634742][ T1174] [ 146.634742][ T1174] other info that might help us debug this: [ 146.634742][ T1174] [ 146.645992][ T1174] [ 146.645992][ T1174] rcu_scheduler_active = 2, debug_locks = 1 [ 146.646937][ T1174] 5 locks held by ping/1174: [ 146.647609][ T1174] #0: ffff8880c31dea70 (sk_lock-AF_INET){+.+.}, at: raw_sendmsg+0xab8/0x2980 [ 146.662463][ T1174] #1: ffffffff93925660 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x243/0x2150 [ 146.671696][ T1174] #2: ffffffff93925660 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x213/0x2940 [ 146.673064][ T1174] #3: ffff8880c19ecd58 (&dev->qdisc_running_key#7){+...}, at: ip_finish_output2+0x714/0x2150 [ 146.690358][ T1174] #4: ffff8880c5796898 (&dev->qdisc_xmit_lock_key#3){+.-.}, at: sch_direct_xmit+0x1e2/0x1020 [ 146.699875][ T1174] [ 146.699875][ T1174] stack backtrace: [ 146.701091][ T1174] CPU: 0 PID: 1174 Comm: ping Not tainted 5.6.0-rc1+ #447 [ 146.705215][ T1174] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 146.706565][ T1174] Call Trace: [ 146.707102][ T1174] dump_stack+0x96/0xdb [ 146.708007][ T1174] rmnet_get_port.part.9+0x76/0x80 [rmnet] [ 146.709233][ T1174] rmnet_egress_handler+0x107/0x420 [rmnet] [ 146.710492][ T1174] ? sch_direct_xmit+0x1e2/0x1020 [ 146.716193][ T1174] rmnet_vnd_start_xmit+0x3d/0xa0 [rmnet] [ 146.717012][ T1174] dev_hard_start_xmit+0x160/0x740 [ 146.717854][ T1174] sch_direct_xmit+0x265/0x1020 [ 146.718577][ T1174] ? register_lock_class+0x14d0/0x14d0 [ 146.719429][ T1174] ? dev_watchdog+0xac0/0xac0 [ 146.723738][ T1174] ? __dev_queue_xmit+0x15fd/0x2940 [ 146.724469][ T1174] ? lock_acquire+0x164/0x3b0 [ 146.725172][ T1174] __dev_queue_xmit+0x20c7/0x2940 [ ... ] Fixes: ceed73a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
[ Upstream commit 6c5d911 ] journal_head::b_transaction and journal_head::b_next_transaction could be accessed concurrently as noticed by KCSAN, LTP: starting fsync04 /dev/zero: Can't open blockdev EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null) ================================================================== BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2] write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70: __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2] __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569 jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2] (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034 kjournald2+0x13b/0x450 [jbd2] kthread+0x1cd/0x1f0 ret_from_fork+0x27/0x50 read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68: jbd2_write_access_granted+0x1b2/0x250 [jbd2] jbd2_write_access_granted at fs/jbd2/transaction.c:1155 jbd2_journal_get_write_access+0x2c/0x60 [jbd2] __ext4_journal_get_write_access+0x50/0x90 [ext4] ext4_mb_mark_diskspace_used+0x158/0x620 [ext4] ext4_mb_new_blocks+0x54f/0xca0 [ext4] ext4_ind_map_blocks+0xc79/0x1b40 [ext4] ext4_map_blocks+0x3b4/0x950 [ext4] _ext4_get_block+0xfc/0x270 [ext4] ext4_get_block+0x3b/0x50 [ext4] __block_write_begin_int+0x22e/0xae0 __block_write_begin+0x39/0x50 ext4_write_begin+0x388/0xb50 [ext4] generic_perform_write+0x15d/0x290 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb05 entry_SYSCALL_64_after_hwframe+0x49/0xbe 5 locks held by fsync04/25724: #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260 #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4] #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2] #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4] #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2] irq event stamp: 1407125 hardirqs last enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790 hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790 softirqs last enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0 Reported by Kernel Concurrency Sanitizer on: CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 The plain reads are outside of jh->b_state_lock critical section which result in data races. Fix them by adding pairs of READ|WRITE_ONCE(). Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Qian Cai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
[ Upstream commit 1b710b1 ] Sergey didn't like the locking order, uart_port->lock -> tty_port->lock uart_write (uart_port->lock) __uart_start pl011_start_tx pl011_tx_chars uart_write_wakeup tty_port_tty_wakeup tty_port_default tty_port_tty_get (tty_port->lock) but those code is so old, and I have no clue how to de-couple it after checking other locks in the splat. There is an onging effort to make all printk() as deferred, so until that happens, workaround it for now as a short-term fix. LTP: starting iogen01 (export LTPROOT; rwtest -N iogen01 -i 120s -s read,write -Da -Dv -n 2 500b:$TMPDIR/doio.f1.$$ 1000b:$TMPDIR/doio.f2.$$) WARNING: possible circular locking dependency detected ------------------------------------------------------ doio/49441 is trying to acquire lock: ffff008b7cff7290 (&(&zone->lock)->rlock){..-.}, at: rmqueue+0x138/0x2050 but task is already holding lock: 60ff000822352818 (&pool->lock/1){-.-.}, at: start_flush_work+0xd8/0x3f0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&pool->lock/1){-.-.}: lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 __queue_work+0x4b4/0xa10 queue_work_on+0xac/0x11c tty_schedule_flip+0x84/0xbc tty_flip_buffer_push+0x1c/0x28 pty_write+0x98/0xd0 n_tty_write+0x450/0x60c tty_write+0x338/0x474 __vfs_write+0x88/0x214 vfs_write+0x12c/0x1a4 redirected_tty_write+0x90/0xdc do_loop_readv_writev+0x140/0x180 do_iter_write+0xe0/0x10c vfs_writev+0x134/0x1cc do_writev+0xbc/0x130 __arm64_sys_writev+0x58/0x8c el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 -> #3 (&(&port->lock)->rlock){-.-.}: lock_acquire+0x320/0x360 _raw_spin_lock_irqsave+0x7c/0x9c tty_port_tty_get+0x24/0x60 tty_port_default_wakeup+0x1c/0x3c tty_port_tty_wakeup+0x34/0x40 uart_write_wakeup+0x28/0x44 pl011_tx_chars+0x1b8/0x270 pl011_start_tx+0x24/0x70 __uart_start+0x5c/0x68 uart_write+0x164/0x1c8 do_output_char+0x33c/0x348 n_tty_write+0x4bc/0x60c tty_write+0x338/0x474 redirected_tty_write+0xc0/0xdc do_loop_readv_writev+0x140/0x180 do_iter_write+0xe0/0x10c vfs_writev+0x134/0x1cc do_writev+0xbc/0x130 __arm64_sys_writev+0x58/0x8c el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 -> #2 (&port_lock_key){-.-.}: lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 pl011_console_write+0xec/0x2cc console_unlock+0x794/0x96c vprintk_emit+0x260/0x31c vprintk_default+0x54/0x7c vprintk_func+0x218/0x254 printk+0x7c/0xa4 register_console+0x734/0x7b0 uart_add_one_port+0x734/0x834 pl011_register_port+0x6c/0xac sbsa_uart_probe+0x234/0x2ec platform_drv_probe+0xd4/0x124 really_probe+0x250/0x71c driver_probe_device+0xb4/0x200 __device_attach_driver+0xd8/0x188 bus_for_each_drv+0xbc/0x110 __device_attach+0x120/0x220 device_initial_probe+0x20/0x2c bus_probe_device+0x54/0x100 device_add+0xae8/0xc2c platform_device_add+0x278/0x3b8 platform_device_register_full+0x238/0x2ac acpi_create_platform_device+0x2dc/0x3a8 acpi_bus_attach+0x390/0x3cc acpi_bus_attach+0x108/0x3cc acpi_bus_attach+0x108/0x3cc acpi_bus_attach+0x108/0x3cc acpi_bus_scan+0x7c/0xb0 acpi_scan_init+0xe4/0x304 acpi_init+0x100/0x114 do_one_initcall+0x348/0x6a0 do_initcall_level+0x190/0x1fc do_basic_setup+0x34/0x4c kernel_init_freeable+0x19c/0x260 kernel_init+0x18/0x338 ret_from_fork+0x10/0x18 -> #1 (console_owner){-...}: lock_acquire+0x320/0x360 console_lock_spinning_enable+0x6c/0x7c console_unlock+0x4f8/0x96c vprintk_emit+0x260/0x31c vprintk_default+0x54/0x7c vprintk_func+0x218/0x254 printk+0x7c/0xa4 get_random_u64+0x1c4/0x1dc shuffle_pick_tail+0x40/0xac __free_one_page+0x424/0x710 free_one_page+0x70/0x120 __free_pages_ok+0x61c/0xa94 __free_pages_core+0x1bc/0x294 memblock_free_pages+0x38/0x48 __free_pages_memory+0xcc/0xfc __free_memory_core+0x70/0x78 free_low_memory_core_early+0x148/0x18c memblock_free_all+0x18/0x54 mem_init+0xb4/0x17c mm_init+0x14/0x38 start_kernel+0x19c/0x530 -> #0 (&(&zone->lock)->rlock){..-.}: validate_chain+0xf6c/0x2e2c __lock_acquire+0x868/0xc2c lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 rmqueue+0x138/0x2050 get_page_from_freelist+0x474/0x688 __alloc_pages_nodemask+0x3b4/0x18dc alloc_pages_current+0xd0/0xe0 alloc_slab_page+0x2b4/0x5e0 new_slab+0xc8/0x6bc ___slab_alloc+0x3b8/0x640 kmem_cache_alloc+0x4b4/0x588 __debug_object_init+0x778/0x8b4 debug_object_init_on_stack+0x40/0x50 start_flush_work+0x16c/0x3f0 __flush_work+0xb8/0x124 flush_work+0x20/0x30 xlog_cil_force_lsn+0x88/0x204 [xfs] xfs_log_force_lsn+0x128/0x1b8 [xfs] xfs_file_fsync+0x3c4/0x488 [xfs] vfs_fsync_range+0xb0/0xd0 generic_write_sync+0x80/0xa0 [xfs] xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs] xfs_file_write_iter+0x1a0/0x218 [xfs] __vfs_write+0x1cc/0x214 vfs_write+0x12c/0x1a4 ksys_write+0xb0/0x120 __arm64_sys_write+0x54/0x88 el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 other info that might help us debug this: Chain exists of: &(&zone->lock)->rlock --> &(&port->lock)->rlock --> &pool->lock/1 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&pool->lock/1); lock(&(&port->lock)->rlock); lock(&pool->lock/1); lock(&(&zone->lock)->rlock); *** DEADLOCK *** 4 locks held by doio/49441: #0: a0ff00886fc27408 (sb_writers#8){.+.+}, at: vfs_write+0x118/0x1a4 #1: 8fff00080810dfe0 (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0x2a8/0x300 [xfs] #2: ffff9000129f2390 (rcu_read_lock){....}, at: rcu_lock_acquire+0x8/0x38 #3: 60ff000822352818 (&pool->lock/1){-.-.}, at: start_flush_work+0xd8/0x3f0 stack backtrace: CPU: 48 PID: 49441 Comm: doio Tainted: G W Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019 Call trace: dump_backtrace+0x0/0x248 show_stack+0x20/0x2c dump_stack+0xe8/0x150 print_circular_bug+0x368/0x380 check_noncircular+0x28c/0x294 validate_chain+0xf6c/0x2e2c __lock_acquire+0x868/0xc2c lock_acquire+0x320/0x360 _raw_spin_lock+0x64/0x80 rmqueue+0x138/0x2050 get_page_from_freelist+0x474/0x688 __alloc_pages_nodemask+0x3b4/0x18dc alloc_pages_current+0xd0/0xe0 alloc_slab_page+0x2b4/0x5e0 new_slab+0xc8/0x6bc ___slab_alloc+0x3b8/0x640 kmem_cache_alloc+0x4b4/0x588 __debug_object_init+0x778/0x8b4 debug_object_init_on_stack+0x40/0x50 start_flush_work+0x16c/0x3f0 __flush_work+0xb8/0x124 flush_work+0x20/0x30 xlog_cil_force_lsn+0x88/0x204 [xfs] xfs_log_force_lsn+0x128/0x1b8 [xfs] xfs_file_fsync+0x3c4/0x488 [xfs] vfs_fsync_range+0xb0/0xd0 generic_write_sync+0x80/0xa0 [xfs] xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs] xfs_file_write_iter+0x1a0/0x218 [xfs] __vfs_write+0x1cc/0x214 vfs_write+0x12c/0x1a4 ksys_write+0xb0/0x120 __arm64_sys_write+0x54/0x88 el0_svc_handler+0x170/0x240 el0_sync_handler+0x150/0x250 el0_sync+0x164/0x180 Reviewed-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Qian Cai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
Mar 27, 2020
[ Upstream commit 6c5d911 ] journal_head::b_transaction and journal_head::b_next_transaction could be accessed concurrently as noticed by KCSAN, LTP: starting fsync04 /dev/zero: Can't open blockdev EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null) ================================================================== BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2] write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70: __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2] __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569 jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2] (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034 kjournald2+0x13b/0x450 [jbd2] kthread+0x1cd/0x1f0 ret_from_fork+0x27/0x50 read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68: jbd2_write_access_granted+0x1b2/0x250 [jbd2] jbd2_write_access_granted at fs/jbd2/transaction.c:1155 jbd2_journal_get_write_access+0x2c/0x60 [jbd2] __ext4_journal_get_write_access+0x50/0x90 [ext4] ext4_mb_mark_diskspace_used+0x158/0x620 [ext4] ext4_mb_new_blocks+0x54f/0xca0 [ext4] ext4_ind_map_blocks+0xc79/0x1b40 [ext4] ext4_map_blocks+0x3b4/0x950 [ext4] _ext4_get_block+0xfc/0x270 [ext4] ext4_get_block+0x3b/0x50 [ext4] __block_write_begin_int+0x22e/0xae0 __block_write_begin+0x39/0x50 ext4_write_begin+0x388/0xb50 [ext4] generic_perform_write+0x15d/0x290 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb05 entry_SYSCALL_64_after_hwframe+0x49/0xbe 5 locks held by fsync04/25724: #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260 #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4] #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2] #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4] #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2] irq event stamp: 1407125 hardirqs last enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790 hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790 softirqs last enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0 Reported by Kernel Concurrency Sanitizer on: CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 The plain reads are outside of jh->b_state_lock critical section which result in data races. Fix them by adding pairs of READ|WRITE_ONCE(). Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Qian Cai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
May 8, 2020
[ Upstream commit a866759 ] This reverts commit 64e62bd. This commit ends up causing some lockdep splats due to trying to grab the payload lock while holding the mgr's lock: [ 54.010099] [ 54.011765] ====================================================== [ 54.018670] WARNING: possible circular locking dependency detected [ 54.025577] 5.5.0-rc6-02274-g77381c23ee63 #47 Not tainted [ 54.031610] ------------------------------------------------------ [ 54.038516] kworker/1:6/1040 is trying to acquire lock: [ 54.044354] ffff888272af3228 (&mgr->payload_lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.054957] [ 54.054957] but task is already holding lock: [ 54.061473] ffff888272af3060 (&mgr->lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x3c/0x2e4 [ 54.071193] [ 54.071193] which lock already depends on the new lock. [ 54.071193] [ 54.080334] [ 54.080334] the existing dependency chain (in reverse order) is: [ 54.088697] [ 54.088697] -> #1 (&mgr->lock){+.+.}: [ 54.094440] __mutex_lock+0xc3/0x498 [ 54.099015] drm_dp_mst_topology_get_port_validated+0x25/0x80 [ 54.106018] drm_dp_update_payload_part1+0xa2/0x2e2 [ 54.112051] intel_mst_pre_enable_dp+0x144/0x18f [ 54.117791] intel_encoders_pre_enable+0x63/0x70 [ 54.123532] hsw_crtc_enable+0xa1/0x722 [ 54.128396] intel_update_crtc+0x50/0x194 [ 54.133455] skl_commit_modeset_enables+0x40c/0x540 [ 54.139485] intel_atomic_commit_tail+0x5f7/0x130d [ 54.145418] intel_atomic_commit+0x2c8/0x2d8 [ 54.150770] drm_atomic_helper_set_config+0x5a/0x70 [ 54.156801] drm_mode_setcrtc+0x2ab/0x833 [ 54.161862] drm_ioctl+0x2e5/0x424 [ 54.166242] vfs_ioctl+0x21/0x2f [ 54.170426] do_vfs_ioctl+0x5fb/0x61e [ 54.175096] ksys_ioctl+0x55/0x75 [ 54.179377] __x64_sys_ioctl+0x1a/0x1e [ 54.184146] do_syscall_64+0x5c/0x6d [ 54.188721] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 54.194946] [ 54.194946] -> #0 (&mgr->payload_lock){+.+.}: [ 54.201463] [ 54.201463] other info that might help us debug this: [ 54.201463] [ 54.210410] Possible unsafe locking scenario: [ 54.210410] [ 54.217025] CPU0 CPU1 [ 54.222082] ---- ---- [ 54.227138] lock(&mgr->lock); [ 54.230643] lock(&mgr->payload_lock); [ 54.237742] lock(&mgr->lock); [ 54.244062] lock(&mgr->payload_lock); [ 54.248346] [ 54.248346] *** DEADLOCK *** [ 54.248346] [ 54.254959] 7 locks held by kworker/1:6/1040: [ 54.259822] #0: ffff888275c4f528 ((wq_completion)events){+.+.}, at: worker_thread+0x455/0x6e2 [ 54.269451] #1: ffffc9000119beb0 ((work_completion)(&(&dev_priv->hotplug.hotplug_work)->work)){+.+.}, at: worker_thread+0x455/0x6e2 [ 54.282768] #2: ffff888272a403f0 (&dev->mode_config.mutex){+.+.}, at: i915_hotplug_work_func+0x4b/0x2be [ 54.293368] #3: ffffffff824fc6c0 (drm_connector_list_iter){.+.+}, at: i915_hotplug_work_func+0x17e/0x2be [ 54.304061] #4: ffffc9000119bc58 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x40/0xfd [ 54.314855] #5: ffff888272a40470 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x74/0xe2 [ 54.324385] #6: ffff888272af3060 (&mgr->lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x3c/0x2e4 [ 54.334597] [ 54.334597] stack backtrace: [ 54.339464] CPU: 1 PID: 1040 Comm: kworker/1:6 Not tainted 5.5.0-rc6-02274-g77381c23ee63 #47 [ 54.348893] Hardware name: Google Fizz/Fizz, BIOS Google_Fizz.10139.39.0 01/04/2018 [ 54.357451] Workqueue: events i915_hotplug_work_func [ 54.362995] Call Trace: [ 54.365724] dump_stack+0x71/0x9c [ 54.369427] check_noncircular+0x91/0xbc [ 54.373809] ? __lock_acquire+0xc9e/0xf66 [ 54.378286] ? __lock_acquire+0xc9e/0xf66 [ 54.382763] ? lock_acquire+0x175/0x1ac [ 54.387048] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.393177] ? __mutex_lock+0xc3/0x498 [ 54.397362] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.403492] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.409620] ? drm_dp_dpcd_access+0xd9/0x101 [ 54.414390] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.420517] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.426645] ? intel_digital_port_connected+0x34d/0x35c [ 54.432482] ? intel_dp_detect+0x227/0x44e [ 54.437056] ? ww_mutex_lock+0x49/0x9a [ 54.441242] ? drm_helper_probe_detect_ctx+0x75/0xfd [ 54.446789] ? intel_encoder_hotplug+0x4b/0x97 [ 54.451752] ? intel_ddi_hotplug+0x61/0x2e0 [ 54.456423] ? mark_held_locks+0x53/0x68 [ 54.460803] ? _raw_spin_unlock_irqrestore+0x3a/0x51 [ 54.466347] ? lockdep_hardirqs_on+0x187/0x1a4 [ 54.471310] ? drm_connector_list_iter_next+0x89/0x9a [ 54.476953] ? i915_hotplug_work_func+0x206/0x2be [ 54.482208] ? worker_thread+0x4d5/0x6e2 [ 54.486587] ? worker_thread+0x455/0x6e2 [ 54.490966] ? queue_work_on+0x64/0x64 [ 54.495151] ? kthread+0x1e9/0x1f1 [ 54.498946] ? queue_work_on+0x64/0x64 [ 54.503130] ? kthread_unpark+0x5e/0x5e [ 54.507413] ? ret_from_fork+0x3a/0x50 The proper fix for this is probably cleanup the VCPI allocations when we're enabling the topology, or on the first payload allocation. For now though, let's just revert. Signed-off-by: Lyude Paul <[email protected]> Fixes: 64e62bd ("drm/dp_mst: Remove VCPI while disabling topology mgr") Cc: Sean Paul <[email protected]> Cc: Wayne Lin <[email protected]> Reviewed-by: Sean Paul <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
May 8, 2020
commit 56df70a upstream. find_mergeable_vma() can return NULL. In this case, it leads to a crash when we access vm_mm(its offset is 0x40) later in write_protect_page. And this case did happen on our server. The following call trace is captured in kernel 4.19 with the following patch applied and KSM zero page enabled on our server. commit e86c59b ("mm/ksm: improve deduplication of zero pages with colouring") So add a vma check to fix it. BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 Oops: 0000 [#1] SMP NOPTI CPU: 9 PID: 510 Comm: ksmd Kdump: loaded Tainted: G OE 4.19.36.bsk.9-amd64 #4.19.36.bsk.9 RIP: try_to_merge_one_page+0xc7/0x760 Code: 24 58 65 48 33 34 25 28 00 00 00 89 e8 0f 85 a3 06 00 00 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 46 08 a8 01 75 b8 <49> 8b 44 24 40 4c 8d 7c 24 20 b9 07 00 00 00 4c 89 e6 4c 89 ff 48 RSP: 0018:ffffadbdd9fffdb0 EFLAGS: 00010246 RAX: ffffda83ffd4be08 RBX: ffffda83ffd4be40 RCX: 0000002c6e800000 RDX: 0000000000000000 RSI: ffffda83ffd4be40 RDI: 0000000000000000 RBP: ffffa11939f02ec0 R08: 0000000094e1a447 R09: 00000000abe76577 R10: 0000000000000962 R11: 0000000000004e6a R12: 0000000000000000 R13: ffffda83b1e06380 R14: ffffa18f31f072c0 R15: ffffda83ffd4be40 FS: 0000000000000000(0000) GS:ffffa0da43b80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 0000002c77c0a003 CR4: 00000000007626e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ksm_scan_thread+0x115e/0x1960 kthread+0xf5/0x130 ret_from_fork+0x1f/0x30 [[email protected]: if the vma is out of date, just exit] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: add the conventional braces, replace /** with /*] Fixes: e86c59b ("mm/ksm: improve deduplication of zero pages with colouring") Co-developed-by: Xiongchun Duan <[email protected]> Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Kirill Tkhai <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Yang Shi <[email protected]> Cc: Claudio Imbrenda <[email protected]> Cc: Markus Elfring <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
May 8, 2020
commit 55b3209 upstream. For skcipher algorithms, the input, output HW S/G tables look like this: [IV, src][dst, IV] Now, we can have 2 conditions here: - there is no IV; - src and dst are equal (in-place encryption) and scattered and the error is an "off-by-one" in the HW S/G table. This issue was seen with KASAN: BUG: KASAN: slab-out-of-bounds in skcipher_edesc_alloc+0x95c/0x1018 Read of size 4 at addr ffff000022a02958 by task cryptomgr_test/321 CPU: 2 PID: 321 Comm: cryptomgr_test Not tainted 5.6.0-rc1-00165-ge4ef8383-dirty #4 Hardware name: LS1046A RDB Board (DT) Call trace: dump_backtrace+0x0/0x260 show_stack+0x14/0x20 dump_stack+0xe8/0x144 print_address_description.isra.11+0x64/0x348 __kasan_report+0x11c/0x230 kasan_report+0xc/0x18 __asan_load4+0x90/0xb0 skcipher_edesc_alloc+0x95c/0x1018 skcipher_encrypt+0x84/0x150 crypto_skcipher_encrypt+0x50/0x68 test_skcipher_vec_cfg+0x4d4/0xc10 test_skcipher_vec+0x178/0x1d8 alg_test_skcipher+0xec/0x230 alg_test.part.44+0x114/0x4a0 alg_test+0x1c/0x60 cryptomgr_test+0x34/0x58 kthread+0x1b8/0x1c0 ret_from_fork+0x10/0x18 Allocated by task 321: save_stack+0x24/0xb0 __kasan_kmalloc.isra.10+0xc4/0xe0 kasan_kmalloc+0xc/0x18 __kmalloc+0x178/0x2b8 skcipher_edesc_alloc+0x21c/0x1018 skcipher_encrypt+0x84/0x150 crypto_skcipher_encrypt+0x50/0x68 test_skcipher_vec_cfg+0x4d4/0xc10 test_skcipher_vec+0x178/0x1d8 alg_test_skcipher+0xec/0x230 alg_test.part.44+0x114/0x4a0 alg_test+0x1c/0x60 cryptomgr_test+0x34/0x58 kthread+0x1b8/0x1c0 ret_from_fork+0x10/0x18 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff000022a02800 which belongs to the cache dma-kmalloc-512 of size 512 The buggy address is located 344 bytes inside of 512-byte region [ffff000022a02800, ffff000022a02a00) The buggy address belongs to the page: page:fffffe00006a8000 refcount:1 mapcount:0 mapping:ffff00093200c400 index:0x0 compound_mapcount: 0 flags: 0xffff00000010200(slab|head) raw: 0ffff00000010200 dead000000000100 dead000000000122 ffff00093200c400 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff000022a02800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff000022a02880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff000022a02900: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc ^ ffff000022a02980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff000022a02a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Fixes: 334d37c ("crypto: caam - update IV using HW support") Cc: <[email protected]> # v5.3+ Signed-off-by: Iuliana Prodan <[email protected]> Reviewed-by: Horia Geantă <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
May 8, 2020
[ Upstream commit a866759 ] This reverts commit 64e62bd. This commit ends up causing some lockdep splats due to trying to grab the payload lock while holding the mgr's lock: [ 54.010099] [ 54.011765] ====================================================== [ 54.018670] WARNING: possible circular locking dependency detected [ 54.025577] 5.5.0-rc6-02274-g77381c23ee63 #47 Not tainted [ 54.031610] ------------------------------------------------------ [ 54.038516] kworker/1:6/1040 is trying to acquire lock: [ 54.044354] ffff888272af3228 (&mgr->payload_lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.054957] [ 54.054957] but task is already holding lock: [ 54.061473] ffff888272af3060 (&mgr->lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x3c/0x2e4 [ 54.071193] [ 54.071193] which lock already depends on the new lock. [ 54.071193] [ 54.080334] [ 54.080334] the existing dependency chain (in reverse order) is: [ 54.088697] [ 54.088697] -> #1 (&mgr->lock){+.+.}: [ 54.094440] __mutex_lock+0xc3/0x498 [ 54.099015] drm_dp_mst_topology_get_port_validated+0x25/0x80 [ 54.106018] drm_dp_update_payload_part1+0xa2/0x2e2 [ 54.112051] intel_mst_pre_enable_dp+0x144/0x18f [ 54.117791] intel_encoders_pre_enable+0x63/0x70 [ 54.123532] hsw_crtc_enable+0xa1/0x722 [ 54.128396] intel_update_crtc+0x50/0x194 [ 54.133455] skl_commit_modeset_enables+0x40c/0x540 [ 54.139485] intel_atomic_commit_tail+0x5f7/0x130d [ 54.145418] intel_atomic_commit+0x2c8/0x2d8 [ 54.150770] drm_atomic_helper_set_config+0x5a/0x70 [ 54.156801] drm_mode_setcrtc+0x2ab/0x833 [ 54.161862] drm_ioctl+0x2e5/0x424 [ 54.166242] vfs_ioctl+0x21/0x2f [ 54.170426] do_vfs_ioctl+0x5fb/0x61e [ 54.175096] ksys_ioctl+0x55/0x75 [ 54.179377] __x64_sys_ioctl+0x1a/0x1e [ 54.184146] do_syscall_64+0x5c/0x6d [ 54.188721] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 54.194946] [ 54.194946] -> #0 (&mgr->payload_lock){+.+.}: [ 54.201463] [ 54.201463] other info that might help us debug this: [ 54.201463] [ 54.210410] Possible unsafe locking scenario: [ 54.210410] [ 54.217025] CPU0 CPU1 [ 54.222082] ---- ---- [ 54.227138] lock(&mgr->lock); [ 54.230643] lock(&mgr->payload_lock); [ 54.237742] lock(&mgr->lock); [ 54.244062] lock(&mgr->payload_lock); [ 54.248346] [ 54.248346] *** DEADLOCK *** [ 54.248346] [ 54.254959] 7 locks held by kworker/1:6/1040: [ 54.259822] #0: ffff888275c4f528 ((wq_completion)events){+.+.}, at: worker_thread+0x455/0x6e2 [ 54.269451] #1: ffffc9000119beb0 ((work_completion)(&(&dev_priv->hotplug.hotplug_work)->work)){+.+.}, at: worker_thread+0x455/0x6e2 [ 54.282768] #2: ffff888272a403f0 (&dev->mode_config.mutex){+.+.}, at: i915_hotplug_work_func+0x4b/0x2be [ 54.293368] #3: ffffffff824fc6c0 (drm_connector_list_iter){.+.+}, at: i915_hotplug_work_func+0x17e/0x2be [ 54.304061] #4: ffffc9000119bc58 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x40/0xfd [ 54.314855] #5: ffff888272a40470 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x74/0xe2 [ 54.324385] #6: ffff888272af3060 (&mgr->lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x3c/0x2e4 [ 54.334597] [ 54.334597] stack backtrace: [ 54.339464] CPU: 1 PID: 1040 Comm: kworker/1:6 Not tainted 5.5.0-rc6-02274-g77381c23ee63 #47 [ 54.348893] Hardware name: Google Fizz/Fizz, BIOS Google_Fizz.10139.39.0 01/04/2018 [ 54.357451] Workqueue: events i915_hotplug_work_func [ 54.362995] Call Trace: [ 54.365724] dump_stack+0x71/0x9c [ 54.369427] check_noncircular+0x91/0xbc [ 54.373809] ? __lock_acquire+0xc9e/0xf66 [ 54.378286] ? __lock_acquire+0xc9e/0xf66 [ 54.382763] ? lock_acquire+0x175/0x1ac [ 54.387048] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.393177] ? __mutex_lock+0xc3/0x498 [ 54.397362] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.403492] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.409620] ? drm_dp_dpcd_access+0xd9/0x101 [ 54.414390] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.420517] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.426645] ? intel_digital_port_connected+0x34d/0x35c [ 54.432482] ? intel_dp_detect+0x227/0x44e [ 54.437056] ? ww_mutex_lock+0x49/0x9a [ 54.441242] ? drm_helper_probe_detect_ctx+0x75/0xfd [ 54.446789] ? intel_encoder_hotplug+0x4b/0x97 [ 54.451752] ? intel_ddi_hotplug+0x61/0x2e0 [ 54.456423] ? mark_held_locks+0x53/0x68 [ 54.460803] ? _raw_spin_unlock_irqrestore+0x3a/0x51 [ 54.466347] ? lockdep_hardirqs_on+0x187/0x1a4 [ 54.471310] ? drm_connector_list_iter_next+0x89/0x9a [ 54.476953] ? i915_hotplug_work_func+0x206/0x2be [ 54.482208] ? worker_thread+0x4d5/0x6e2 [ 54.486587] ? worker_thread+0x455/0x6e2 [ 54.490966] ? queue_work_on+0x64/0x64 [ 54.495151] ? kthread+0x1e9/0x1f1 [ 54.498946] ? queue_work_on+0x64/0x64 [ 54.503130] ? kthread_unpark+0x5e/0x5e [ 54.507413] ? ret_from_fork+0x3a/0x50 The proper fix for this is probably cleanup the VCPI allocations when we're enabling the topology, or on the first payload allocation. For now though, let's just revert. Signed-off-by: Lyude Paul <[email protected]> Fixes: 64e62bd ("drm/dp_mst: Remove VCPI while disabling topology mgr") Cc: Sean Paul <[email protected]> Cc: Wayne Lin <[email protected]> Reviewed-by: Sean Paul <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
May 8, 2020
commit 56df70a upstream. find_mergeable_vma() can return NULL. In this case, it leads to a crash when we access vm_mm(its offset is 0x40) later in write_protect_page. And this case did happen on our server. The following call trace is captured in kernel 4.19 with the following patch applied and KSM zero page enabled on our server. commit e86c59b ("mm/ksm: improve deduplication of zero pages with colouring") So add a vma check to fix it. BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 Oops: 0000 [#1] SMP NOPTI CPU: 9 PID: 510 Comm: ksmd Kdump: loaded Tainted: G OE 4.19.36.bsk.9-amd64 #4.19.36.bsk.9 RIP: try_to_merge_one_page+0xc7/0x760 Code: 24 58 65 48 33 34 25 28 00 00 00 89 e8 0f 85 a3 06 00 00 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 46 08 a8 01 75 b8 <49> 8b 44 24 40 4c 8d 7c 24 20 b9 07 00 00 00 4c 89 e6 4c 89 ff 48 RSP: 0018:ffffadbdd9fffdb0 EFLAGS: 00010246 RAX: ffffda83ffd4be08 RBX: ffffda83ffd4be40 RCX: 0000002c6e800000 RDX: 0000000000000000 RSI: ffffda83ffd4be40 RDI: 0000000000000000 RBP: ffffa11939f02ec0 R08: 0000000094e1a447 R09: 00000000abe76577 R10: 0000000000000962 R11: 0000000000004e6a R12: 0000000000000000 R13: ffffda83b1e06380 R14: ffffa18f31f072c0 R15: ffffda83ffd4be40 FS: 0000000000000000(0000) GS:ffffa0da43b80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 0000002c77c0a003 CR4: 00000000007626e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ksm_scan_thread+0x115e/0x1960 kthread+0xf5/0x130 ret_from_fork+0x1f/0x30 [[email protected]: if the vma is out of date, just exit] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: add the conventional braces, replace /** with /*] Fixes: e86c59b ("mm/ksm: improve deduplication of zero pages with colouring") Co-developed-by: Xiongchun Duan <[email protected]> Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Kirill Tkhai <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Yang Shi <[email protected]> Cc: Claudio Imbrenda <[email protected]> Cc: Markus Elfring <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
May 8, 2020
commit 55b3209 upstream. For skcipher algorithms, the input, output HW S/G tables look like this: [IV, src][dst, IV] Now, we can have 2 conditions here: - there is no IV; - src and dst are equal (in-place encryption) and scattered and the error is an "off-by-one" in the HW S/G table. This issue was seen with KASAN: BUG: KASAN: slab-out-of-bounds in skcipher_edesc_alloc+0x95c/0x1018 Read of size 4 at addr ffff000022a02958 by task cryptomgr_test/321 CPU: 2 PID: 321 Comm: cryptomgr_test Not tainted 5.6.0-rc1-00165-ge4ef8383-dirty #4 Hardware name: LS1046A RDB Board (DT) Call trace: dump_backtrace+0x0/0x260 show_stack+0x14/0x20 dump_stack+0xe8/0x144 print_address_description.isra.11+0x64/0x348 __kasan_report+0x11c/0x230 kasan_report+0xc/0x18 __asan_load4+0x90/0xb0 skcipher_edesc_alloc+0x95c/0x1018 skcipher_encrypt+0x84/0x150 crypto_skcipher_encrypt+0x50/0x68 test_skcipher_vec_cfg+0x4d4/0xc10 test_skcipher_vec+0x178/0x1d8 alg_test_skcipher+0xec/0x230 alg_test.part.44+0x114/0x4a0 alg_test+0x1c/0x60 cryptomgr_test+0x34/0x58 kthread+0x1b8/0x1c0 ret_from_fork+0x10/0x18 Allocated by task 321: save_stack+0x24/0xb0 __kasan_kmalloc.isra.10+0xc4/0xe0 kasan_kmalloc+0xc/0x18 __kmalloc+0x178/0x2b8 skcipher_edesc_alloc+0x21c/0x1018 skcipher_encrypt+0x84/0x150 crypto_skcipher_encrypt+0x50/0x68 test_skcipher_vec_cfg+0x4d4/0xc10 test_skcipher_vec+0x178/0x1d8 alg_test_skcipher+0xec/0x230 alg_test.part.44+0x114/0x4a0 alg_test+0x1c/0x60 cryptomgr_test+0x34/0x58 kthread+0x1b8/0x1c0 ret_from_fork+0x10/0x18 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff000022a02800 which belongs to the cache dma-kmalloc-512 of size 512 The buggy address is located 344 bytes inside of 512-byte region [ffff000022a02800, ffff000022a02a00) The buggy address belongs to the page: page:fffffe00006a8000 refcount:1 mapcount:0 mapping:ffff00093200c400 index:0x0 compound_mapcount: 0 flags: 0xffff00000010200(slab|head) raw: 0ffff00000010200 dead000000000100 dead000000000122 ffff00093200c400 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff000022a02800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff000022a02880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff000022a02900: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc ^ ffff000022a02980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff000022a02a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Fixes: 334d37c ("crypto: caam - update IV using HW support") Cc: <[email protected]> # v5.3+ Signed-off-by: Iuliana Prodan <[email protected]> Reviewed-by: Horia Geantă <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
May 8, 2020
[ Upstream commit a866759 ] This reverts commit 64e62bd. This commit ends up causing some lockdep splats due to trying to grab the payload lock while holding the mgr's lock: [ 54.010099] [ 54.011765] ====================================================== [ 54.018670] WARNING: possible circular locking dependency detected [ 54.025577] 5.5.0-rc6-02274-g77381c23ee63 #47 Not tainted [ 54.031610] ------------------------------------------------------ [ 54.038516] kworker/1:6/1040 is trying to acquire lock: [ 54.044354] ffff888272af3228 (&mgr->payload_lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.054957] [ 54.054957] but task is already holding lock: [ 54.061473] ffff888272af3060 (&mgr->lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x3c/0x2e4 [ 54.071193] [ 54.071193] which lock already depends on the new lock. [ 54.071193] [ 54.080334] [ 54.080334] the existing dependency chain (in reverse order) is: [ 54.088697] [ 54.088697] -> #1 (&mgr->lock){+.+.}: [ 54.094440] __mutex_lock+0xc3/0x498 [ 54.099015] drm_dp_mst_topology_get_port_validated+0x25/0x80 [ 54.106018] drm_dp_update_payload_part1+0xa2/0x2e2 [ 54.112051] intel_mst_pre_enable_dp+0x144/0x18f [ 54.117791] intel_encoders_pre_enable+0x63/0x70 [ 54.123532] hsw_crtc_enable+0xa1/0x722 [ 54.128396] intel_update_crtc+0x50/0x194 [ 54.133455] skl_commit_modeset_enables+0x40c/0x540 [ 54.139485] intel_atomic_commit_tail+0x5f7/0x130d [ 54.145418] intel_atomic_commit+0x2c8/0x2d8 [ 54.150770] drm_atomic_helper_set_config+0x5a/0x70 [ 54.156801] drm_mode_setcrtc+0x2ab/0x833 [ 54.161862] drm_ioctl+0x2e5/0x424 [ 54.166242] vfs_ioctl+0x21/0x2f [ 54.170426] do_vfs_ioctl+0x5fb/0x61e [ 54.175096] ksys_ioctl+0x55/0x75 [ 54.179377] __x64_sys_ioctl+0x1a/0x1e [ 54.184146] do_syscall_64+0x5c/0x6d [ 54.188721] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 54.194946] [ 54.194946] -> #0 (&mgr->payload_lock){+.+.}: [ 54.201463] [ 54.201463] other info that might help us debug this: [ 54.201463] [ 54.210410] Possible unsafe locking scenario: [ 54.210410] [ 54.217025] CPU0 CPU1 [ 54.222082] ---- ---- [ 54.227138] lock(&mgr->lock); [ 54.230643] lock(&mgr->payload_lock); [ 54.237742] lock(&mgr->lock); [ 54.244062] lock(&mgr->payload_lock); [ 54.248346] [ 54.248346] *** DEADLOCK *** [ 54.248346] [ 54.254959] 7 locks held by kworker/1:6/1040: [ 54.259822] #0: ffff888275c4f528 ((wq_completion)events){+.+.}, at: worker_thread+0x455/0x6e2 [ 54.269451] #1: ffffc9000119beb0 ((work_completion)(&(&dev_priv->hotplug.hotplug_work)->work)){+.+.}, at: worker_thread+0x455/0x6e2 [ 54.282768] #2: ffff888272a403f0 (&dev->mode_config.mutex){+.+.}, at: i915_hotplug_work_func+0x4b/0x2be [ 54.293368] #3: ffffffff824fc6c0 (drm_connector_list_iter){.+.+}, at: i915_hotplug_work_func+0x17e/0x2be [ 54.304061] #4: ffffc9000119bc58 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x40/0xfd [ 54.314855] #5: ffff888272a40470 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x74/0xe2 [ 54.324385] #6: ffff888272af3060 (&mgr->lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x3c/0x2e4 [ 54.334597] [ 54.334597] stack backtrace: [ 54.339464] CPU: 1 PID: 1040 Comm: kworker/1:6 Not tainted 5.5.0-rc6-02274-g77381c23ee63 #47 [ 54.348893] Hardware name: Google Fizz/Fizz, BIOS Google_Fizz.10139.39.0 01/04/2018 [ 54.357451] Workqueue: events i915_hotplug_work_func [ 54.362995] Call Trace: [ 54.365724] dump_stack+0x71/0x9c [ 54.369427] check_noncircular+0x91/0xbc [ 54.373809] ? __lock_acquire+0xc9e/0xf66 [ 54.378286] ? __lock_acquire+0xc9e/0xf66 [ 54.382763] ? lock_acquire+0x175/0x1ac [ 54.387048] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.393177] ? __mutex_lock+0xc3/0x498 [ 54.397362] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.403492] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.409620] ? drm_dp_dpcd_access+0xd9/0x101 [ 54.414390] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.420517] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.426645] ? intel_digital_port_connected+0x34d/0x35c [ 54.432482] ? intel_dp_detect+0x227/0x44e [ 54.437056] ? ww_mutex_lock+0x49/0x9a [ 54.441242] ? drm_helper_probe_detect_ctx+0x75/0xfd [ 54.446789] ? intel_encoder_hotplug+0x4b/0x97 [ 54.451752] ? intel_ddi_hotplug+0x61/0x2e0 [ 54.456423] ? mark_held_locks+0x53/0x68 [ 54.460803] ? _raw_spin_unlock_irqrestore+0x3a/0x51 [ 54.466347] ? lockdep_hardirqs_on+0x187/0x1a4 [ 54.471310] ? drm_connector_list_iter_next+0x89/0x9a [ 54.476953] ? i915_hotplug_work_func+0x206/0x2be [ 54.482208] ? worker_thread+0x4d5/0x6e2 [ 54.486587] ? worker_thread+0x455/0x6e2 [ 54.490966] ? queue_work_on+0x64/0x64 [ 54.495151] ? kthread+0x1e9/0x1f1 [ 54.498946] ? queue_work_on+0x64/0x64 [ 54.503130] ? kthread_unpark+0x5e/0x5e [ 54.507413] ? ret_from_fork+0x3a/0x50 The proper fix for this is probably cleanup the VCPI allocations when we're enabling the topology, or on the first payload allocation. For now though, let's just revert. Signed-off-by: Lyude Paul <[email protected]> Fixes: 64e62bd ("drm/dp_mst: Remove VCPI while disabling topology mgr") Cc: Sean Paul <[email protected]> Cc: Wayne Lin <[email protected]> Reviewed-by: Sean Paul <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]>
rn
pushed a commit
that referenced
this issue
May 8, 2020
commit 9f61419 upstream. Looks like the dma_sync calls don't do what we want on armv7 either. Fixes: Unable to handle kernel paging request at virtual address 50001000 pgd = (ptrval) [50001000] *pgd=00000000 Internal error: Oops: 805 [#1] SMP ARM Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc6-00271-g9f159ae07f07 #4 Hardware name: Freescale i.MX53 (Device Tree Support) PC is at v7_dma_clean_range+0x20/0x38 LR is at __dma_page_cpu_to_dev+0x28/0x90 pc : [<c011c76c>] lr : [<c01181c4>] psr: 20000013 sp : d80b5a88 ip : de96c000 fp : d840ce6c r10: 00000000 r9 : 00000001 r8 : d843e010 r7 : 00000000 r6 : 00008000 r5 : ddb6c000 r4 : 00000000 r3 : 0000003f r2 : 00000040 r1 : 50008000 r0 : 50001000 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 70004019 DAC: 00000051 Process swapper/0 (pid: 1, stack limit = 0x(ptrval)) Signed-off-by: Rob Clark <[email protected]> Fixes: 3de433c ("drm/msm: Use the correct dma_sync calls in msm_gem") Tested-by: Fabio Estevam <[email protected]> Cc: Guenter Roeck <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
May 8, 2020
commit 56df70a upstream. find_mergeable_vma() can return NULL. In this case, it leads to a crash when we access vm_mm(its offset is 0x40) later in write_protect_page. And this case did happen on our server. The following call trace is captured in kernel 4.19 with the following patch applied and KSM zero page enabled on our server. commit e86c59b ("mm/ksm: improve deduplication of zero pages with colouring") So add a vma check to fix it. BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 Oops: 0000 [#1] SMP NOPTI CPU: 9 PID: 510 Comm: ksmd Kdump: loaded Tainted: G OE 4.19.36.bsk.9-amd64 #4.19.36.bsk.9 RIP: try_to_merge_one_page+0xc7/0x760 Code: 24 58 65 48 33 34 25 28 00 00 00 89 e8 0f 85 a3 06 00 00 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 46 08 a8 01 75 b8 <49> 8b 44 24 40 4c 8d 7c 24 20 b9 07 00 00 00 4c 89 e6 4c 89 ff 48 RSP: 0018:ffffadbdd9fffdb0 EFLAGS: 00010246 RAX: ffffda83ffd4be08 RBX: ffffda83ffd4be40 RCX: 0000002c6e800000 RDX: 0000000000000000 RSI: ffffda83ffd4be40 RDI: 0000000000000000 RBP: ffffa11939f02ec0 R08: 0000000094e1a447 R09: 00000000abe76577 R10: 0000000000000962 R11: 0000000000004e6a R12: 0000000000000000 R13: ffffda83b1e06380 R14: ffffa18f31f072c0 R15: ffffda83ffd4be40 FS: 0000000000000000(0000) GS:ffffa0da43b80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 0000002c77c0a003 CR4: 00000000007626e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ksm_scan_thread+0x115e/0x1960 kthread+0xf5/0x130 ret_from_fork+0x1f/0x30 [[email protected]: if the vma is out of date, just exit] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: add the conventional braces, replace /** with /*] Fixes: e86c59b ("mm/ksm: improve deduplication of zero pages with colouring") Co-developed-by: Xiongchun Duan <[email protected]> Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Kirill Tkhai <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Yang Shi <[email protected]> Cc: Claudio Imbrenda <[email protected]> Cc: Markus Elfring <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
May 8, 2020
commit 9f61419 upstream. Looks like the dma_sync calls don't do what we want on armv7 either. Fixes: Unable to handle kernel paging request at virtual address 50001000 pgd = (ptrval) [50001000] *pgd=00000000 Internal error: Oops: 805 [#1] SMP ARM Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc6-00271-g9f159ae07f07 #4 Hardware name: Freescale i.MX53 (Device Tree Support) PC is at v7_dma_clean_range+0x20/0x38 LR is at __dma_page_cpu_to_dev+0x28/0x90 pc : [<c011c76c>] lr : [<c01181c4>] psr: 20000013 sp : d80b5a88 ip : de96c000 fp : d840ce6c r10: 00000000 r9 : 00000001 r8 : d843e010 r7 : 00000000 r6 : 00008000 r5 : ddb6c000 r4 : 00000000 r3 : 0000003f r2 : 00000040 r1 : 50008000 r0 : 50001000 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 70004019 DAC: 00000051 Process swapper/0 (pid: 1, stack limit = 0x(ptrval)) Signed-off-by: Rob Clark <[email protected]> Fixes: 3de433c ("drm/msm: Use the correct dma_sync calls in msm_gem") Tested-by: Fabio Estevam <[email protected]> Cc: Guenter Roeck <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
rn
pushed a commit
that referenced
this issue
May 8, 2020
commit 56df70a upstream. find_mergeable_vma() can return NULL. In this case, it leads to a crash when we access vm_mm(its offset is 0x40) later in write_protect_page. And this case did happen on our server. The following call trace is captured in kernel 4.19 with the following patch applied and KSM zero page enabled on our server. commit e86c59b ("mm/ksm: improve deduplication of zero pages with colouring") So add a vma check to fix it. BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 Oops: 0000 [#1] SMP NOPTI CPU: 9 PID: 510 Comm: ksmd Kdump: loaded Tainted: G OE 4.19.36.bsk.9-amd64 #4.19.36.bsk.9 RIP: try_to_merge_one_page+0xc7/0x760 Code: 24 58 65 48 33 34 25 28 00 00 00 89 e8 0f 85 a3 06 00 00 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 46 08 a8 01 75 b8 <49> 8b 44 24 40 4c 8d 7c 24 20 b9 07 00 00 00 4c 89 e6 4c 89 ff 48 RSP: 0018:ffffadbdd9fffdb0 EFLAGS: 00010246 RAX: ffffda83ffd4be08 RBX: ffffda83ffd4be40 RCX: 0000002c6e800000 RDX: 0000000000000000 RSI: ffffda83ffd4be40 RDI: 0000000000000000 RBP: ffffa11939f02ec0 R08: 0000000094e1a447 R09: 00000000abe76577 R10: 0000000000000962 R11: 0000000000004e6a R12: 0000000000000000 R13: ffffda83b1e06380 R14: ffffa18f31f072c0 R15: ffffda83ffd4be40 FS: 0000000000000000(0000) GS:ffffa0da43b80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 0000002c77c0a003 CR4: 00000000007626e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ksm_scan_thread+0x115e/0x1960 kthread+0xf5/0x130 ret_from_fork+0x1f/0x30 [[email protected]: if the vma is out of date, just exit] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: add the conventional braces, replace /** with /*] Fixes: e86c59b ("mm/ksm: improve deduplication of zero pages with colouring") Co-developed-by: Xiongchun Duan <[email protected]> Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Kirill Tkhai <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Yang Shi <[email protected]> Cc: Claudio Imbrenda <[email protected]> Cc: Markus Elfring <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
as for linux/Documentation/kdump/kdump.txt, 1MB as starting address in statement "This will compile the kernel for physical address 1MB" is right? by context, no any related content is found.
The text was updated successfully, but these errors were encountered: