Re: [Xen-devel] rcu_sched self-detected stall on CPU on kernel 4.4.5/6 in PV DomU
Just wanted to give a bit of a poke about this. Currently running kernel 4.4.6 in a PV DomU and still occasionally getting hangs.

Also stumbled across this that may be related: https://lkml.org/lkml/2016/2/4/724

My latest hang shows:

[339844.594001] INFO: rcu_sched self-detected stall on CPU
[339844.594001] 1-...: (287557828 ticks this GP) idle=4cb/140000000000001/0 softirq=1340383/1340384 fqs=95372371
[339844.594001] (t=287566692 jiffies g=999283 c=999282 q=1725381)
[339844.594001] Task dump for CPU 1:
[339844.594001] find R running task 0 2840 2834 0x00000088
[339844.594001] ffffffff818d0c00 ffff88007fd03c58 ffffffff810a625f 0000000000000001
[339844.594001] ffffffff818d0c00 ffff88007fd03c70 ffffffff810a8699 0000000000000002
[339844.594001] ffff88007fd03ca0 ffffffff810d0e5a ffff88007fd170c0 ffffffff818d0c00
[339844.594001] Call Trace:
[339844.594001] <IRQ> [<ffffffff810a625f>] sched_show_task+0xaf/0x110
[339844.594001] [<ffffffff810a8699>] dump_cpu_task+0x39/0x40
[339844.594001] [<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
[339844.594001] [<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
[339844.594001] [<ffffffff810a91e1>] ? account_system_time+0x81/0x110
[339844.594001] [<ffffffff810a9481>] ? account_process_tick+0x61/0x160
[339844.594001] [<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
[339844.594001] [<ffffffff810d9749>] update_process_times+0x39/0x60
[339844.594001] [<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
[339844.594001] [<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
[339844.594001] [<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
[339844.594001] [<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
[339844.594001] [<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
[339844.594001] [<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
[339844.594001] [<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
[339844.594001] [<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
[339844.594001] [<ffffffff813e50ff>] __evtchn_fifo_handle_events+0x15f/0x180
[339844.594001] [<ffffffff813e5130>] evtchn_fifo_handle_events+0x10/0x20
[339844.594001] [<ffffffff813e2233>] __xen_evtchn_do_upcall+0x43/0x80
[339844.594001] [<ffffffff813e3ea0>] xen_evtchn_do_upcall+0x30/0x50
[339844.594001] [<ffffffff8165deb2>] xen_hvm_callback_vector+0x82/0x90
[339844.594001] <EOI> [<ffffffff8165bc60>] ? _raw_spin_lock+0x10/0x3

On 2016-03-19 08:46, Steven Haigh wrote:

On 19/03/2016 8:40 AM, Steven Haigh wrote:

Hi all,

So I'd just like to give this a prod. I'm still getting DomU's randomly go to 100% CPU usage using kernel 4.4.6 now. It seems running 4.4.6 as the DomU does not induce these problems.

Sorry - slight correction. Running 4.4.6 as the Dom0 kernel doesn't show these errors. Only in the DomU.

Latest crash message from today:
INFO: rcu_sched self-detected stall on CPU
0-...: (20869552 ticks this GP) idle=9c9/140000000000001/0 softirq=1440865/1440865 fqs=15068
(t=20874993 jiffies g=1354899 c=1354898 q=798)
rcu_sched kthread starved for 20829030 jiffies! g1354899 c1354898 f0x0 s3 ->state=0x0
Task dump for CPU 0:
kworker/u4:1 R running task 0 5853 2 0x00000088
Workqueue: writeback wb_workfn (flush-202:0)
ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000
ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001
ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00
Call Trace:
<IRQ> [<ffffffff810a625f>] sched_show_task+0xaf/0x110
[<ffffffff810a8699>] dump_cpu_task+0x39/0x40
[<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
[<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
[<ffffffff810a91e1>] ? account_system_time+0x81/0x110
[<ffffffff810a9481>] ? account_process_tick+0x61/0x160
[<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
[<ffffffff810d9749>] update_process_times+0x39/0x60
[<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
[<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
[<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
[<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
[<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
[<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
[<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
[<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
[<ffffffff813e50ff>] __evtchn_fifo_handle_events+0x15f/0x180
[<ffffffff813e5130>] evtchn_fifo_handle_events+0x10/0x20
[<ffffffff813e2233>] __xen_evtchn_do_upcall+0x43/0x80
[<ffffffff813e3ea0>] xen_evtchn_do_upcall+0x30/0x50
[<ffffffff8165deb2>] xen_hvm_callback_vector+0x82/0x90
<EOI> [<ffffffff810b9ce2>] ? queued_spin_lock_slowpath+0x22/0x170
[<ffffffff8165bc70>] _raw_spin_lock+0x20/0x30
[<ffffffff811f1f34>] writeback_sb_inodes+0x124/0x560
[<ffffffff8165ba36>] ? _raw_spin_unlock_irqrestore+0x16/0x20
[<ffffffff811f23f6>] __writeback_inodes_wb+0x86/0xc0
[<ffffffff811f2606>] wb_writeback+0x1d6/0x2d0
[<ffffffff811f2e94>] wb_workfn+0x284/0x3e0
[<ffffffff81093831>] process_one_work+0x151/0x400
[<ffffffff8109437a>] worker_thread+0x11a/0x460
[<ffffffff8165848f>] ? __schedule+0x2bf/0x880
[<ffffffff81094260>] ? rescuer_thread+0x2f0/0x2f0
[<ffffffff810993d9>] kthread+0xc9/0xe0
[<ffffffff81099310>] ? kthread_park+0x60/0x60
[<ffffffff8165c38f>] ret_from_fork+0x3f/0x70
[<ffffffff81099310>] ? kthread_park+0x60/0x60

This repeats over and over causing 100% CPU usage - eventually on all vcpus assigned to the DomU and the only recovery is 'xl destroy'.

I'm currently running Xen 4.6.1 on this system - with kernel 4.4.6 in both the DomU and Dom0.

On 17/03/2016 8:39 AM, Steven Haigh wrote:

Hi all,

I've noticed the following problem that ends up with a non-responsive PV DomU using kernel 4.4.5 under heavy disk IO:

INFO: rcu_sched self-detected stall on CPU
0-...: (6759098 ticks this GP) idle=cb3/140000000000001/0 softirq=3244615/3244615 fqs=4
(t=6762321 jiffies g=2275626 c=2275625 q=54)
rcu_sched kthread starved for 6762309 jiffies! g2275626 c2275625 f0x0 s3 ->state=0x0
Task dump for CPU 0:
updatedb R running task 0 6027 6021 0x00000088
ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000
ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001
ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00
Call Trace:
<IRQ> [<ffffffff810a625f>] sched_show_task+0xaf/0x110
[<ffffffff810a8699>] dump_cpu_task+0x39/0x40
[<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
[<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
[<ffffffff810a91e1>] ? account_system_time+0x81/0x110
[<ffffffff810a9481>] ? account_process_tick+0x61/0x160
[<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
[<ffffffff810d9749>] update_process_times+0x39/0x60
[<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
[<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
[<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
[<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
[<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
[<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
[<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
[<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
[<ffffffff813e51af>] __evtchn_fifo_handle_events+0x15f/0x180
[<ffffffff813e51e0>] evtchn_fifo_handle_events+0x10/0x20
[<ffffffff813e22e3>] __xen_evtchn_do_upcall+0x43/0x80
[<ffffffff813e3f50>] xen_evtchn_do_upcall+0x30/0x50
[<ffffffff8165def2>] xen_hvm_callback_vector+0x82/0x90
<EOI> [<ffffffff810b9cd0>] ? queued_spin_lock_slowpath+0x10/0x170
[<ffffffff8165bcc0>] _raw_spin_lock+0x20/0x30
[<ffffffff811df221>] find_inode_fast+0x61/0xa0
[<ffffffff811df7ce>] iget_locked+0x6e/0x170
[<ffffffff81249203>] ext4_iget+0x33/0xae0
[<ffffffff81658f02>] ? out_of_line_wait_on_bit+0x72/0x80
[<ffffffff81249ce0>] ext4_iget_normal+0x30/0x40
[<ffffffff81253415>] ext4_lookup+0xd5/0x140
[<ffffffff811ceffd>] lookup_real+0x1d/0x50
[<ffffffff811cf3b3>] __lookup_hash+0x33/0x40
[<ffffffff811d1af7>] walk_component+0x177/0x280
[<ffffffff811d2c50>] path_lookupat+0x60/0x110
[<ffffffff811d47dc>] filename_lookup+0x9c/0x150
[<ffffffff811a982d>] ? kfree+0x10d/0x290
[<ffffffff8124122c>] ? call_filldir+0x9c/0x130
[<ffffffff811d441f>] ? getname_flags+0x4f/0x1f0
[<ffffffff811d4946>] user_path_at_empty+0x36/0x40
[<ffffffff811ca6f3>] vfs_fstatat+0x53/0xa0
[<ffffffff811c7919>] ? __fput+0x169/0x1d0
[<ffffffff811cab42>] SYSC_newlstat+0x22/0x40
[<ffffffff8110cad0>] ? __audit_syscall_exit+0x1f0/0x270
[<ffffffff81078115>] ? syscall_slow_exit_work+0x3f/0xc0
[<ffffffff8110c88f>] ? __audit_syscall_entry+0xaf/0x100
[<ffffffff811cad4e>] SyS_newlstat+0xe/0x10
[<ffffffff8165c06e>] entry_SYSCALL_64_fastpath+0x12/0x71

This ends up with the system not responding at 100% CPU usage. Has anyone else seen this using kernel 4.4.5 in a DomU?

--
Steven Haigh

Email: netwiz@xxxxxxxxx
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel