[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] 4.9.52: INFO: task XXX blocked for more than 300 seconds.
On Wed, Oct 04, 2017 at 08:26:27PM +0200, Philipp Hahn wrote: > Hello, > Adding Ankur to this as I think he saw something similar. But in the meantime - do you see this with the latest version of Linux? > with linux-4.9.52 running on Debian-Wheezy with Xen-4.1 I observed > several stuck processes: Here is one (truncated) dump of the Linux > kernel messages: > > > [<ffffffff8160db6d>] ? __schedule+0x23d/0x6d0 > > [<ffffffff8160e8a0>] ? bit_wait_timeout+0x90/0x90 > > [<ffffffff8160e032>] ? schedule+0x32/0x80 > > [<ffffffff8161150c>] ? schedule_timeout+0x1ec/0x360 > > [<ffffffff8130f277>] ? __blk_mq_run_hw_queue+0x327/0x3e0* see below > > [<ffffffff8101b7f1>] ? xen_clocksource_get_cycles+0x11/0x20 > > [<ffffffff8160e8a0>] ? bit_wait_timeout+0x90/0x90 > > [<ffffffff8160d8b4>] ? io_schedule_timeout+0xb4/0x130 > > [<ffffffff810bb6f7>] ? prepare_to_wait+0x57/0x80 > > [<ffffffff8160e8b7>] ? bit_wait_io+0x17/0x60 > > [<ffffffff8160e3ac>] ? __wait_on_bit+0x5c/0x90 > > [<ffffffff8160e8a0>] ? bit_wait_timeout+0x90/0x90 > > [<ffffffff8160e50e>] ? out_of_line_wait_on_bit+0x7e/0xa0 > > [<ffffffff810bba20>] ? autoremove_wake_function+0x40/0x40 > > [<ffffffffc00abd18>] ? jbd2_journal_commit_transaction+0xd48/0x17e0 [jbd2] > > [<ffffffff810247d9>] ? __switch_to+0x2c9/0x720 > > [<ffffffff810e67fd>] ? try_to_del_timer_sync+0x4d/0x80 > > [<ffffffffc00b099d>] ? kjournald2+0xdd/0x280 [jbd2] > > [<ffffffff810bb9e0>] ? wake_up_atomic_t+0x30/0x30 > > [<ffffffffc00b08c0>] ? commit_timeout+0x10/0x10 [jbd2] > > [<ffffffff81097660>] ? kthread+0xf0/0x110 > > [<ffffffff810247d9>] ? __switch_to+0x2c9/0x720 > > [<ffffffff81097570>] ? kthread_park+0x60/0x60 > > [<ffffffff81612bb5>] ? ret_from_fork+0x25/0x30 > > NMI backtrace for cpu 2 > > CPU: 2 PID: 35 Comm: khungtaskd Not tainted 4.9.0-ucs105-amd64 #1 Debian > > 4.9.30-2A~4.2.0.201709271649 > > 0000000000000000 ffffffff81331935 0000000000000000 0000000000000002 > > ffffffff81335e60 0000000000000002 ffffffff8104cb70 ffff8801f0c90e80 > > ffffffff81335f6a ffff8801f0c90e80 00000000003fffbc ffffffff81128048 > > Call Trace: > > [<ffffffff81331935>] ? dump_stack+0x5c/0x77 > > [<ffffffff81335e60>] ? nmi_cpu_backtrace+0x90/0xa0 > > [<ffffffff8104cb70>] ? irq_force_complete_move+0x140/0x140 > > [<ffffffff81335f6a>] ? nmi_trigger_cpumask_backtrace+0xfa/0x130 > > [<ffffffff81128048>] ? watchdog+0x2b8/0x330 > > [<ffffffff81127d90>] ? reset_hung_task_detector+0x10/0x10 > > [<ffffffff81097660>] ? kthread+0xf0/0x110 > > [<ffffffff810247d9>] ? __switch_to+0x2c9/0x720 > > [<ffffffff81097570>] ? kthread_park+0x60/0x60 > > [<ffffffff81612bb5>] ? ret_from_fork+0x25/0x30 > > Sending NMI from CPU 2 to CPUs 0-1,3: > > NMI backtrace for cpu 1 > > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-ucs105-amd64 #1 Debian > > 4.9.30-2A~4.2.0.201709271649 > > task: ffff8801f4a02ec0 task.stack: ffffc90040ca4000 > > RIP: e030:[<ffffffff810013aa>] [<ffffffff810013aa>] > > xen_hypercall_sched_op+0xa/0x20 > > RSP: e02b:ffffc90040ca7ed0 EFLAGS: 00000246 > > RAX: 0000000000000000 RBX: ffff8801f4a02ec0 RCX: ffffffff810013aa > > RDX: ffffffff81c4ba70 RSI: 0000000000000000 RDI: 0000000000000001 > > RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000 > > R13: 0000000000000000 R14: ffff8801f4a02ec0 R15: ffff8801f4a02ec0 > > FS: 00007f23ac595700(0000) GS:ffff8801f5a80000(0000) knlGS:0000000000000000 > > CS: e033 DS: 002b ES: 002b CR0: 0000000080050033 > > CR2: 00007f52537d6d46 CR3: 00000001bba23000 CR4: 0000000000002660 > > Stack: > > ffff8801bb832201 0000000000000001 ffffffff8101b55c ffffffff81611ec8 > > ffff8801f4a02ec0 0000000000000001 ffffffff810bc280 ffff8801f4a02ec0 > > ffff8801f4a02ec0 c0a995961d41240f addce6dcadd009c9 0000000000000000 > > Call Trace: > > [<ffffffff8101b55c>] ? xen_safe_halt+0xc/0x20 > > [<ffffffff81611ec8>] ? default_idle+0x18/0xd0 > > [<ffffffff810bc280>] ? cpu_startup_entry+0x1f0/0x260 > > Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc > > cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 > > cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc > > NMI backtrace for cpu 3 > > CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.9.0-ucs105-amd64 #1 Debian > > 4.9.30-2A~4.2.0.201709271649 > > task: ffff8801f4a24f00 task.stack: ffffc90040cb4000 > > RIP: e030:[<ffffffff810013aa>] [<ffffffff810013aa>] > > xen_hypercall_sched_op+0xa/0x20 > > RSP: e02b:ffffc90040cb7ed0 EFLAGS: 00000246 > > RAX: 0000000000000000 RBX: ffff8801f4a24f00 RCX: ffffffff810013aa > > RDX: ffffffff81c4ba70 RSI: 0000000000000000 RDI: 0000000000000001 > > RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000 > > R13: 0000000000000000 R14: ffff8801f4a24f00 R15: ffff8801f4a24f00 > > FS: 00007f1a2af19700(0000) GS:ffff8801f5b80000(0000) knlGS:0000000000000000 > > CS: e033 DS: 002b ES: 002b CR0: 0000000080050033 > > CR2: 00007f4a5416b000 CR3: 00000001d83ec000 CR4: 0000000000002660 > > Stack: > > 0000000000000001 0000000000000001 ffffffff8101b55c ffffffff81611ec8 > > ffff8801f4a24f00 0000000000000003 ffffffff810bc280 ffff8801f4a24f00 > > ffff8801f4a24f00 77816deb133b9979 addce6dcadd009c9 0000000000000000 > > Call Trace: > > [<ffffffff8101b55c>] ? xen_safe_halt+0xc/0x20 > > [<ffffffff81611ec8>] ? default_idle+0x18/0xd0 > > [<ffffffff810bc280>] ? cpu_startup_entry+0x1f0/0x260 > > Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc > > cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 > > cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc > > NMI backtrace for cpu 0 > > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-ucs105-amd64 #1 Debian > > 4.9.30-2A~4.2.0.201709271649 > > task: ffffffff81c0e540 task.stack: ffffffff81c00000 > > RIP: e030:[<ffffffff810013aa>] [<ffffffff810013aa>] > > xen_hypercall_sched_op+0xa/0x20 > > RSP: e02b:ffffffff81c03e90 EFLAGS: 00000246 > > RAX: 0000000000000000 RBX: ffffffff81c0e540 RCX: ffffffff810013aa > > RDX: ffffffff81c4ba70 RSI: 0000000000000000 RDI: 0000000000000001 > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000 > > R13: 0000000000000000 R14: ffffffff81c0e540 R15: ffffffff81c0e540 > > FS: 00007f977d3c2700(0000) GS:ffff8801f5a00000(0000) knlGS:0000000000000000 > > CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000055ae9becabc8 CR3: 00000001d6a40000 CR4: 0000000000002660 > > Stack: > > 0000000000000001 0000000000000001 ffffffff8101b55c ffffffff81611ec8 > > ffffffff81c0e540 0000000000000000 ffffffff810bc280 ffffffff81c0e540 > > ffffffff81c0e540 ebeac214e330bd3f addce6dcadd009c9 ffffffffffffffff > > Call Trace: > > [<ffffffff8101b55c>] ? xen_safe_halt+0xc/0x20 > > [<ffffffff81611ec8>] ? default_idle+0x18/0xd0 > > [<ffffffff810bc280>] ? cpu_startup_entry+0x1f0/0x260 > > [<ffffffff81d4bf84>] ? start_kernel+0x46d/0x48d > > [<ffffffff81d510e6>] ? xen_start_kernel+0x52e/0x538 > > Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc > > cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 > > cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc > > > Looking at the dis-assembly of xen_clocksource_get_cycles() in > arch/x86/xen/time.c I see no path how that should call > __blk_mq_run_hw_queue(): > > > 00000000000001a0 <xen_clocksource_get_cycles> mov %gs:0x0(%rip),%rdi > > # 00000000000001a8 <xen_clocksource_get_cycles+0x8> > > 00000000000001a8 <xen_clocksource_get_cycles+0x8> add $0x20,%rdi > > ret = pvclock_clocksource_read(src); > > 00000000000001ac <xen_clocksource_get_cycles+0xc> callq 00000000000001b1 > > <xen_clocksource_get_cycles+0x11> > > } > > > > static cycle_t xen_clocksource_get_cycles(struct clocksource *cs) > > { > > return xen_clocksource_read(); > > } > > 00000000000001b1 <xen_clocksource_get_cycles+0x11> retq > > 00000000000001b2 <xen_clocksource_get_cycles+0x12> data16 data16 data16 > > data16 nopw %cs:0x0(%rax,%rax,1) > > > > static void xen_read_wallclock(struct timespec *ts) > > { > > struct shared_info *s = HYPERVISOR_shared_info; > > struct pvclock_wall_clock *wall_clock = &(s->wc); > > 00000000000001c0 <xen_get_wallclock> mov 0x0(%rip),%rax # > > 00000000000001c7 <xen_get_wallclock+0x7> > > > Here's another dump, which diverges from xen_clocksource_get_cycles() to > some completely other function: > > > INFO: task btrfs-transacti:522 blocked for more than 300 seconds. > > btrfs-transacti D 0 522 2 0x00000000 > > ffff8801f3836000 0000000000000000 ffff8801f4a24f00 ffff8801f32da1c0 > > ffff8801f5b18780 ffffc9004199fa40 ffffffff8160cd2d 0000000000000000 > > ffff8801741a58a8 0000000000000102 000000000000000e ffff8801f32da1c0 > > Call Trace: > > [<ffffffff8160cd2d>] ? __schedule+0x23d/0x6d0 > > [<ffffffff8160da60>] ? bit_wait_timeout+0x90/0x90 > > [<ffffffff8160d1f2>] ? schedule+0x32/0x80 > > [<ffffffff81610729>] ? schedule_timeout+0x249/0x300 > *** > > [<ffffffff8101b7d1>] ? xen_clocksource_get_cycles+0x11/0x20 > > [<ffffffff8160da60>] ? bit_wait_timeout+0x90/0x90 > > [<ffffffff8160ca74>] ? io_schedule_timeout+0xb4/0x130 > > [<ffffffff810bb5e7>] ? prepare_to_wait+0x57/0x80 > > [<ffffffff8160da77>] ? bit_wait_io+0x17/0x60 > > [<ffffffff8160d56c>] ? __wait_on_bit+0x5c/0x90 > > [<ffffffff8117e18e>] ? find_get_pages_tag+0x15e/0x300 > > [<ffffffff8117d016>] ? wait_on_page_bit+0x86/0xa0 > > [<ffffffff810bb910>] ? autoremove_wake_function+0x40/0x40 > > [<ffffffff8117d107>] ? __filemap_fdatawait_range+0xd7/0x150 > > [<ffffffff8117d18f>] ? filemap_fdatawait_range+0xf/0x30 > > [<ffffffffc0175643>] ? btrfs_wait_ordered_range+0x73/0x110 [btrfs] > > [<ffffffffc01a2a0d>] ? btrfs_wait_cache_io+0x5d/0x1f0 [btrfs] > > [<ffffffffc0141a36>] ? btrfs_write_dirty_block_groups+0x106/0x380 [btrfs] > > [<ffffffffc0140c5d>] ? btrfs_run_delayed_refs+0x1fd/0x2b0 [btrfs] > > [<ffffffffc01551d7>] ? commit_cowonly_roots+0x257/0x2f0 [btrfs] > > [<ffffffffc0157a24>] ? btrfs_commit_transaction+0x4e4/0xa40 [btrfs] > > [<ffffffffc015801d>] ? start_transaction+0x9d/0x4a0 [btrfs] > > [<ffffffffc01523a2>] ? transaction_kthread+0x1b2/0x1f0 [btrfs] > > [<ffffffffc01521f0>] ? btrfs_cleanup_transaction+0x580/0x580 [btrfs] > > [<ffffffff810975c0>] ? kthread+0xf0/0x110 > > [<ffffffff8102476b>] ? __switch_to+0x2bb/0x700 > > [<ffffffff810974d0>] ? kthread_park+0x60/0x60 > > [<ffffffff81611d35>] ? ret_from_fork+0x25/0x30 > > And another one: > > INFO: task smbd:20101 blocked for more than 300 seconds. > > smbd D 0 20101 1714 0x00000000 > > ffff8801f01cac00 0000000000000000 ffff8801f4a241c0 ffff88007f9a5240 > > ffff8801f5b98780 ffffc90049a33bc0 ffffffff8160cd2d ffff8800f78bb5b0 > > fd2236313fb5274b 000000008dc3c1a0 ffff8801f3ef0a40 ffff88007f9a5240 > > Call Trace: > > [<ffffffff8160cd2d>] ? __schedule+0x23d/0x6d0 > > [<ffffffff8160da60>] ? bit_wait_timeout+0x90/0x90 > > [<ffffffff8160d1f2>] ? schedule+0x32/0x80 > > [<ffffffff81610729>] ? schedule_timeout+0x249/0x300 > > [<ffffffff81336486>] ? __radix_tree_lookup+0x76/0xe0 > *** > > [<ffffffff8101b7d1>] ? xen_clocksource_get_cycles+0x11/0x20 > > [<ffffffff8160da60>] ? bit_wait_timeout+0x90/0x90 > > [<ffffffff8160ca74>] ? io_schedule_timeout+0xb4/0x130 > > [<ffffffff810bb5e7>] ? prepare_to_wait+0x57/0x80 > > [<ffffffff8160da77>] ? bit_wait_io+0x17/0x60 > > [<ffffffff8160d56c>] ? __wait_on_bit+0x5c/0x90 > > [<ffffffffc00f290b>] ? __jbd2_journal_file_buffer+0xcb/0x180 [jbd2] > > [<ffffffff8160da60>] ? bit_wait_timeout+0x90/0x90 > > [<ffffffff8160d6ce>] ? out_of_line_wait_on_bit+0x7e/0xa0 > > [<ffffffff810bb910>] ? autoremove_wake_function+0x40/0x40 > > [<ffffffffc00f2bc8>] ? do_get_write_access+0x208/0x420 [jbd2] > > [<ffffffffc00f2e0e>] ? jbd2_journal_get_write_access+0x2e/0x60 [jbd2] > > [<ffffffffc02c78a6>] ? __ext4_journal_get_write_access+0x36/0x70 [ext4] > > [<ffffffffc02a3913>] ? ext4_orphan_add+0xd3/0x230 [ext4] > > [<ffffffffc0296b4a>] ? ext4_mark_inode_dirty+0x6a/0x200 [ext4] > > [<ffffffffc02a4b8a>] ? ext4_unlink+0x36a/0x380 [ext4] > > [<ffffffff81211677>] ? vfs_unlink+0xe7/0x180 > > [<ffffffff81214439>] ? do_unlinkat+0x289/0x300 > > [<ffffffff81611abb>] ? system_call_fast_compare_end+0xc/0x9b > > This does not look normal to me or did I miss something? > > Where can I get more information on why there is no progress for 300s, > what should I do to debug which task is waiting for what? > > The traces of the of other CPUs look normal to me: the one posted first > above is the shortest, in all other cases they were sooner or later > waiting for IO (my interpretation, but I can post them if necessary.) > > This problem occurs since the upgrade of the Linux kernel inside the VM > from 4.1.x to 4.9.32 and now 4.9.52. > > Any help is appreciated. > Philipp Hahn > -- > Philipp Hahn > Open Source Software Engineer > > Univention GmbH > be open. > Mary-Somerville-Str. 1 > D-28359 Bremen > Tel.: +49 421 22232-0 > Fax : +49 421 22232-99 > hahn@xxxxxxxxxxxxx > > http://www.univention.de/ > Geschäftsführer: Peter H. Ganten > HRB 20755 Amtsgericht Bremen > Steuer-Nr.: 71-597-02876 > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxx > https://lists.xen.org/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |