Xen project Mailing List

Re: [Xen-devel] 4.4: INFO: rcu_sched self-detected stall on CPU

To: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx

From: Steven Haigh <netwiz@xxxxxxxxx>

Date: Sat, 2 Apr 2016 12:50:07 +1100

Delivery-date: Sat, 02 Apr 2016 01:50:58 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 30/03/2016 1:14 AM, Boris Ostrovsky wrote: > On 03/29/2016 04:56 AM, Steven Haigh wrote: >> >> Interestingly enough, this just happened again - but on a different >> virtual machine. I'm starting to wonder if this may have something to do >> with the uptime of the machine - as the system that this seems to happen >> to is always different. >> >> Destroying it and monitoring it again has so far come up blank. >> >> I've thrown the latest lot of kernel messages here: >> http://paste.fedoraproject.org/346802/59241532 > > Would be good to see full console log. The one that you posted starts > with an error so I wonder what was before that. Ok, so I had a virtual machine do this again today. Both vcpus went to 100% usage and essentially hung. I attached to the screen console that was connected via 'xl console' and copied the entire buffer to paste below: yum-cron[30740]: segfault at 1781ab8 ip 00007f2a7fcd282f sp 00007ffe8655fe90 error 5 in libpython2.7.so.1.0[7f2a7fbf5000+178000] swap_free: Bad swap file entry 2a2b7d5bb69515d8 BUG: Bad page map in process yum-cron pte:56fab76d2a2bb06a pmd:0309e067 addr:0000000001780000 vm_flags:00100073 anon_vma:ffff88007b974c08 mapping: (null) index:1780 file: (null) fault: (null) mmap: (null) readpage: (null) CPU: 0 PID: 30740 Comm: yum-cron Tainted: G B 4.4.6-4.el7xen.x86_64 #1 0000000000000000 ffff88004176bac0 ffffffff81323d17 0000000001780000 00003ffffffff000 ffff88004176bb08 ffffffff8117e574 ffffffff81193d6e 0000000000001780 ffff88000309ec00 0000000001780000 56fab76d2a2bb06a Call Trace: [<ffffffff81323d17>] dump_stack+0x63/0x8c [<ffffffff8117e574>] print_bad_pte+0x1e4/0x290 [<ffffffff81193d6e>] ? swap_info_get+0x7e/0xe0 [<ffffffff8117fdef>] unmap_single_vma+0x4ff/0x840 [<ffffffff81180837>] unmap_vmas+0x47/0x90 [<ffffffff811891f8>] exit_mmap+0x98/0x150 [<ffffffff8107a557>] mmput+0x47/0x100 [<ffffffff8107f81e>] do_exit+0x24e/0xad0 [<ffffffff8108011f>] do_group_exit+0x3f/0xa0 [<ffffffff8108aa03>] get_signal+0x1c3/0x5e0 [<ffffffff81017208>] do_signal+0x28/0x630 [<ffffffff81152d7d>] ? printk+0x4d/0x4f [<ffffffff810c735f>] ? vprintk_default+0x1f/0x30 [<ffffffff8107938c>] ? bad_area_access_error+0x43/0x4a [<ffffffff810603ac>] ? __do_page_fault+0x22c/0x3f0 [<ffffffff8107808d>] exit_to_usermode_loop+0x4c/0x95 [<ffffffff81003878>] prepare_exit_to_usermode+0x18/0x20 [<ffffffff8165cae5>] retint_user+0x8/0x13 BUG: Bad page state in process yum-cron pfn:0f3bf page:ffffea00003cefc0 count:0 mapcount:7 mapping:ffff88000f3bf008 index:0xffff88000f3bf000 flags: 0x10000000000094(referenced|dirty|slab) page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: flags: 0x80(slab) Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4 CPU: 0 PID: 30740 Comm: yum-cron Tainted: G B 4.4.6-4.el7xen.x86_64 #1 0000000000000000 ffff88004176b958 ffffffff81323d17 ffffea00003cefc0 ffffffff817ab348 ffff88004176b980 ffffffff811c1ab5 ffffea00003cefc0 0000000000000001 0000000000000000 ffff88004176b9d0 ffffffff81159584 Call Trace: [<ffffffff81323d17>] dump_stack+0x63/0x8c [<ffffffff811c1ab5>] bad_page.part.69+0xdf/0xfc [<ffffffff81159584>] free_pages_prepare+0x294/0x2a0 [<ffffffff8115b6c1>] free_hot_cold_page+0x31/0x160 [<ffffffff8115b839>] free_hot_cold_page_list+0x49/0xb0 [<ffffffff811621b5>] release_pages+0xc5/0x260 [<ffffffff8119380d>] free_pages_and_swap_cache+0x7d/0x90 [<ffffffff8117ddd6>] tlb_flush_mmu_free+0x36/0x60 [<ffffffff8117ff54>] unmap_single_vma+0x664/0x840 [<ffffffff81180837>] unmap_vmas+0x47/0x90 [<ffffffff811891f8>] exit_mmap+0x98/0x150 [<ffffffff8107a557>] mmput+0x47/0x100 [<ffffffff8107f81e>] do_exit+0x24e/0xad0 [<ffffffff8108011f>] do_group_exit+0x3f/0xa0 [<ffffffff8108aa03>] get_signal+0x1c3/0x5e0 [<ffffffff81017208>] do_signal+0x28/0x630 [<ffffffff81152d7d>] ? printk+0x4d/0x4f [<ffffffff810c735f>] ? vprintk_default+0x1f/0x30 [<ffffffff8107938c>] ? bad_area_access_error+0x43/0x4a [<ffffffff810603ac>] ? __do_page_fault+0x22c/0x3f0 [<ffffffff8107808d>] exit_to_usermode_loop+0x4c/0x95 [<ffffffff81003878>] prepare_exit_to_usermode+0x18/0x20 [<ffffffff8165cae5>] retint_user+0x8/0x13 BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:0 val:-1 BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:1 val:2 BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:2 val:-1 yum-cron[4197]: segfault at 32947fcb ip 00007ff0fa1bf8bd sp 00007ffdb1c54990 error 4 in libpython2.7.so.1.0[7ff0fa13a000+178000] BUG: unable to handle kernel paging request at ffff88010f3beffe IP: [<ffffffff811a7fb9>] free_block+0x119/0x190 PGD 188b063 PUD 0 Oops: 0002 [#1] SMP Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4 CPU: 1 PID: 8519 Comm: kworker/1:2 Tainted: G B 4.4.6-4.el7xen.x86_64 #1 Workqueue: events cache_reap task: ffff8800346bf1c0 ti: ffff880051700000 task.ti: ffff880051700000 RIP: 0010:[<ffffffff811a7fb9>] [<ffffffff811a7fb9>] free_block+0x119/0x190 RSP: 0018:ffff880051703d40 EFLAGS: 00010082 RAX: ffffea00003cefc0 RBX: ffffea0000000000 RCX: ffff88000f3bf000 RDX: 00000000fffffffe RSI: ffff88007fd19c40 RDI: ffff88007d012100 RBP: ffff880051703d68 R08: ffff880051703d88 R09: 0000000000000006 R10: ffff88007d01a940 R11: 0000000080000000 R12: 000077ff80000000 R13: ffff88007fd19c58 R14: ffff88007d01a948 R15: ffff88007d01a968 FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88010f3beffe CR3: 000000007bc8d000 CR4: 0000000000040620 Stack: ffff88007fd19bf0 ffff880051703d88 ffff88007d01a940 000000000000000b ffff88007d012100 ffff880051703dc0 ffffffff811a871b 0000000b51703dd8 ffff88007fd19c00 ffff880051703d88 ffff880051703d88 ffff88007d012100 Call Trace: [<ffffffff811a871b>] drain_array+0xab/0x130 [<ffffffff811a955c>] cache_reap+0x6c/0x230 [<ffffffff81093831>] process_one_work+0x151/0x400 [<ffffffff8109437a>] worker_thread+0x11a/0x460 [<ffffffff8165848f>] ? __schedule+0x2bf/0x880 [<ffffffff81094260>] ? rescuer_thread+0x2f0/0x2f0 [<ffffffff810993d9>] kthread+0xc9/0xe0 [<ffffffff81099310>] ? kthread_park+0x60/0x60 [<ffffffff8165c38f>] ret_from_fork+0x3f/0x70 [<ffffffff81099310>] ? kthread_park+0x60/0x60 Code: 41 89 d1 4c 0f af c9 0f b6 4f 1c 49 c1 e9 20 44 29 ca d3 ea 0f b6 4f 1d 41 01 d1 8b 50 18 41 d3 e9 48 8b 48 10 83 ea 01 89 50 18 <44> 88 0c 11 49 8b 52 38 48 83 c2 01 49 89 52 38 8b 48 18 85 c9 RIP [<ffffffff811a7fb9>] free_block+0x119/0x190 RSP <ffff880051703d40> CR2: ffff88010f3beffe ---[ end trace 99d22a980c39d4db ]--- BUG: unable to handle kernel paging request at ffffffffffffffd8 IP: [<ffffffff81099b40>] kthread_data+0x10/0x20 PGD 188d067 PUD 188f067 PMD 0 Oops: 0000 [#2] SMP Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4 CPU: 1 PID: 8519 Comm: kworker/1:2 Tainted: G B D 4.4.6-4.el7xen.x86_64 #1 task: ffff8800346bf1c0 ti: ffff880051700000 task.ti: ffff880051700000 RIP: 0010:[<ffffffff81099b40>] [<ffffffff81099b40>] kthread_data+0x10/0x20 RSP: 0018:ffff880051703a60 EFLAGS: 00010002 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001 RDX: ffffffff81bd4080 RSI: 0000000000000001 RDI: ffff8800346bf1c0 RBP: ffff880051703a60 R08: ffff88007c8e19d0 R09: 0000c88ed0b2b602 R10: 000000000000001f R11: 0000000000000000 R12: ffff8800346bf1c0 R13: 0000000000000000 R14: 00000000000163c0 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 000000007bc8d000 CR4: 0000000000040620 Stack: ffff880051703a78 ffffffff81094741 ffff88007fd163c0 ffff880051703ac0 ffffffff8165858f ffffffff810d2e27 ffff8800346bf1c0 ffff880051704000 ffff8800346bf510 0000000000000046 ffff8800346bf1c0 ffff88007d3facc0 Call Trace: [<ffffffff81094741>] wq_worker_sleeping+0x11/0x90 [<ffffffff8165858f>] __schedule+0x3bf/0x880 [<ffffffff810d2e27>] ? call_rcu_sched+0x17/0x20 [<ffffffff81658a85>] schedule+0x35/0x80 [<ffffffff8107fc29>] do_exit+0x659/0xad0 [<ffffffff8101a60a>] oops_end+0x9a/0xd0 [<ffffffff8105f7ad>] no_context+0x10d/0x360 [<ffffffff8105fb09>] __bad_area_nosemaphore+0x109/0x210 [<ffffffff8105fc23>] bad_area_nosemaphore+0x13/0x20 [<ffffffff81060200>] __do_page_fault+0x80/0x3f0 [<ffffffff81060592>] do_page_fault+0x22/0x30 [<ffffffff8165e0e8>] page_fault+0x28/0x30 [<ffffffff811a7fb9>] ? free_block+0x119/0x190 [<ffffffff811a871b>] drain_array+0xab/0x130 [<ffffffff811a955c>] cache_reap+0x6c/0x230 [<ffffffff81093831>] process_one_work+0x151/0x400 [<ffffffff8109437a>] worker_thread+0x11a/0x460 [<ffffffff8165848f>] ? __schedule+0x2bf/0x880 [<ffffffff81094260>] ? rescuer_thread+0x2f0/0x2f0 [<ffffffff810993d9>] kthread+0xc9/0xe0 [<ffffffff81099310>] ? kthread_park+0x60/0x60 [<ffffffff8165c38f>] ret_from_fork+0x3f/0x70 [<ffffffff81099310>] ? kthread_park+0x60/0x60 Code: ff ff eb 92 be 3a 02 00 00 48 c7 c7 0b 5e 7a 81 e8 e6 34 fe ff e9 d5 fe ff ff 90 66 66 66 66 90 55 48 8b 87 00 04 00 00 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 RIP [<ffffffff81099b40>] kthread_data+0x10/0x20 RSP <ffff880051703a60> CR2: ffffffffffffffd8 ---[ end trace 99d22a980c39d4dc ]--- Fixing recursive fault but reboot is needed! There is nothing further logged on the console after, and any previous messages are concerning an NFS server. -- Steven Haigh Email: netwiz@xxxxxxxxx Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.