
Re: [Xen-devel] [RFC Patch] x86/hpet: Disable interrupts while running hpet interrupt handler.



>>> On 05.08.13 at 22:38, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> Automated testing on the Xen-4.3 testing tip found an interesting issue:
> 
> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.3.0  x86_64  debug=y  Not tainted ]----

The call trace is suspicious in ways beyond what Keir already
pointed out - with debug=y, there shouldn't be bogus entries listed,
yet ...

> (XEN) CPU:    3
> (XEN) RIP:    e008:[<ffff82c4c01003d0>] __bitmap_and+0/0x3f
> (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 0000000000000020   rcx: 0000000000000100
> (XEN) rdx: ffff82c4c032dfc0   rsi: ffff83043f2c6068   rdi: ffff83043f2c6008
> (XEN) rbp: ffff83043f2c6048   rsp: ffff83043f2c6000   r8:  0000000000000001
> (XEN) r9:  0000000000000000   r10: ffff83043f2c76f0   r11: 0000000000000000
> (XEN) r12: ffff83043f2c6008   r13: 7fffffffffffffff   r14: ffff83043f2c6068
> (XEN) r15: 000003343036797b   cr0: 0000000080050033   cr4: 00000000000026f0
> (XEN) cr3: 0000000403c40000   cr2: ffff83043f2c5ff8
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Valid stack range: ffff83043f2c6000-ffff83043f2c8000, sp=ffff83043f2c6000, tss.esp0=ffff83043f2c7fc0
> (XEN) Xen stack overflow (dumping trace ffff83043f2c6000-ffff83043f2c8000):
[... removed redundant stuff]
> (XEN)    ffff83043f2c6b28: [<ffff82c4c0170500>] do_IRQ+0x99a/0xa4f
> (XEN)    ffff83043f2c6bf8: [<ffff82c4c016806f>] common_interrupt+0x5f/0x70
> (XEN)    ffff83043f2c6c80: [<ffff82c4c012a535>] _spin_unlock_irqrestore+0x40/0x42
> (XEN)    ffff83043f2c6cb8: [<ffff82c4c01a78d4>] handle_hpet_broadcast+0x5b/0x268
> (XEN)    ffff83043f2c6d28: [<ffff82c4c01a7b41>] hpet_interrupt_handler+0x3e/0x40
> (XEN)    ffff83043f2c6d38: [<ffff82c4c0170500>] do_IRQ+0x99a/0xa4f
> (XEN)    ffff83043f2c6e08: [<ffff82c4c016806f>] common_interrupt+0x5f/0x70
> (XEN)    ffff83043f2c6e90: [<ffff82c4c012a577>] _spin_unlock_irq+0x40/0x41
> (XEN)    ffff83043f2c6eb8: [<ffff82c4c01704d6>] do_IRQ+0x970/0xa4f
> (XEN)    ffff83043f2c6ed8: [<ffff82c4c01aa7bd>] cpuidle_wakeup_mwait+0xad/0xba
> (XEN)    ffff83043f2c6f28: [<ffff82c4c01a7a29>] handle_hpet_broadcast+0x1b0/0x268
> (XEN)    ffff83043f2c6f88: [<ffff82c4c016806f>] common_interrupt+0x5f/0x70
> (XEN)    ffff83043f2c7010: [<ffff82c4c0164f94>] unmap_domain_page+0x6/0x32d
> (XEN)    ffff83043f2c7048: [<ffff82c4c01ef69d>] ept_next_level+0x9c/0xde
> (XEN)    ffff83043f2c7078: [<ffff82c4c01f0ab3>] ept_get_entry+0xb3/0x239
> (XEN)    ffff83043f2c7108: [<ffff82c4c01e9497>] __get_gfn_type_access+0x12b/0x20e
> (XEN)    ffff83043f2c7158: [<ffff82c4c01e9cc2>] get_page_from_gfn_p2m+0xc8/0x25d
> (XEN)    ffff83043f2c71c8: [<ffff82c4c01f4660>] map_domain_gfn_3_levels+0x43/0x13a
> (XEN)    ffff83043f2c7208: [<ffff82c4c01f4b6b>] guest_walk_tables_3_levels+0x414/0x489
> (XEN)    ffff83043f2c7288: [<ffff82c4c0223988>] hap_p2m_ga_to_gfn_3_levels+0x178/0x306
> (XEN)    ffff83043f2c7338: [<ffff82c4c0223b35>] hap_gva_to_gfn_3_levels+0x1f/0x2a
> (XEN)    ffff83043f2c7348: [<ffff82c4c01ebc1e>] paging_gva_to_gfn+0xb6/0xcc
> (XEN)    ffff83043f2c7398: [<ffff82c4c01bedf2>] __hvm_copy+0x57/0x36d
> (XEN)    ffff83043f2c73c8: [<ffff82c4c01b6d34>] hvmemul_virtual_to_linear+0x102/0x153
> (XEN)    ffff83043f2c7408: [<ffff82c4c01c1538>] hvm_copy_from_guest_virt+0x15/0x17
> (XEN)    ffff83043f2c7418: [<ffff82c4c01b7cd3>] __hvmemul_read+0x12d/0x1c8
> (XEN)    ffff83043f2c7498: [<ffff82c4c01b7dc1>] hvmemul_read+0x12/0x14
> (XEN)    ffff83043f2c74a8: [<ffff82c4c01937e9>] read_ulong+0xe/0x10
> (XEN)    ffff83043f2c74b8: [<ffff82c4c0195924>] x86_emulate+0x169d/0x11309

... how would this end up getting called from do_IRQ()?

> (XEN)    ffff83043f2c7558: [<ffff82c4c0170564>] do_IRQ+0x9fe/0xa4f
> (XEN)    ffff83043f2c75c0: [<ffff82c4c012a100>] _spin_trylock_recursive+0x63/0x93
> (XEN)    ffff83043f2c75d8: [<ffff82c4c0170564>] do_IRQ+0x9fe/0xa4f
> (XEN)    ffff83043f2c7618: [<ffff82c4c01aa7bd>] cpuidle_wakeup_mwait+0xad/0xba
> (XEN)    ffff83043f2c7668: [<ffff82c4c01a7a29>] handle_hpet_broadcast+0x1b0/0x268
> (XEN)    ffff83043f2c76c8: [<ffff82c4c01ef6a5>] ept_next_level+0xa4/0xde
> (XEN)    ffff83043f2c7788: [<ffff82c4c01ef6a5>] ept_next_level+0xa4/0xde
> (XEN)    ffff83043f2c77b8: [<ffff82c4c01f0c27>] ept_get_entry+0x227/0x239
> (XEN)    ffff83043f2c7848: [<ffff82c4c01775ef>] get_page+0x27/0xf2
> (XEN)    ffff83043f2c7898: [<ffff82c4c01ef6a5>] ept_next_level+0xa4/0xde
> (XEN)    ffff83043f2c78c8: [<ffff82c4c01f0c27>] ept_get_entry+0x227/0x239
> (XEN)    ffff83043f2c7a98: [<ffff82c4c01b7f60>] hvm_emulate_one+0x127/0x1bf
> (XEN)    ffff83043f2c7aa8: [<ffff82c4c01b6c1b>] hvmemul_get_seg_reg+0x49/0x60
> (XEN)    ffff83043f2c7ae8: [<ffff82c4c01c38c5>] handle_mmio+0x55/0x1f0
> (XEN)    ffff83043f2c7b38: [<ffff82c4c0108208>] do_event_channel_op+0/0x10cb

And this one looks bogus too. Question therefore is whether the
problem you describe isn't a consequence of an earlier issue.

> (XEN)    ffff83043f2c7b48: [<ffff82c4c0128bb3>] vcpu_unblock+0x4b/0x4d
> (XEN)    ffff83043f2c7c48: [<ffff82c4c01e9400>] __get_gfn_type_access+0x94/0x20e
> (XEN)    ffff83043f2c7c98: [<ffff82c4c01bccf3>] hvm_hap_nested_page_fault+0x25d/0x456
> (XEN)    ffff83043f2c7d18: [<ffff82c4c01e1257>] vmx_vmexit_handler+0x140a/0x17ba
> (XEN)    ffff83043f2c7d30: [<ffff82c4c01be519>] hvm_do_resume+0x1a/0x1b7
> (XEN)    ffff83043f2c7d60: [<ffff82c4c01dae73>] vmx_do_resume+0x13b/0x15a
> (XEN)    ffff83043f2c7da8: [<ffff82c4c012a1e1>] _spin_lock+0x11/0x48
> (XEN)    ffff83043f2c7e20: [<ffff82c4c0128091>] schedule+0x82a/0x839
> (XEN)    ffff83043f2c7e50: [<ffff82c4c012a1e1>] _spin_lock+0x11/0x48
> (XEN)    ffff83043f2c7e68: [<ffff82c4c01cb132>] vlapic_has_pending_irq+0x3f/0x85
> (XEN)    ffff83043f2c7e88: [<ffff82c4c01c50a7>] hvm_vcpu_has_pending_irq+0x9b/0xcd
> (XEN)    ffff83043f2c7ec8: [<ffff82c4c01deca9>] vmx_vmenter_helper+0x60/0x139
> (XEN)    ffff83043f2c7f18: [<ffff82c4c01e7439>] vmx_asm_do_vmentry+0/0xe7
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) DOUBLE FAULT -- system shutdown
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> 
> The hpet interrupt handler runs with interrupts enabled, due to the
> spin_unlock_irq() in:
> 
>     while ( desc->status & IRQ_PENDING )
>     {
>         desc->status &= ~IRQ_PENDING;
>         spin_unlock_irq(&desc->lock);
>         tsc_in = tb_init_done ? get_cycles() : 0;
>         action->handler(irq, action->dev_id, regs);
>         TRACE_3D(TRC_HW_IRQ_HANDLED, irq, tsc_in, get_cycles());
>         spin_lock_irq(&desc->lock);
>     }
> 
> in do_IRQ().
> 
> Clearly there are cases where the HPET interrupt period is shorter than the
> time it takes to process handle_hpet_broadcast(), presumably in part because
> of the large amount of cpumask manipulation.
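
Since the IRQ_PENDING loop above drops desc->lock with spin_unlock_irq(),
handle_hpet_broadcast() runs with interrupts enabled, so a second HPET
interrupt arriving before the first has been handled nests another do_IRQ()
frame on the same stack; repeated often enough, this overflows the stack as
seen in the trace. Below is a minimal sketch of the kind of change the
subject line suggests (keeping interrupts masked for the duration of the
handler). local_irq_save()/local_irq_restore(), struct cpu_user_regs,
struct hpet_event_channel and the handler names exist in the Xen tree, but
the exact placement of the masking is an assumption, not the actual patch:

    static void hpet_interrupt_handler(int irq, void *data,
                                       struct cpu_user_regs *regs)
    {
        struct hpet_event_channel *ch = data;
        unsigned long flags;

        /*
         * Mask interrupts on this CPU so that another HPET interrupt
         * cannot nest on this stack while the broadcast work runs.
         */
        local_irq_save(flags);
        ch->event_handler(ch);      /* e.g. handle_hpet_broadcast() */
        local_irq_restore(flags);   /* restore the previous IRQ state */
    }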

How many CPUs (and how many usable HPET channels) does the
system have that this crash was observed on?

Jan



 

