
Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL



> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Thursday, January 12, 2017 8:26 PM
> 
> >>> On 12.01.17 at 13:15, <andrew.cooper3@xxxxxxxxxx> wrote:
> > On 12/01/17 12:07, Xuquan (Quan Xu) wrote:
> >> On January 12, 2017 5:14 PM, Andrew Cooper wrote:
> >>> On 12/01/2017 06:46, osstest service owner wrote:
> >>>> flight 104131 xen-unstable real [real]
> >>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/
> >>>>
> >>>> Regressions :-(
> >>>>
> >>>> Tests which did not succeed and are blocking, including tests which
> >>>> could not be run:
> >>>>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
> >>> REGR. vs. 104119
> >>>
> >>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' 
> >>> failed at
> >>> intr.c:321
> >>> Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable  x86_64  debug=y
> >>> Not tainted ]----
> >>> Jan 12 01:25:37.141577 (XEN) CPU:    14
> >>> Jan 12 01:25:37.141607 (XEN) RIP:    e008:[<ffff82d0801ef7fc>]
> >>> vmx_intr_assist+0x35e/0x51d
> >>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202   CONTEXT:
> >>> hypervisor (d15v0)
> >>> Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038   rbx:
> >>> ffff830079e1e000   rcx: 0000000000000030
> >>> Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000   rsi:
> >>> 0000000000000030   rdi: ffff830079e1e000
> >>> Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08   rsp: ffff83047de2fea8
> >>> r8:  ffff82c00022f000
> >>> Jan 12 01:25:37.173579 (XEN) r9:  ffff8301b63ede80   r10:
> >>> ffff830176386560   r11: 000001955ee79bd0
> >>> Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002   r13:
> >>> 0000000000003002   r14: 0000000000000030
> >>> Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000   cr0:
> >>> 0000000080050033   cr4: 00000000003526e0
> >>> Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000   cr2:
> >>> 0000000002487034
> >>> Jan 12 01:25:37.205569 (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000
> >>> ss: 0000   cs: e008
> >>> Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc>
> >>> (vmx_intr_assist+0x35e/0x51d):
> >>> Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51
> >>> 20 00 00 b8 10 08 00 00 0f
> >>> Jan 12 01:25:37.221561 (XEN) Xen stack trace from rsp=ffff83047de2fea8:
> >>> Jan 12 01:25:37.229600 (XEN)    ffff82d08031aa80 00000038ffffffff
> >>> ffff83047de2ffff ffff83023fec2000
> >>> Jan 12 01:25:37.237594 (XEN)    ffff83047de2fef8 ffff82d080130cb6
> >>> ffff830079e1e000 ffff830079e1e000
> >>> Jan 12 01:25:37.245588 (XEN)    ffff83007bae2000 000000000000000e
> >>> ffff830233117000 ffff83023fec2000
> >>> Jan 12 01:25:37.253594 (XEN)    ffff83047de2fdc0 ffff82d0801fdeb1
> >>> 0000000000000004 00000000000000c2
> >>> Jan 12 01:25:37.261584 (XEN)    0000000000000020 0000000000000007
> >>> ffff8800e8d28000 ffffffff81add0a0
> >>> Jan 12 01:25:37.269607 (XEN)    0000000000000246 0000000000000000
> >>> ffff880142400008 0000000000000004
> >>> Jan 12 01:25:37.277580 (XEN)    0000000000000036 0000000000000000
> >>> 00000000000003f8 00000000000003f8
> >>> Jan 12 01:25:37.285584 (XEN)    ffffffff81add0a0 0000beef0000beef
> >>> ffffffff813899a4 000000bf0000beef
> >>> Jan 12 01:25:37.293567 (XEN)    0000000000000002 ffff880147c03e08
> >>> 000000000000beef 1cec835356e5beef
> >>> Jan 12 01:25:37.293606 (XEN)    085d8b002674beef 01dcb38b000cbeef
> >>> 8914458d3174beef 2444c7100000000e
> >>> Jan 12 01:25:37.301586 (XEN)    ffff830079e1e000 00000031bfc37600
> >>> 00000000003526e0
> >>> Jan 12 01:25:37.309607 (XEN) Xen call trace:
> >>> Jan 12 01:25:37.309639 (XEN)    [<ffff82d0801ef7fc>]
> >>> vmx_intr_assist+0x35e/0x51d
> >>> Jan 12 01:25:37.317591 (XEN)    [<ffff82d0801fdeb1>]
> >>> vmx_asm_vmexit_handler+0x41/0x120
> >>> Jan 12 01:25:37.325598 (XEN)
> >>> Jan 12 01:25:37.325624 (XEN)
> >>> Jan 12 01:25:37.325647 (XEN)
> >>> ****************************************
> >>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
> >>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at
> >>> intr.c:321
> >>> Jan 12 01:25:37.341571 (XEN)
> >>> ****************************************
> >>> Jan 12 01:25:37.341603 (XEN)
> >>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
> >>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
> >>> RESET_REG.
> >>>
> >>> This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  
> >>> It is
> >>> not a deterministic issue, as it appears to have survived a week of 
> >>> testing
> >>> already, but there is clearly something still problematic with the code.
> >>>
> >>
> >> Andrew,
> >> If you have, could you give more information?
> >
> > No further information sorry.  This was found by the automated test system.
> 
> But some can be gathered:
> 
> > Full logs are available from
> > http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i386-xl-qemuu-debianhvm-amd64/
> > but I doubt any of them will help in diagnosing the issue any further.
> >
> >> Such as the value of intack.vector / pt_vector..
> 
> At least one of the two values is likely to live in a register, and
> hence its value would be available in the dump. Just takes looking
> at the disassembly.
> 
> >> I guess, the reason may be that the intack.vector is ' uint8_t ' and the 
> >> pt_vector is 'int'..
> 
> That would be odd.
> 
> >> Or there is a corner case that intack.vector is __not__ the highest 
> >> priority vector..
> 
> That's what I'm afraid of, and why I had asked to add the ASSERT().
> 

I cannot come up with a valid reason for such a situation (intack.vector is
0x30 while pt_vector is 0x38, from Chao's data). pt_update_irq is invoked
before the highest pending IRR is checked, so pt_vector should be honored
anyway. One possibility is that for some reason pt_vector is not in vIRR at
that point (due to some bug in the path from PIR to vIRR). However, I
didn't catch such a bug simply by looking at the code. We need to reproduce
this problem on the developer side to find the actual reason. Andrew, it would
be helpful if you could help Quan/Chao find out more about the test environment.
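
For what it's worth, the register dump seems to agree with Chao's values.
This is only my reading of the code bytes in front of the <0f> 0b (ud2),
which I assume decode as "movzx esi, r14b; cmp eax, esi; jle +2", i.e. rax
would hold pt_vector and r14 would hold intack.vector. The snippet below is
a standalone sketch of that comparison, not Xen code; the function and
macro names are made up for illustration.

```c
#include <stdint.h>

/* Register values copied from the crash dump quoted above. */
#define DUMP_RAX 0x0000000000000038ULL /* assumed to hold pt_vector */
#define DUMP_R14 0x0000000000000030ULL /* assumed to hold intack.vector */

/*
 * Hypothetical helper mirroring the assumed decode:
 *   movzx esi, r14b ; zero-extend the low byte of r14
 *   cmp   eax, esi  ; compare pt_vector with intack.vector
 *   jle   +2        ; skip the ud2 when pt_vector <= intack.vector
 * Returns 1 when 'intack.vector >= pt_vector' holds, 0 otherwise.
 */
static int assertion_holds(uint64_t rax, uint64_t r14)
{
    int pt_vector = (int)(uint32_t)rax;   /* eax, signed compare */
    uint8_t intack_vector = (uint8_t)r14; /* r14b */

    return intack_vector >= pt_vector;
}
```

With the dumped values, assertion_holds(DUMP_RAX, DUMP_R14) compares 0x30
against 0x38 and returns 0, which would match the ASSERT firing.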

One more thing to note, though: the original patch from Quan is actually
orthogonal to this ASSERT. Regardless of whether intack.vector is larger or
smaller than pt_vector, we always require the trick as long as pt_vector is
not the one currently being programmed into RVI. So do we want to revert the
whole commit until the problem is finally fixed, or is it OK to just remove
the ASSERT (or replace it with a WARN_ON with more debug info) to unblock the
test system before the fix is ready?
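
To illustrate the second option, here is a rough standalone sketch of a
check that reports both values and carries on instead of panicking. It does
not use Xen's WARN_ON or printk; check_pt_vector() is a hypothetical helper
and fprintf() stands in for whatever logging the real code would use.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical non-fatal replacement for the ASSERT: returns 1 when
 * 'intack_vector >= pt_vector' holds, otherwise logs both values and
 * returns 0 so the caller can continue.
 */
static int check_pt_vector(uint8_t intack_vector, int pt_vector)
{
    if (intack_vector >= pt_vector)
        return 1;

    /* Extra debug info that a plain assertion message would not carry. */
    fprintf(stderr, "intack.vector %#x < pt_vector %#x\n",
            (unsigned int)intack_vector, (unsigned int)pt_vector);
    return 0;
}
```

Called with the values from this failure, check_pt_vector(0x30, 0x38), it
would log the pair and return 0 rather than bringing the host down.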

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

