
Re: [Xen-devel] [xen-unstable test] 106435: regressions - FAIL



> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
> Paul Durrant
> Sent: 06 March 2017 08:28
> To: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; osstest service owner <osstest-
> admin@xxxxxxxxxxxxxx>; Jan Beulich <JBeulich@xxxxxxxx>
> Subject: Re: [Xen-devel] [xen-unstable test] 106435: regressions - FAIL
> 
> > -----Original Message-----
> > From: Andrew Cooper [mailto:amc96@xxxxxxxxxxxxxxxx] On Behalf Of
> > Andrew Cooper
> > Sent: 04 March 2017 15:45
> > To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> > Cc: osstest service owner <osstest-admin@xxxxxxxxxxxxxx>; xen-
> > devel@xxxxxxxxxxxxxxxxxxx; Jan Beulich <JBeulich@xxxxxxxx>
> > Subject: Re: [Xen-devel] [xen-unstable test] 106435: regressions - FAIL
> >
> > On 04/03/2017 15:19, osstest service owner wrote:
> > > flight 106435 xen-unstable real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/106435/
> > >
> > > Regressions :-(
> > >
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >  test-amd64-amd64-xl-qemuu-win7-amd64  9 windows-install  fail REGR. vs. 106412
> > >  test-armhf-armhf-libvirt-raw  9 debian-di-install        fail REGR. vs. 106412
> >
> > From
> > http://logs.test-lab.xenproject.org/osstest/logs/106435/test-amd64-amd64-xl-qemuu-win7-amd64/9.ts-windows-install.log
> >
> > Mar  4 12:49:08.069641 (d3) Booting from DVD/CD...
> > Mar  4 12:49:08.069670 (d3) Booting from 0000:7c00
> > Mar  4 12:49:08.069697 (XEN) stdvga.c:178:d3v0 leaving stdvga mode
> > Mar  4 12:49:13.045676 (XEN) d3: VIRIDIAN GUEST_OS_ID: vendor: 1 os: 4 major: 6 minor: 1 sp: 0 build: 1db0
> > Mar  4 12:49:41.461683 (XEN) d3: VIRIDIAN HYPERCALL: enabled: 1 pfn: 3ffff
> > Mar  4 12:49:41.461731 (XEN) d3v0: VIRIDIAN APIC_ASSIST: enabled: 1 pfn: 3fffe
> > Mar  4 12:49:41.469596 (XEN) domain_crash called from viridian.c:306
> > Mar  4 12:49:41.477595 (XEN) Domain 3 (vcpu#0) crashed on cpu#1:
> > Mar  4 12:49:41.477633 (XEN) ----[ Xen-4.9-unstable  x86_64  debug=y   Not tainted ]----
> > Mar  4 12:49:41.485589 (XEN) CPU:    1
> > Mar  4 12:49:41.485620 (XEN) RIP:    0010:[<fffff80002653479>]
> > Mar  4 12:49:41.485650 (XEN) RFLAGS: 0000000000000286   CONTEXT: hvm guest (d3v0)
> > Mar  4 12:49:41.493591 (XEN) rax: 0000000000000000   rbx: fffff800027ede80   rcx: 0000000000000001
> > Mar  4 12:49:41.501560 (XEN) rdx: 0000000000000000   rsi: fffffa8001291040   rdi: fffff800027fbc40
> > Mar  4 12:49:41.509588 (XEN) rbp: 0000000000000080   rsp: fffff880009b0d80   r8:  0000000000000000
> > Mar  4 12:49:41.517584 (XEN) r9:  fffff800027ede80   r10: fffffa8001291040   r11: fffff800027ede90
> > Mar  4 12:49:41.517624 (XEN) r12: fffff800008129c0   r13: fffff800028afbe0   r14: fffffa800122db30
> > Mar  4 12:49:41.525586 (XEN) r15: fffff80000b96080   cr0: 0000000080050031   cr4: 00000000000006b8
> > Mar  4 12:49:41.533582 (XEN) cr3: 0000000000187000   cr2: 0000000000000000
> > Mar  4 12:49:41.541578 (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
> >
> >
> > Looks like this intermittent bug is still biting.  Should we kill
> > VIRIDIAN APIC_ASSIST until we have got to the bottom of it?
> > Alternatively, add some printk() to actually give us useful information
> > when it does go wrong.
> 
> I actually think this may be a bug in Windows 7. I normally run with this
> enlightenment on and have seen this very occasionally and only with
> Windows 7.
> 
> I think it would probably be useful to stash info about the last interrupt
> injected into the guest and then reduce the domain_crash() to a gprintk()
> dumping that info plus info about the interrupt that's about to be injected.
> I'll post a patch.

Actually I think I may have figured this out...

If a buggy version of Windows enables APIC assist (by setting up the shared
page) but then performs an EOI without checking/clearing the bit in the page,
this will result in a domain_crash(). The reasoning is as follows:

    - Performing the EOI will clear the vector from the ISR but will not clear
      the pending assist information.
    - The next call to vlapic_has_pending_irq() will also not clear the pending
      assist information (because the assist bit is still set), but it will
      return a valid vector.
    - The subsequent call to vlapic_ack_pending_irq() will then call
      viridian_start_apic_assist(), which will invoke domain_crash() because it
      detects that there is a pending assist.

The OS is at fault because, by setting up the APIC assist page, it is
guaranteeing to the hypervisor that it will check/clear the bit. But it's easy
enough to work around the bug by simply aborting any pending APIC assist in the
EOI handler. I'll test a patch that does this and post later.
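
To make the sequence above easier to follow, here is a toy model of it in
Python. This is only a sketch and not Xen code: the class, its fields and the
simulate() helper are hypothetical stand-ins named after the functions
mentioned above, the real vlapic/viridian logic has more conditions than this,
and the abort_assist_on_eoi flag is just my reading of the proposed workaround.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


class DomainCrash(Exception):
    """Stands in for domain_crash() in this toy model."""


@dataclass
class ToyVlapic:
    # All names below are illustrative, not Xen's actual data structures.
    abort_assist_on_eoi: bool = False          # the proposed workaround
    irr: Set[int] = field(default_factory=set)  # requested vectors
    isr: Set[int] = field(default_factory=set)  # in-service vectors
    assist_vector: Optional[int] = None        # pending assist information
    assist_bit: bool = False                   # bit in the shared assist page

    def start_apic_assist(self, vector: int) -> None:
        # Modelled on the description above: an assist that is still
        # pending when a new one starts is treated as fatal.
        if self.assist_vector is not None:
            raise DomainCrash(f"assist still pending for {self.assist_vector:#x}")
        self.assist_vector = vector
        self.assist_bit = True

    def has_pending_irq(self) -> Optional[int]:
        # If the guest cleared the bit, complete the EOI on its behalf.
        if self.assist_vector is not None and not self.assist_bit:
            self.isr.discard(self.assist_vector)
            self.assist_vector = None
        if not self.irr:
            return None
        vector = max(self.irr)
        if self.isr and max(self.isr) >= vector:
            return None  # blocked by an in-service vector
        return vector

    def ack_pending_irq(self, vector: int) -> None:
        self.irr.discard(vector)
        self.isr.add(vector)
        self.start_apic_assist(vector)

    def eoi(self) -> None:
        # An EOI clears the highest in-service vector only...
        if self.isr:
            self.isr.discard(max(self.isr))
        # ...unless the workaround also aborts any pending assist.
        if self.abort_assist_on_eoi:
            self.assist_vector = None
            self.assist_bit = False


def deliver(vlapic: ToyVlapic, vector: int, guest_checks_bit: bool) -> None:
    """Inject one interrupt and let the modelled guest handle it."""
    vlapic.irr.add(vector)
    pending = vlapic.has_pending_irq()
    if pending is None:
        return
    vlapic.ack_pending_irq(pending)
    if guest_checks_bit and vlapic.assist_bit:
        vlapic.assist_bit = False  # well-behaved guest: clear bit, skip EOI
    else:
        vlapic.eoi()               # buggy guest: EOI without looking


def simulate(guest_checks_bit: bool, abort_assist_on_eoi: bool) -> str:
    """Deliver the same vector twice and report whether the domain survives."""
    vlapic = ToyVlapic(abort_assist_on_eoi=abort_assist_on_eoi)
    try:
        deliver(vlapic, 0x30, guest_checks_bit)
        deliver(vlapic, 0x30, guest_checks_bit)
    except DomainCrash:
        return "crash"
    return "ok"
```

In this model, a guest that EOIs without checking the bit hits the crash on the
second interrupt, while the same guest survives once the EOI path aborts the
pending assist. A guest that does check/clear the bit is unaffected either way.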

  Paul

> 
>   Paul
> 
> >
> > ~Andrew
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> https://lists.xen.org/xen-devel