Xen project Mailing List

Re: [Xen-devel] [xen-unstable test] 106435: regressions - FAIL

To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>

From: Paul Durrant <Paul.Durrant@xxxxxxxxxx>

Date: Mon, 6 Mar 2017 13:52:09 +0000

Accept-language: en-GB, en-US

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, osstest service owner <osstest-admin@xxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>

Delivery-date: Mon, 06 Mar 2017 13:52:16 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: AQHSlPrsxLdnNRpMYEmwuipv38O8yaGEwbuAgAK58MCAAFrH0A==

Thread-topic: [Xen-devel] [xen-unstable test] 106435: regressions - FAIL

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
> Paul Durrant
> Sent: 06 March 2017 08:28
> To: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; osstest service owner
> <osstest-admin@xxxxxxxxxxxxxx>; Jan Beulich <JBeulich@xxxxxxxx>
> Subject: Re: [Xen-devel] [xen-unstable test] 106435: regressions - FAIL
>
> > -----Original Message-----
> > From: Andrew Cooper [mailto:amc96@xxxxxxxxxxxxxxxx] On Behalf Of
> > Andrew Cooper
> > Sent: 04 March 2017 15:45
> > To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> > Cc: osstest service owner <osstest-admin@xxxxxxxxxxxxxx>;
> > xen-devel@xxxxxxxxxxxxxxxxxxx; Jan Beulich <JBeulich@xxxxxxxx>
> > Subject: Re: [Xen-devel] [xen-unstable test] 106435: regressions - FAIL
> >
> > On 04/03/2017 15:19, osstest service owner wrote:
> > > flight 106435 xen-unstable real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/106435/
> > >
> > > Regressions :-(
> > >
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >  test-amd64-amd64-xl-qemuu-win7-amd64  9 windows-install  fail REGR. vs. 106412
> > >  test-armhf-armhf-libvirt-raw  9 debian-di-install  fail REGR. vs. 106412
> >
> > From
> > http://logs.test-lab.xenproject.org/osstest/logs/106435/test-amd64-amd64-xl-qemuu-win7-amd64/9.ts-windows-install.log
> >
> > Mar 4 12:49:08.069641 (d3) Booting from DVD/CD...
> > Mar 4 12:49:08.069670 (d3) Booting from 0000:7c00
> > Mar 4 12:49:08.069697 (XEN) stdvga.c:178:d3v0 leaving stdvga mode
> > Mar 4 12:49:13.045676 (XEN) d3: VIRIDIAN GUEST_OS_ID: vendor: 1 os: 4 major: 6 minor: 1 sp: 0 build: 1db0
> > Mar 4 12:49:41.461683 (XEN) d3: VIRIDIAN HYPERCALL: enabled: 1 pfn: 3ffff
> > Mar 4 12:49:41.461731 (XEN) d3v0: VIRIDIAN APIC_ASSIST: enabled: 1 pfn: 3fffe
> > Mar 4 12:49:41.469596 (XEN) domain_crash called from viridian.c:306
> > Mar 4 12:49:41.477595 (XEN) Domain 3 (vcpu#0) crashed on cpu#1:
> > Mar 4 12:49:41.477633 (XEN) ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
> > Mar 4 12:49:41.485589 (XEN) CPU: 1
> > Mar 4 12:49:41.485620 (XEN) RIP: 0010:[<fffff80002653479>]
> > Mar 4 12:49:41.485650 (XEN) RFLAGS: 0000000000000286 CONTEXT: hvm guest (d3v0)
> > Mar 4 12:49:41.493591 (XEN) rax: 0000000000000000 rbx: fffff800027ede80 rcx: 0000000000000001
> > Mar 4 12:49:41.501560 (XEN) rdx: 0000000000000000 rsi: fffffa8001291040 rdi: fffff800027fbc40
> > Mar 4 12:49:41.509588 (XEN) rbp: 0000000000000080 rsp: fffff880009b0d80 r8: 0000000000000000
> > Mar 4 12:49:41.517584 (XEN) r9: fffff800027ede80 r10: fffffa8001291040 r11: fffff800027ede90
> > Mar 4 12:49:41.517624 (XEN) r12: fffff800008129c0 r13: fffff800028afbe0 r14: fffffa800122db30
> > Mar 4 12:49:41.525586 (XEN) r15: fffff80000b96080 cr0: 0000000080050031 cr4: 00000000000006b8
> > Mar 4 12:49:41.533582 (XEN) cr3: 0000000000187000 cr2: 0000000000000000
> > Mar 4 12:49:41.541578 (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010
> >
> >
> > Looks like this intermittent bug is still biting. Should we kill
> > VIRIDIAN APIC_ASSIST until we have got to the bottom of it?
> > Alternatively, add some printk() to actually give us useful information
> > when it does go wrong.
>
> I actually think this may be a bug in Windows 7.
> I normally run with this enlightenment on and have seen this very
> occasionally and only with Windows 7.
>
> I think it would probably be useful to stash info about the last interrupt
> injected into the guest and then reduce the domain_crash() to a gprintk()
> dumping that info plus info about the interrupt that's about to be injected.
> I'll post a patch.

Actually I think I may have figured this out...

If a buggy version of Windows enables APIC assist (by setting up the shared
page) but then performs an EOI without checking/clearing the bit in the page,
this will result in a domain_crash(). The reasoning is as follows:

- Performing the EOI will clear the vector from the ISR but will not clear
  the pending assist information.
- The next call to vlapic_has_pending_irq() will also not clear the pending
  assist information (because the assist bit is still set), but it will
  return a valid vector.
- The subsequent call to vlapic_ack_pending_irq() will then call
  viridian_start_apic_assist(), which will invoke domain_crash() because it
  detects that there is a pending assist.

The OS is at fault because, by setting up the APIC assist page, it is
guaranteeing to the hypervisor that it will check/clear the bit. But it's easy
enough to work around the bug by simply aborting any pending APIC assist in
the EOI handler. I'll test a patch that does this and post later.

  Paul

>
>   Paul
>
> >
> > ~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
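
The three-step failure sequence described in the message above can be sketched as a small toy model. This is not Xen code: only the names has_pending_irq, ack_pending_irq and start_apic_assist echo the functions mentioned in the message, and everything else (the ToyVlapic class, its fields, the abort_assist_on_eoi flag, the vectors 0x30 and 0x31) is a hypothetical stand-in chosen for illustration. It assumes the hypervisor drops its pending assist record only once the guest has cleared the shared bit, which is how the message describes the behaviour.

```python
class DomainCrash(Exception):
    """Stand-in for domain_crash(); raised instead of killing a guest."""


class ToyVlapic:
    """Toy model of the APIC assist bookkeeping (hypothetical, not Xen code)."""

    def __init__(self, abort_assist_on_eoi=False):
        # When True, models the workaround proposed in the message.
        self.abort_assist_on_eoi = abort_assist_on_eoi
        self.assist_bit = False     # bit in the page shared with the guest
        self.pending_assist = None  # vector the hypervisor thinks is assisted
        self.isr = set()            # in-service vectors
        self.irr = set()            # requested, not yet delivered vectors

    def raise_irq(self, vector):
        self.irr.add(vector)

    def has_pending_irq(self):
        # The pending assist record is only dropped once the guest has
        # cleared the bit; at that point the deferred EOI is completed.
        if self.pending_assist is not None and not self.assist_bit:
            self.isr.discard(self.pending_assist)
            self.pending_assist = None
        return max(self.irr) if self.irr else None

    def start_apic_assist(self, vector):
        if self.pending_assist is not None:
            raise DomainCrash("assist already pending for vector 0x%x"
                              % self.pending_assist)
        self.assist_bit = True
        self.pending_assist = vector

    def ack_pending_irq(self, vector):
        self.start_apic_assist(vector)
        self.irr.discard(vector)
        self.isr.add(vector)

    def guest_eoi(self, check_assist_bit):
        if check_assist_bit and self.assist_bit:
            # Well-behaved guest: clear the bit and skip the EOI write.
            self.assist_bit = False
            return
        # Explicit EOI write: the highest in-service vector leaves the ISR.
        if self.isr:
            self.isr.discard(max(self.isr))
        if self.abort_assist_on_eoi:
            # Workaround: abort any pending assist in the EOI handler.
            self.assist_bit = False
            self.pending_assist = None


def run(vlapic, guest_checks_bit):
    """Deliver two interrupts back to back and report what happens."""
    try:
        for vector in (0x30, 0x31):
            vlapic.raise_irq(vector)
            pending = vlapic.has_pending_irq()
            vlapic.ack_pending_irq(pending)
            vlapic.guest_eoi(check_assist_bit=guest_checks_bit)
    except DomainCrash:
        return "domain_crash"
    return "ok"


if __name__ == "__main__":
    print(run(ToyVlapic(), guest_checks_bit=True))    # ok
    print(run(ToyVlapic(), guest_checks_bit=False))   # domain_crash
    print(run(ToyVlapic(abort_assist_on_eoi=True),
              guest_checks_bit=False))                # ok
```

In this model the guest that ignores the assist bit crashes on the second interrupt, because the stale pending assist record survives the EOI, while the same guest runs cleanly once the EOI path aborts the pending assist.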
