Re: [win-pv-devel] Windows 7 64 bit blue screen with stop 1e after restore with new build of win pv drivers


  • To: Fabio Fantoni <fabio.fantoni@xxxxxxx>
  • From: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
  • Date: Thu, 20 Nov 2014 14:38:41 +0000
  • Accept-language: en-GB, en-US
  • Cc: "win-pv-devel@xxxxxxxxxxxxxxxxxxxx" <win-pv-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 20 Nov 2014 14:38:46 +0000
  • List-id: Developer list for the Windows PV Drivers subproject <win-pv-devel.lists.xenproject.org>
  • Thread-index: AQHP/00RmB3jtmoHEUKEz+zGC4lwtpxenLPQ///y4ICAAanfAIAAE64ggAALr4CACNgwAIAAaEgQ
  • Thread-topic: Windows 7 64 bit blue screen with stop 1e after restore with new build of win pv drivers

From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx] 
Sent: 20 November 2014 08:59
To: Paul Durrant
Cc: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: Windows 7 64 bit blue screen with stop 1e after restore with new 
build of win pv drivers

On 14/11/2014 18:54, Fabio Fantoni wrote:
2014-11-14 17:15 GMT+01:00 Paul Durrant <Paul.Durrant@xxxxxxxxxx>:
> -----Original Message-----
> From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
> Sent: 14 November 2014 16:03
> To: Paul Durrant; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: Windows 7 64 bit blue screen with stop 1e after restore with
> new build of win pv drivers
>
> On 13/11/2014 15:38, Fabio Fantoni wrote:
> > On 13/11/2014 15:26, Paul Durrant wrote:
> >>> -----Original Message-----
> >>> From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
> >>> Sent: 13 November 2014 14:20
> >>> To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx; Paul Durrant
> >>> Subject: Windows 7 64 bit blue screen with stop 1e after restore
> >>> with new
> >>> build of win pv drivers
> >>>
> >>> I did a new build of the winpv drivers and tested it on a Windows 7 64-bit
> >>> domU, with dom0 running xen-unstable plus the "x86/hvm: Extend HVM cpuid
> >>> leaf with vcpu id" and "x86/hvm: Add per-vcpu evtchn upcalls" patches, and
> >>> qemu 2.2 from spice git:
> >>> https://github.com/Fantu/Xen/commits/rebase/m2r-staging
> >>>
> >>> After restore, Windows showed a blue screen with stop 1e. I opened the
> >>> dump with "BlueScreenView", which showed that the cause is the xennet
> >>> driver. I have attached the dump; if you need more information or tests,
> >>> tell me and I'll post them.
> >>>
> >> I've been testing with win7 32-bit and seen no problems. The minidump
> >> doesn't tell me much, unfortunately. Do you have a full dump, plus the
> >> QEMU log?
> >>
> >>   Paul
> >
> > Thanks for the reply.
> > The qemu log had no related errors or warnings. I changed the Windows
> > option to a full memory dump, enabled qemu tracing of xen and retried
> > to reproduce the problem, but this time the same domU did not blue
> > screen after restore.
>
> The blue screen happened again; attached are the qemu logs (before and
> after restore) with xen trace.
> I also have a full memory.dmp this time, but there is a problem: it is
> 1.87 GB and my connection has a maximum upload speed of 256 kbps :(
> If it is really needed, tell me and I'll try to upload it to my webserver
> tomorrow.
>
I'll try to repro. Your crash happened very early in resume. Is your xenbus 
driver built from tip?

Yes, I updated everything to the latest git at http://xenbits.xen.org/gitweb/?o=age; 
everything except xeniface was updated, if I remember correctly.
It happened 2 times out of the 5 save/restores I tried.
The latest test was with upstream qemu updated to also include the 2 new xen bugfixes: 
http://secure-web.cisco.com/1iNDFsjwF7XPE_z0Uo-Ub4r--vh4VhmlCy13qJy5SLEV9CVtTrUkzoDeMkNALO9NzvX8kSbTLIg81vb3ill-B5EVSAiKb6S2cEXbjkb6fRKLOQQxT_ZqTY0EJNHmIhCRpZZw9PCWuoQlxws0SAVWf3iRJcXnftDZ-4LNhsvEY02U/http%3A%2F%2Fgit.qemu.org%2Fqemu.git
 (commit 4e70f9271dabc58fbf14680843bfac510c193152).

I have updated xen to the latest staging and qemu to tag v2.2.0-rc2, and the crash 
still happens. Attached is the minidump of the latest crash, and below is the link 
to the full memory dump; I hope these will help you find and solve the problem:
http://fantu.info/xen/MEMORY.DMP

------

Thanks for the full dump. The crash is very odd. The stack I see is:

1: kd> kb
RetAddr           : Args to Child                                                           : Call Site
fffff800`02ac35be : 00000000`00000000 00000000`00000000 fffff880`02e68260 fffff800`02af6a90 : nt!KeBugCheck
fffff800`02af675d : fffff800`02cda380 fffff800`02c17580 fffff800`02a56000 fffff880`02e68a08 : nt!KiKernelCalloutExceptionHandler+0xe
fffff800`02af5535 : fffff800`02c1b358 fffff880`02e67b98 fffff880`02e68a08 fffff800`02a56000 : nt!RtlpExecuteHandlerForException+0xd
fffff800`02b064c1 : fffff880`02e68a08 fffff880`02e68260 fffff880`00000000 00000000`00000000 : nt!RtlDispatchException+0x415
fffff800`02acb242 : fffff880`02e68a08 00000000`00000002 fffff880`02e68ab0 00000000`00000000 : nt!KiDispatchException+0x135
fffff800`02ac9074 : 00000000`00000001 00000000`00000001 fffffa80`016f7710 fffff880`00fb69b9 : nt!KiExceptionDispatch+0xc2
fffff880`00fa03a9 : fffff880`00f97978 00000000`00000002 00000000`00000000 fffff880`00f9f211 : nt!KiBreakpointTrap+0xf4
fffff880`00f97978 : 00000000`00000002 00000000`00000000 fffff880`00f9f211 fffffa80`0179c010 : xen!HypercallPopulate+0x1269
00000000`00000002 : 00000000`00000000 fffff880`00f9f211 fffffa80`0179c010 00000000`00000001 : xen+0x1978
00000000`00000000 : fffff880`00f9f211 fffffa80`0179c010 00000000`00000001 00000000`00000010 : 0x2

And if I dump the code at xen!HypercallPopulate+0x1269, I see:

1: kd> u xen!HypercallPopulate+0x1269
xen!HypercallPopulate+0x1269:
fffff880`00fa03a9 cc              int     3
fffff880`00fa03aa cc              int     3
fffff880`00fa03ab cc              int     3
fffff880`00fa03ac cc              int     3
fffff880`00fa03ad cc              int     3
fffff880`00fa03ae cc              int     3
fffff880`00fa03af cc              int     3
fffff880`00fa03b0 cc              int     3

So, even if the symbols are wrong, it still means the code appears to have 
jumped into the middle of a load of 0xcc bytes, which is what 
hypercall_page_initialise fills the hypercall page with before calling the 
svm/vmx specific code to set it up. Also, I note that this stack is on CPU 1, 
which should be stuck spinning with interrupts disabled at this stage.

The problem is that the loop there is actually making yield hypercalls and 
doesn't take kindly to the hypercall page being re-written under its feet :-( 
I'd missed the 0xcc fill when I added the code to re-populate the hypercall 
page and thus believed the contents would simply be re-written with the same 
bytes that were already there, meaning that there'd be no problem with the 
spin code on CPU 1+. I should hopefully be able to fix the problem by removing 
the yield hypercalls and replacing them with _mm_pause intrinsics.
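
To illustrate the idea of the fix, here is a minimal user-space sketch of a 
spin loop that waits with a CPU pause hint instead of calling into the 
hypercall page. It is not the actual xenbus code: sync_state_t, resume_done, 
spin_wait_for_resume and the max_spins bound are hypothetical names invented 
for this example, and the only real API used is the _mm_pause intrinsic 
(with a no-op fallback so the sketch also builds on non-x86 compilers).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* PAUSE instruction: touches no shared page */
#else
#define cpu_relax() ((void)0)     /* no-op fallback so the sketch builds elsewhere */
#endif

/* Hypothetical shared state: CPU 0 sets resume_done once it has finished
 * re-populating the hypercall page. */
typedef struct {
    atomic_bool resume_done;
} sync_state_t;

/* Secondary CPUs wait here. The loop body only reads the flag and executes
 * a pause hint, so it never jumps into a page that may be filled with 0xcc
 * while it is being rewritten. Returns the number of iterations spun; the
 * max_spins bound exists only to keep this sketch testable. */
static uint64_t spin_wait_for_resume(sync_state_t *s, uint64_t max_spins)
{
    uint64_t spins = 0;

    while (!atomic_load_explicit(&s->resume_done, memory_order_acquire)) {
        if (spins == max_spins)
            break;
        cpu_relax();
        spins++;
    }
    return spins;
}
```

The design point is simply that the waiting CPUs no longer depend on the 
contents of the hypercall page, so it does not matter what bytes are in it 
while CPU 0 is rewriting it.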

  Paul