
Re: [Xen-devel] [qemu-upstream-unstable test] 21375: regressions - FAIL



On Tue, Nov 19, 2013 at 11:07:22AM +0000, Ian Campbell wrote:
> On Mon, 2013-11-18 at 17:18 +0000, Anthony PERARD wrote:
> > On Wed, Nov 06, 2013 at 05:22:29PM +0000, Anthony PERARD wrote:
> > > On Fri, Nov 01, 2013 at 03:46:36PM +0000, Anthony PERARD wrote:
> > > > On Fri, Nov 01, 2013 at 12:06:51PM +0000, Ian Campbell wrote:
> > > > > On Fri, 2013-11-01 at 11:58 +0000, Anthony PERARD wrote:
> > > > > > On Fri, Nov 01, 2013 at 10:43:16AM +0000, Ian Campbell wrote:
> > > > > > > On Fri, 2013-11-01 at 10:38 +0000, xen.org wrote:
> > > > > > > > flight 21375 qemu-upstream-unstable real [real]
> > > > > > > > http://www.chiark.greenend.org.uk/~xensrcts/logs/21375/
> > > > > > > > 
> > > > > > > > Regressions :-(
> > > > > > > > 
> > > > > > > > Tests which did not succeed and are blocking,
> > > > > > > > including tests which could not be run:
> > > > > > > >  test-amd64-i386-qemuu-rhel6hvm-intel  7 redhat-install    fail REGR. vs. 20054
> > > > > > > 
> > > > > > > Anthony, have you made any progress on this? It's been failing
> > > > > > > for ages now...
> > > > > > 
> > > > > > Yes, it looks like the bug is triggered during a VESA resolution
> > > > > > change. I have tried using the vgabios blob that we use for
> > > > > > qemu-traditional, and it works fine. But with the vgabios blob
> > > > > > provided by QEMU, it does not work... I'm still not sure what the
> > > > > > bug is, but I'm getting closer to it.
> > > > > 
> > > > > Yay!
> > > > > 
> > > > > > Also, this happens only on an Intel machine; on an AMD machine,
> > > > > > everything works like a charm.
> > > > > > 
> > > > > > More detail, if anyone wants to know:
> > > > > > It looks like syslinux is making an int 10h call to set the video
> > > > > > mode, and the call never returns:
> > > > > > Int 0x10, with AX=0x4F02
> > > > > 
> > > > > This looks like it might be handled by SeaBIOS vgasrc/vbe.c:vbe_104f00 ?
> > > > > There seem to be a few changes in upstream seabios since the version
> > > > > referenced in xen.git:Config.mk. Many of them are cleanups/code motion
> > > > > but a few look worth investigating. 
> > > > 
> > > > I've been able to get things working by applying a patch to vgabios
> > > > that is in the Xen tree: a0e7ccf6864c196906d58b54cd0996b4dbc1b022
> > > > This patch allows the framebuffer to be cleared much faster.
> > > > 
> > > > But it does not really help me understand why the guest freezes. A
> > > > couple more printfs might.
> > > 
> > > I finally managed to get a better understanding of the issue.
> > > 
> > > So, the vgabios blob provided by QEMU has a routine to clear the video
> > > RAM that takes a few seconds to run. That gives QEMU enough time to try
> > > to refresh its display, which means there will be a call to
> > > xc_hvm_track_dirty_vram(). If the function is called while the vgabios
> > > routine is running, then the guest is lost.
> > > 
> > > The issue appears only on an Intel machine, with an HVM guest using EPT.
> > > Having the guest use shadow paging works fine. So I'm going to
> > > investigate the track_dirty code in Xen.
> > > 
> > > The vgabios routine is called by syslinux with an Int 0x10. I tried to
> > > get some debug prints after the call, either from the guest serial or
> > > by using the Xen debug ioport, but nothing ever appeared, and gdbsx only
> > > gave me a weird IP which does not appear to point to any useful code
> > > (it's all zeros).
> > 
> > Another update:
> > 
> > We had the idea of trying this on an earlier version of Xen, and it turns
> > out that Xen 4.3 works fine. One bisect later, a commit turned up:
> > 
> > commit 86781624f8df1d50eb4185cfc2ddce926798f7aa
> > x86_emulate: PUSH <mem> must read source operand just once
> > ... for the case of accessing MMIO.
> > 
> > So after this commit, syslinux stops working correctly with the latest
> > version of QEMU. This happens if QEMU is calling track_dirty_vram.
> > 
> > I have also used xentrace/xenalyze to try to grab more information about
> > the issue. It did not really help, but it tells me that the guest is
> > stuck on a specific instruction (it results in a vmexit EPT_VIOLATION
> > over and over in xentrace). This is where the guest is stuck:
> > 
> >    0xa126:  mov    %eax,%cr0
> >    0xa129:  ljmp   $0xf2e,$0xa12e
> >    0xa130:  mov    $0x26,%dl
> >    0xa132:  or     %bh,(%eax)
> >    0xa134:  movzww %sp,%sp
> >    0xa138:  mov    %edx,%ds
> >    0xa13a:  mov    %edx,%es
> >    0xa13c:  mov    %edx,%fs
> >    0xa13e:  mov    %edx,%gs
> >    0xa140:  jmp    *%ebx
> >    0xa142:  pushf  
> > => 0xa143:  lcall  *%cs:(%si)
> >    0xa147:  mov    $0x0,%ch
> 
> OOI what is the encoding of the bad instruction?

That's what gdb gives me:
   0x0000a143:  2e 67 ff 1c   lcall  *%cs:(%si)
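
For reference, here is a rough sketch of how those four bytes break down
into prefixes, opcode and ModRM fields. The decode() helper is hypothetical
(it is not part of gdb or Xen) and only knows a subset of the legacy
prefixes and the 0xFF opcode group, so treat it as an illustration rather
than a real disassembler.

```python
# Hypothetical helper for illustration only: splits an x86 instruction into
# legacy prefixes, opcode byte and ModRM fields. Only a subset of prefixes
# and the 0xFF opcode group ("group 5") are handled.
PREFIXES = {
    0x26: "ES override", 0x2E: "CS override", 0x36: "SS override",
    0x3E: "DS override", 0x64: "FS override", 0x65: "GS override",
    0x66: "operand-size override", 0x67: "address-size override",
}

# For opcode 0xFF the ModRM reg field selects the actual operation.
GROUP5 = {0: "INC", 1: "DEC", 2: "CALL near", 3: "CALL far",
          4: "JMP near", 5: "JMP far", 6: "PUSH"}


def decode(raw):
    data = bytes(raw)
    i = 0
    prefixes = []
    # Consume leading prefix bytes.
    while i < len(data) and data[i] in PREFIXES:
        prefixes.append(PREFIXES[data[i]])
        i += 1
    opcode = data[i]
    modrm = data[i + 1]
    # ModRM layout: mod (2 bits) | reg (3 bits) | rm (3 bits)
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode == 0xFF:
        operation = GROUP5.get(reg, "undefined")
    else:
        operation = "not decoded by this sketch"
    return {"prefixes": prefixes, "opcode": opcode,
            "mod": mod, "reg": reg, "rm": rm, "operation": operation}


info = decode(bytes.fromhex("2e67ff1c"))
print(info)
```

So this is FF /3, a far call through memory with a CS override. How rm=4
is read depends on the effective address size: with 16-bit addressing it
is (%si), which is what gdb printed, while with 32-bit addressing the same
value would mean a SIB byte follows. PUSH <mem> is FF /6, in the same
opcode group.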

> > Before trying an earlier version of Xen, I tried to understand what went
> > wrong on the Xen side. It turns out that, in the track_dirty_vram
> > hypercall, a call to hap_enable_log_dirty() is all that is needed to
> > break the guest.
> > 
> > Jan, any idea what the issue is?
> > 
> > Regards,
> > 
> 
> 

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
