
RE: [Xen-devel] HVM Save/Restore status.



 

> -----Original Message-----
> From: Petersson, Mats 
> Sent: 25 April 2007 16:18
> To: Petersson, Mats; Tim Deegan
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Woller, Thomas
> Subject: RE: [Xen-devel] HVM Save/Restore status.
> 
>  
> 
> > -----Original Message-----
> > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
> > Petersson, Mats
> > Sent: 25 April 2007 15:25
> > To: Tim Deegan
> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Woller, Thomas
> > Subject: RE: [Xen-devel] HVM Save/Restore status.
> > 
> >  
> > 
> > > -----Original Message-----
> > > From: Tim Deegan [mailto:Tim.Deegan@xxxxxxxxxxxxx] 
> > > Sent: 25 April 2007 15:09
> > > To: Petersson, Mats
> > > Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Woller, Thomas
> > > Subject: Re: [Xen-devel] HVM Save/Restore status.
> > > 
> > > Hi, 
> > > 
> > > At 12:58 +0200 on 25 Apr (1177505885), Petersson, Mats wrote:
> > > > My "disk-stress" tests have the following status:
> > > > 1. SLES 9.3 using VNC as display has run for over 23 virtual hours,
> > > > some 40 or so hours since I set off the test without any failures.
> > > 
> > > That's great! 
> > > 
> > > > There's one difference between this test and previous ones: I've
> > > > disabled the blanking of the screen - there appears to be a problem
> > > > waking the screen after some time, not sure why that would be. 
> > > 
> > > What are the symptoms there?  Is the guest still alive?  Is qemu-dm
> > > alive?  Does it respond on the network, and just have a wedged
> > > console?  (Might it be the keyboard + mouse that have got wedged?)
> > 
> > Good question. It turns out (from an attempt to stop the guest nicely
> > when I was going to reboot to have a new Linux-kernel with debug code
> > in it) that although the guest is still running, I have actually lost
> > at least:
> > - Network. I can't ping the guest or SSH to the guest on the IP
> > address it used to be when it first got an address from DHCP -
> > presumably, the IP address shouldn't change (it doesn't on other
> > machines that get IP address from the same DHCP server). 
> > - Keyboard. Pressing for example CTRL-C to stop the running
> > application doesn't work. No other keys appear to have any effect
> > either.
> > 
> > It's unclear to me if any other operations are affected or not. [Time
> > seemed a bit funny too, but that may be my app - I haven't debugged
> > that yet. It kept cycling around a 2-3 second range around
> > 23h14m18-20s (or some such), where the time comes from "time()" - so
> > perhaps there's something wrong in the "gettimeofday" functionality
> > too.] 
> > 
> > > 
> > > > 2. "Simple-guest" fails to restore on the second restore, ending up
> > > > with the guest "killed". Scanning the xend.log, I find "error
> > > > zeroing magic pages". Looking further down that path, it seems like
> > > > it's failing to do "xc_map_foreign_range"... I'm adding some debug
> > > > output to try to determine where it goes wrong here. 
> > > 
> > > Strange.  Are you doing anything weird with the ioreq or xenstore
> > > pages in the simple guest?  Their PFNs should have been maintained
> > > across the first save/restore cycle, and they were mappable the
> > > first time...
> > 
> > I'm trying to see what fails and where by printing something at every
> > failure point. So far I've tracked it down to somewhere inside the
> > function direct_remap_pfn_range... Not sure where in this function it
> > goes wrong or where in any of the called functions. As far as I can
> > see, there's not many things that can go wrong there... 
> 
> Error is 14, which is "EFAULT", which means that the problem appears
> to be inside the hypercall. 
> 
> I'll see if I can print the different pages involved here. 

So, a few printfs later: the first time (which succeeds) and the second
time (which fails) use exactly the same frame numbers (1fff, 1ffe, 1ffd).
It fails on the FIRST one. (I split the "if( ... [0] || ... [1] || ... [2] )"
into separate lines and print the failure on each with a "[n]" to
indicate which one failed, and it got [0] in the printout.)
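
For anyone wanting to do the same, the split check amounts to roughly the
sketch below. This is only an illustration, not the actual tools code:
try_map_page(), fail_pfn and first_failing_page() are hypothetical names
made up for the example, and try_map_page() is a stand-in that simply
fails with EFAULT for one chosen frame number instead of calling into
the real mapping path.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real mapping call; NOT a libxc API.
 * Assumption: the frame that fails is selected through fail_pfn, set
 * here to 0x1fff to mirror the "[0]" failure described above. */
static unsigned long fail_pfn = 0x1fff;

/* Returns 0 on success, or -1 with errno set to EFAULT on failure. */
static int try_map_page(unsigned long pfn)
{
    if (pfn == fail_pfn) {
        errno = EFAULT;
        return -1;
    }
    return 0;
}

/* Check each page separately instead of one combined
 * "if (a[0] || a[1] || a[2])", so the failing index can be printed.
 * Returns the index of the first failure, or -1 if every page mapped. */
static int first_failing_page(const unsigned long *pfns, int count)
{
    assert(pfns != NULL && count > 0);
    for (int i = 0; i < count; i++) {
        if (try_map_page(pfns[i]) != 0) {
            fprintf(stderr, "map failed [%d]: pfn 0x%lx, errno %d (%s)\n",
                    i, pfns[i], errno, strerror(errno));
            return i;
        }
    }
    return -1;
}
```

Calling first_failing_page() with the three frame numbers above (1fff,
1ffe, 1ffd) returns index 0 in this sketch, which corresponds to the
"[0]" in my printout.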

--
Mats

> 
> Also, I missed answering the question of what I do with those pages:
> Nothing. My guest uses about 2MB of the entire 32MB memory range,
> around 1MB-3MB. 
> 
> --
> Mats
> > 
> > --
> > Mats
> > > 
> > > Cheers,
> > > 
> > > Tim.
> > > 
> > > -- 
> > > Tim Deegan <Tim.Deegan@xxxxxxxxxxxxx>, XenSource UK Limited
> > > Registered office c/o EC2Y 5EB, UK; company number 05334508
> > > 
> > > 
> > > 
> > 
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel
> > 
> > 
> > 





 

