Re: [Xen-devel] Re: PoD issue



On Fri, Feb 19, 2010 at 1:53 AM, Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx> wrote:
>> On Thu, Feb 4, 2010 at 2:12 PM, George Dunlap
>> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>> > Yeah, the OSS tree doesn't get the kind of regression testing it
>> > really needs at the moment.  I was using the OSS balloon drivers when
>> > I implemented and submitted the PoD code last year.  I didn't have any
>> > trouble then, and I was definitely using up all of the memory.  But I
>> > haven't done any testing on OSS since then, basically.
>> >
>>
>> Is it expected that booting HVM guests with maxmem > memory is
>> unstable? In testing 3.4.3-rc2 (kernel 2.6.18 c/s 993) I can easily
>> crash the guest and occasionally the entire server.
>
> Obviously the platform should never crash, and that's very concerning.
>
> Are you running a balloon driver in the guest? It's essential that you do, 
> because it needs to get in fairly early in the guest boot and allocate the 
> difference between maxmem and target memory. The populate-on-demand code 
> exists just to cope with things like the memory scrubber running ahead of the 
> balloon driver. If you're not running a balloon driver the guest is doomed to 
> crash as soon as it tries using more than target memory.
>
> All of this requires coordination between the tool stack, PoD code, and PV 
> drivers so that sufficient memory gets ballooned out. I expect the 
> combination that has had most testing is the XCP toolstack and Citrix PV 
> windows drivers.
>

Initially I was using the XCP 0.1.1 WinPV drivers (Windows Server 2003
SP2), and the guest crashed when I tried to install software from the
emulated CD-ROM. Nothing about the crash was reported in the qemu log
file, and xend.log wasn't very helpful either, but here's the relevant
portion:
[2010-02-17 20:42:49 4253] DEBUG (DevController:139) Waiting for devices vtpm.
[2010-02-17 20:42:49 4253] INFO (XendDomain:1182) Domain win2 (30) unpaused.
[2010-02-17 20:48:05 4253] WARNING (XendDomainInfo:1888) Domain has crashed: name=win2 id=30.
[2010-02-17 20:48:06 4253] DEBUG (XendDomainInfo:2734) XendDomainInfo.destroy: domid=30
[2010-02-17 20:48:06 4253] DEBUG (XendDomainInfo:2209) Destroying device model

I attempted the install several more times without success, then tried
copying files from the emulated CD, which also crashed the guest each
time. I wasn't even thinking about the fact that I had set maxmem (PoD),
so I blamed the XCP WinPV drivers and switched to GPLPV (0.10.0.138).
Same crashes with GPLPV. At this point I hadn't checked 'xm dmesg',
which was the only place the PoD/p2m error was reported, so I
changed to pure HVM mode and tried to copy the files from the emulated
CD. That's when the real trouble started.

The RDP and VNC connections to the guest froze, as did the ssh session
to the dom0. This server was also hosting 7 Linux PV guests. I could
ping the guests and partially load some of their websites but couldn't
log in via ssh. I suspected that the HDDs were overloaded, causing disk
I/O to block the guests. I was on site, so I went to check the server
and was shocked to find no disk activity. The monitor output was blank
and I couldn't wake it up. Maybe the USB keyboard could not be
enumerated, because I couldn't even toggle Num Lock after several
reconnections.

I power cycled the host and checked the logs but there was no evidence
of a crash other than one of the software raid devices being unclean
on startup. Perhaps there was interesting data logged to 'xm dmesg' or
waiting to be written to disk at the time of the crash. I'm afraid
this server/mb is incapable of logging data to the serial port. I've
attempted to do so several times both before and after this crash.

Of course the simple fix is to remove maxmem from the domU config file
for the time being. Eventually people will use PoD on production
systems. Relying on the guest to have a solid balloon driver is
unacceptable. A guest could accidentally (or otherwise) remove the PV
drivers and bring down an entire host.
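
To make the workaround concrete, here is a minimal sketch of the
relevant domU config lines (xm config files use Python syntax). The
values are hypothetical and are not taken from the config behind this
report; the only point is that maxmem is left unset, so the guest does
not boot in PoD mode.

```python
# Hypothetical HVM domU config fragment; values are illustrative only.
builder = 'hvm'
memory = 1024      # MiB given to the guest at boot

# Workaround: leave maxmem unset for now. With maxmem > memory the guest
# boots populate-on-demand and depends on a balloon driver to give back
# the difference (here it would be 2048 - 1024 = 1024 MiB) early in boot.
# maxmem = 2048
```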

When I can free up a server with serial logging for testing I will try
to reproduce this crash.


Keith Coleman

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

