
RE: [Xen-users] xen save error



BTW, back when I was using printk's in the user kernel to determine if it was 
getting the message to suspend or not, I found it really odd that I could 
remove the line in the kernel that responds to the receipt of the suspend 
command (keeping the one that says "I have suspended") and it actually works 
better - usually going 12 to 20 migrations between failures.  I was somewhat 
surprised that removing the response that the message was received would work.  
I would have figured something would have been waiting on the receipt of that 
message.  This makes me wonder if there is some timing issue going on where the 
kernel is told to suspend and then the domain stops getting CPU time before it 
is able to complete suspending.
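
As a rough way to put these counts on a common scale, one can assume that every migration fails independently with the same probability. That is an assumption for illustration, not something measured here. Under that model, a run of n good migrations followed by one failure suggests a per-migration failure rate of about 1/(n+1). The helper name failure_rate is made up for this sketch.

```python
def failure_rate(successes_between_failures):
    """Estimate the per-migration failure probability.

    Assumes independent failures with a fixed probability p, so a run of
    n successes followed by one failure suggests p ~= 1 / (n + 1).
    """
    if successes_between_failures < 0:
        raise ValueError("count must be non-negative")
    return 1.0 / (successes_between_failures + 1)


# 12 to 20 migrations between failures, as reported above.
for n in (12, 20):
    print("%d good migrations per failure -> ~%.1f%% failure rate"
          % (n, 100 * failure_rate(n)))
```

With the modified kernel this works out to roughly 5% to 8% per migration. The 4 or 5 migrations between failures reported in the earlier message quoted below would be roughly 17% to 20% under the same model.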

-- Ray

-----Original Message-----
From: Cole, Ray 
Sent: Thursday, September 15, 2005 1:29 PM
To: 'Ian Pratt'; bryant.johan@xxxxxxxxx
Cc: xen-users@xxxxxxxxxxxxxxxxxxx; ian.pratt@xxxxxxxxxxxx
Subject: RE: [Xen-users] xen save error


Sure.

I've got 2 machines where I've installed 2.0-testing from 
xen-2.0-testing-install.tar.gz.  I downloaded it this morning to make sure I 
had the latest.  Ran install.sh.

I made sure grub is pointing to xen-2.0.gz, which is in turn a symbolic link to 
xen-2.0-testing.gz.  I did a depmod for 2.6.12-xen0 and xenU and created 
initrd's for both.  Also modified grub to use the 2.6.12-xen0 kernel with it 
and rebooted.  uname -a confirmed I'm using 2.6.12-xen0.  Domain 0 was 
originally a RedHat AS 4.0 installation ('minimal installation' selected).

I then copied a Fedora Core 4 installation image to an NFS mount location.  
Also created a swap file (have tried with and without) on the NFS link. I 
created the .cfg file for the domain - nothing special about it.  /Domains/t is 
where the NFS mount is made.  The .cfg is later in this email.

The FC4 image has had /lib/tls renamed to tls.disabled, although I still get a 
warning when booting the user domain that I've got /lib/tls.  I don't know, 
maybe the initrd has it.

Anyway...from there I started xend/xfrd.  I start up the FC4 domain using 
2.6.12-xenU.  This domain uses autofs extensively (all /home entries are 
automounted) and NIS.  I log in to the user domain using a remote xterm (ssh 
into the domain, start xterm).  I then start 'top' so I can see that the domain 
is still alive.

I then do:

  xm migrate --live rayfed4 {new_machine}

back and forth between the two machines that have identical Xen 2.0 Testing 
installations.  I can generally go back and forth about 4 or 5 times before one 
of the migrate commands tells me it had an error (can't suspend).  I had at one 
time put printk's into the user kernel (after downloading the 2.0 testing 
source, of course..) and confirmed that the kernel receives the message to 
suspend, but the suspend work the kernel schedules never gets executed.  I wait 
about 10 seconds between migration attempts.
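
The back-and-forth test described above could be automated along these lines. This is a minimal sketch, not the procedure actually used: it assumes the script runs on a third machine that can reach both hosts over ssh, and the names migrate_cmd and ping_pong are hypothetical. Only the `xm migrate --live` command and the 10-second pause come from the description above.

```python
import subprocess
import time


def migrate_cmd(domain, source, target):
    # Assumption: xm is invoked on the host currently running the domain,
    # reached over ssh. Adjust if you run the script on the host itself.
    return ["ssh", source, "xm", "migrate", "--live", domain, target]


def ping_pong(domain, host_a, host_b, max_rounds, delay=10,
              run=subprocess.call, sleep=time.sleep):
    """Migrate a domain back and forth between two hosts.

    Returns the number of migrations that succeeded before the first
    failure (or max_rounds if none failed).
    """
    source, target = host_a, host_b
    for done in range(max_rounds):
        # A non-zero exit status is treated as a failed migration.
        if run(migrate_cmd(domain, source, target)) != 0:
            return done
        # The domain now lives on the other host; swap roles.
        source, target = target, source
        sleep(delay)
    return max_rounds


# Example call (host names are placeholders):
# print(ping_pong("rayfed4", "hostA", "hostB", max_rounds=50))
```

The run and sleep parameters exist only so the loop can be exercised with a fake runner before pointing it at real machines.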

Below is my .cfg:

#  -*- mode: python; -*-
#============================================================================
# Python configuration setup for 'xm create'.
# This script sets the parameters used when a domain is created using
# 'xm create'.
# You use a separate script for each domain you want to create, or 
# you can set the parameters for the domain on the xm command line.
#============================================================================

#----------------------------------------------------------------------------
# Kernel image file.
kernel = "/boot/vmlinuz-2.6.12-xenU"

# Optional ramdisk.
ramdisk = "/boot/initrd-2.6.12-xenU.img"

# The domain build function. Default is 'linux'.
#builder='linux'

# Initial memory allocation (in megabytes) for the new domain.
memory = 192

# A name for your domain. All domains must have different names.
name = "rayfed4"

# Which CPU to start domain on? 
#cpu = -1   # leave to Xen to pick

#----------------------------------------------------------------------------
# Define network interfaces.

# Number of network interfaces. Default is 1.
#nics=1

# Optionally define mac and/or bridge for the network interfaces.
# Random MACs are assigned if not given.
#vif = [ 'mac=aa:00:00:00:00:11, bridge=xen-br0' ]
vif = [ 'mac=52:54:00:12:34:56' ]

#----------------------------------------------------------------------------
# Define the disk devices you want the domain to have access to, and
# what you want them accessible as.
# Each disk entry is of the form phy:UNAME,DEV,MODE
# where UNAME is the device, DEV is the device name the domain will see,
# and MODE is r for read-only, w for read-write.

#disk = [ 'file:/dev/md2,md2,w' ]

#disk = [ 'file:/dev/md3,sda1,w', 'file:/dev/md4,sda2,w' ]
disk = [ 'file:/Domains/t/Fed4.img,sda1,w', 
'file:/Domains/t/Fed4Swap.img,sda2,w' ]

# Set root device.
root = "/dev/sda1 ro"

#nfs_root   = '/full/path/to/root/directory'

# Sets runlevel 3.
extra = "3"

#----------------------------------------------------------------------------
# Set according to whether you want the domain restarted when it exits.
# The default is 'onreboot', which restarts the domain when it shuts down
# with exit code reboot.
# Other values are 'always', and 'never'.

#restart = 'onreboot'

#============================================================================
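
Since the config is itself Python, the disk entries can be sanity-checked with a few lines of Python before running 'xm create'. This is a minimal sketch that assumes each entry follows the TYPE:UNAME,DEV,MODE form described in the config comments; parse_disk is a hypothetical helper and not part of the Xen tools. It checks syntax only and does not verify that the image files exist.

```python
def parse_disk(entry):
    """Split 'TYPE:UNAME,DEV,MODE' into (type, uname, dev, mode)."""
    # Unpacking raises ValueError if there are not exactly three fields.
    spec, dev, mode = [part.strip() for part in entry.split(",")]
    kind, sep, uname = spec.partition(":")
    if not sep:
        raise ValueError("missing TYPE: prefix in %r" % entry)
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w' in %r" % entry)
    return kind, uname, dev, mode


# The disk list from the config above.
disk = ['file:/Domains/t/Fed4.img,sda1,w',
        'file:/Domains/t/Fed4Swap.img,sda2,w']
for entry in disk:
    print(parse_disk(entry))
```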


-----Original Message-----
From: Ian Pratt [mailto:m+Ian.Pratt@xxxxxxxxxxxx]
Sent: Thursday, September 15, 2005 1:04 PM
To: Cole, Ray; bryant.johan@xxxxxxxxx
Cc: xen-users@xxxxxxxxxxxxxxxxxxx; ian.pratt@xxxxxxxxxxxx
Subject: RE: [Xen-users] xen save error



> I can get xen-2.0-testing to fail on live migrations with 
> virtually no load about 10% of the time with live migration 
> :-)  Seems it becomes unable to suspend the user domain 
> kernel - kernel gets the message, but never gets a chance to 
> process it.  I'm not saying 2.0-testing won't resolve the 
> problem John is seeing, but I'm not sure I would quite make 
> the statement that it has been 'battle tested' :-)

Can you say more about your configuration? I haven't heard of migrate
problems on 2.0-testing. Almost all the development effort is focussed
on 3.0, but if it's a reproducible problem someone might take a look.
Migration on 2.0-testing has been tested pretty thoroughly, so it must
be something to do with your configuration or other xm operations you've
done on the domain since you started it.

Ian 


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

