RE: [Xen-users] xen save error
BTW, back when I was using printk's in the user kernel to determine whether it was getting the message to suspend, I found it really odd that I could remove the line in the kernel that responds to the receipt of the suspend command (keeping the one that says "I have suspended") and it actually works better - usually going 12 to 20 migrations between failures. I was somewhat surprised that removing the response that the message was received would work. I would have figured something would have been waiting on the receipt of that message.

This makes me wonder if there is some timing issue going on where the kernel is told to suspend and then the domain stops getting CPU time before it is able to complete suspending.

-- Ray

-----Original Message-----
From: Cole, Ray
Sent: Thursday, September 15, 2005 1:29 PM
To: 'Ian Pratt'; bryant.johan@xxxxxxxxx
Cc: xen-users@xxxxxxxxxxxxxxxxxxx; ian.pratt@xxxxxxxxxxxx
Subject: RE: [Xen-users] xen save error

Sure. I've got 2 machines where I've installed 2.0-testing from xen-2.0-testing-install.tar.gz. I downloaded it this morning to make sure I had the latest. Ran install.sh. I made sure grub is pointing to xen-2.0.gz, which is in turn a symbolic link to xen-2.0-testing.gz. I did a depmod for 2.6.12-xen0 and xenU and created initrd's for both. Also modified grub to use the 2.6.12-xen0 kernel with it and rebooted. uname -a confirmed I'm using 2.6.12-xen0.

Domain 0 was originally a RedHat AS 4.0 installation ('minimal installation' selected). I then copied a Fedora Core 4 installation image to an NFS mount location. Also created a swap file (have tried with and without) on the NFS link. I created the .cfg file for the domain - nothing special about it. /Domains/t is where the NFS mount is made. The .cfg is later in this email.

The FC4 image has had /lib/tls renamed to tls.disabled, although I still get a warning when booting the user domain that I've got /lib/tls. I don't know, maybe the initrd has it.
Anyway... from there I started xend/xfrd. I start up the FC4 domain using 2.6.12-xenU. This domain uses autofs extensively (all /home entries are automounted) and NIS. I log in to the user domain using a remote xterm (ssh into the domain, start xterm). I then start 'top' so I can see that the domain is still alive. I then do:

xm migrate --live rayfed4 {new_machine}

back and forth between the two machines that have identical Xen 2.0 Testing installations. I can generally go back and forth about 4 or 5 times before one of the migrate commands tells me it had an error (can't suspend). I had at one time put printk's into the user kernel (after downloading the 2.0 testing source, of course) and confirmed that the kernel receives the message to suspend, but the suspend work the kernel schedules never gets executed. I wait about 10 seconds between migration attempts.

Below is my .cfg:

# -*- mode: python; -*-
#============================================================================
# Python configuration setup for 'xm create'.
# This script sets the parameters used when a domain is created using 'xm create'.
# You use a separate script for each domain you want to create, or
# you can set the parameters for the domain on the xm command line.
#============================================================================

#----------------------------------------------------------------------------
# Kernel image file.
kernel = "/boot/vmlinuz-2.6.12-xenU"

# Optional ramdisk.
ramdisk = "/boot/initrd-2.6.12-xenU.img"

# The domain build function. Default is 'linux'.
#builder='linux'

# Initial memory allocation (in megabytes) for the new domain.
memory = 192

# A name for your domain. All domains must have different names.
name = "rayfed4"

# Which CPU to start domain on?
#cpu = -1   # leave to Xen to pick

#----------------------------------------------------------------------------
# Define network interfaces.

# Number of network interfaces. Default is 1.
#nics=1

# Optionally define mac and/or bridge for the network interfaces.
# Random MACs are assigned if not given.
#vif = [ 'mac=aa:00:00:00:00:11, bridge=xen-br0' ]
vif = [ 'mac=52:54:00:12:34:56' ]

#----------------------------------------------------------------------------
# Define the disk devices you want the domain to have access to, and
# what you want them accessible as.
# Each disk entry is of the form phy:UNAME,DEV,MODE
# where UNAME is the device, DEV is the device name the domain will see,
# and MODE is r for read-only, w for read-write.
#disk = [ 'file:/dev/md2,md2,w' ]
#disk = [ 'file:/dev/md3,sda1,w', 'file:/dev/md4,sda2,w' ]
disk = [ 'file:/Domains/t/Fed4.img,sda1,w', 'file:/Domains/t/Fed4Swap.img,sda2,w' ]

# Set root device.
root = "/dev/sda1 ro"
#nfs_root = '/full/path/to/root/directory'

# Sets runlevel 3.
extra = "3"

#----------------------------------------------------------------------------
# Set according to whether you want the domain restarted when it exits.
# The default is 'onreboot', which restarts the domain when it shuts down
# with exit code reboot.
# Other values are 'always', and 'never'.
#restart = 'onreboot'
#============================================================================

-----Original Message-----
From: Ian Pratt [mailto:m+Ian.Pratt@xxxxxxxxxxxx]
Sent: Thursday, September 15, 2005 1:04 PM
To: Cole, Ray; bryant.johan@xxxxxxxxx
Cc: xen-users@xxxxxxxxxxxxxxxxxxx; ian.pratt@xxxxxxxxxxxx
Subject: RE: [Xen-users] xen save error

> I can get xen-2.0-testing to fail on live migrations with
> virtually no load about 10% of the time with live migration
> :-) Seems it becomes unable to suspend the user domain
> kernel - kernel gets the message, but never gets a chance to
> process it. I'm not saying 2.0-testing won't resolve the
> problem John is seeing, but I'm not sure I would quite make
> the statement that it has been 'battle tested' :-)

Can you say more about your configuration?
I haven't heard of migrate problems on 2.0-testing. Almost all the development effort is focussed on 3.0, but if it's a reproducible problem someone might take a look. Migration on 2.0-testing has been tested pretty thoroughly, so it must be something to do with your configuration or other xm operations you've done on the domain since you started it.

Ian

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
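
For anyone trying to reproduce this, the procedure Ray describes (live-migrate rayfed4 back and forth between two hosts, wait about 10 seconds between attempts, stop at the first error) could be scripted roughly as below. This is only a sketch, not something from the thread: the host names, the ssh wrapper used to run xm on whichever host currently owns the domain, and the idea that a failed migrate shows up as a non-zero exit status are all assumptions. Only the `xm migrate --live <domain> <host>` command itself is taken from the message above.

```python
import subprocess
import time


def build_migrate_cmd(domain, current_host, target_host):
    """Build the command that migrates `domain` from current_host to target_host.

    The `xm migrate --live` part is quoted from the thread. Running it through
    ssh on the host that currently owns the domain is an assumption of this sketch.
    """
    return ["ssh", current_host, "xm", "migrate", "--live", domain, target_host]


def ping_pong(domain, host_a, host_b, max_rounds=20, pause=10.0,
              run=subprocess.call, sleep=time.sleep):
    """Migrate `domain` back and forth until one migration fails.

    Returns the number of migrations that succeeded before the first failure,
    or max_rounds if none failed. A non-zero exit status is treated as a
    failure (assumption). `run` and `sleep` are injectable so the loop can be
    exercised without real hosts.
    """
    current, target = host_a, host_b
    for completed in range(max_rounds):
        status = run(build_migrate_cmd(domain, current, target))
        if status != 0:
            return completed
        # The domain now lives on the other machine, so swap roles.
        current, target = target, current
        sleep(pause)
    return max_rounds


if __name__ == "__main__":
    # "hostA" and "hostB" are placeholder names for the two Xen machines.
    survived = ping_pong("rayfed4", "hostA", "hostB")
    print("migrations before first failure:", survived)
```

Running it several times and recording the returned count would give a rough picture of how many migrations succeed between failures, which is the figure quoted in the thread (4 or 5 normally, 12 to 20 with the kernel change).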