
Re: [Xen-users] "xm save" only works once...

>On Friday, 19 August 2005 04:14, Steven Hand wrote:
>> >On Monday, 15 August 2005 23:29, Anthony Liguori wrote:
>> >> Steven Hand wrote:
>> >> >>I am using Xen-2.0.7 on a dual Intel Xeon 2.8GHz system with 4GB of
>> >> >> RAM. I am using 2.6.11 as the kernel for my domain 0. Domain 0 uses
>> >> >> Debian Sarge with a backported Xen 2.0.7 package (only small changes
>> >> >> to the Debian 2.0.6 package, nothing important enough to be
>> >> >> mentioned). All kernels were compiled against vanilla kernels with
>> >> >> xen-patch. The domUs are using 2.6.11 or 2.4.30 (Debian, SUSE).
>> >> >>
>> >> >>I have no problems within domains and everything is running very
>> >> >> smoothly, except for one thing (which also did not work correctly in
>> >> >> xen-2.0.6 for me): I can save a domain with "xm save <domainname>
>> >> >> <suspendfile>" once and I can restore this domain again, but if I try
>> >> >> a second "xm save ..." it simply seems to hang. Nothing happens and
>> >> >> the last thing in the logs are these lines:
>> >> >
>> >> >Is this the same with both 2.4 and 2.6 domUs? I've noticed something
>> >> > similar with 2.0.7 but only with 2.4 domUs ... it would be useful to
>> >> > know if it affects 2.6 also - I'm trying to track it down.
>> >
>> >yes, it's the same with 2.4 and 2.6 domUs...
>> >
>> >> There's a very similar problem in 3.0 that has to do with a race
>> >> condition with the xc_save/Xend interaction.  xc_save thinks it has sent
>> >> the "suspend" command over the pipe and Xend is waiting for it to
>> >> arrive.
>> >
>> >... but after some more testing I noticed another interesting thing. "xm
>> >save" hangs if the suspend file doesn't exist. The first time after a
>> >dom0 reboot there is normally no problem, but if I delete the file and try
>> > "xm save" again, it fails about 95% of the time.
>> >
>> >If I keep the save file and then do an "xm save" and an "xm restore",
>> > there seems to be no problem. I ran 10 tests and all worked.
>> Fix attached below - it's actually nothing to do with whether the file
>> exists or not. Rather the problem is that on occasion xfrd sends a response
>> and a request in the same 'message', and Xend only deals with the first.
>> The below fixes this for me - please let me know if it works for you,
>I can't test it right now, because the server is in production use. I have
>to schedule a maintenance window to reboot the system (which is needed if
>the problem is not fixed and an "xm save" crashes).

Ok (although I'm confident the fix is a strict stability improvement - I 
stress tested over 15,000 save/restore cycles at a variety of frequencies
without a single problem). 

But then again, it's your server :-) 

The problem was a race condition, so timing (and concurrency at the
hardware level) is likely to affect the probability of it occurring.
So e.g. SMP versus not, or a slow versus a fast machine, or anything like
this could increase the chance you'd see it.
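
To illustrate the failure mode described in the quoted text above (a
response and a request arriving in the same 'message', with only the first
being handled), here is a minimal Python sketch. It is not the actual patch
and not Xend's real code: the newline-delimited framing, the "ok" and
"suspend" message strings, and both function names are assumptions made up
for this example. Whether two messages land in one read depends on timing,
which is why the hang would only show up some of the time.

```python
from typing import List, Tuple


def handle_first_only(chunk: bytes) -> List[bytes]:
    """Buggy pattern (hypothetical): assume one read == one message.

    Anything after the first newline-terminated message is silently dropped.
    """
    first, _sep, _rest = chunk.partition(b"\n")
    return [first] if first else []


def drain_messages(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Safer pattern (hypothetical): handle every complete message in the buffer.

    Returns the list of complete messages plus any trailing partial
    message, which the caller should prepend to the next read.
    """
    messages = []
    while True:
        line, sep, rest = buffer.partition(b"\n")
        if not sep:
            # No terminator left: whatever remains is an incomplete message.
            return messages, buffer
        if line:
            messages.append(line)
        buffer = rest


if __name__ == "__main__":
    # Two messages delivered by a single read, as in the race described above.
    chunk = b"ok\nsuspend\n"
    print(handle_first_only(chunk))  # the second message is lost
    print(drain_messages(chunk))     # both messages are seen
```

With the first function the "suspend" message is never seen, so the
receiver keeps waiting for it; the second keeps looping until the buffer
holds no further complete message.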

>I'll let you know once I can test the patch on the production system (or
>another SMP/HT system), but that may take a few more days... sorry.

No probs - the fix is in 2.0-testing but that also includes a bunch of 
other stuff, so probably best to just apply that patch locally. 





