
Re: [Xen-devel] HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI




On 31 May 2013, at 12:36, Diana Crisan <dcrisan@xxxxxxxxxxxx> wrote:

> 
> 
> On 31 May 2013, at 11:54, George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:
> 
>> On 31/05/13 09:34, Diana Crisan wrote:
>>> George,
>>> On 30/05/13 17:06, George Dunlap wrote:
>>>> On 05/30/2013 04:55 PM, Diana Crisan wrote:
>>>>> On 30/05/13 16:26, George Dunlap wrote:
>>>>>> On Tue, May 28, 2013 at 4:06 PM, Diana Crisan <dcrisan@xxxxxxxxxxxx>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> 
>>>>>>> On 26/05/13 09:38, Ian Campbell wrote:
>>>>>>>> On Sat, 2013-05-25 at 11:18 +0100, Alex Bligh wrote:
>>>>>>>>> George,
>>>>>>>>> 
>>>>>>>>> --On 24 May 2013 17:16:07 +0100 George Dunlap
>>>>>>>>> <George.Dunlap@xxxxxxxxxxxxx>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>>> FWIW it's reproducible on every host h/w platform we've tried
>>>>>>>>>>> (a total of 2).
>>>>>>>>>> Do you see the same effects if you do a local-host migrate?
>>>>>>>>> I hadn't even realised that was possible. That would have made testing
>>>>>>>>> live
>>>>>>>>> migrate easier!
>>>>>>>> That's basically the whole reason it is supported ;-)
>>>>>>>> 
>>>>>>>>> How do you avoid the name clash in xen-store?
>>>>>>>> Most toolstacks receive the incoming migration into a domain named
>>>>>>>> FOO-incoming or some such and then rename to FOO upon completion. Some
>>>>>>>> also rename the outgoing domain "FOO-migratedaway" towards the end, so
>>>>>>>> that the bits of the final teardown which can safely happen after the
>>>>>>>> target has started can be deferred until then.
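>>>>>>>> 
>>>>>>>> (Illustration -- the exact suffixes are whatever your toolstack
>>>>>>>> spells them as; you can watch the rename from another terminal:)
>>>>>>>> 
>>>>>>>>     # while an "xl migrate FOO localhost" is in flight:
>>>>>>>>     watch -n 0.5 xl list
>>>>>>>>     # the guest briefly shows up twice, the incoming copy under
>>>>>>>>     # a name like "FOO--incoming" alongside the original "FOO"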
>>>>>>>> 
>>>>>>>> Ian.
>>>>>>> I am unsure what I am doing wrong, but I cannot seem to do a
>>>>>>> localhost migrate.
>>>>>>> 
>>>>>>> I created a domU using "xl create xl.conf" and once it fully booted I
>>>>>>> issued
>>>>>>> an "xl migrate 11 localhost". This fails and gives the output below.
>>>>>>> 
>>>>>>> Would you please advise on how to get this working?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Diana
>>>>>>> 
>>>>>>> 
>>>>>>> root@ubuntu:~# xl migrate 11 localhost
>>>>>>> root@localhost's password:
>>>>>>> migration target: Ready to receive domain.
>>>>>>> Saving to migration stream new xl format (info 0x0/0x0/2344)
>>>>>>> Loading new save file <incoming migration stream> (new xl fmt info
>>>>>>> 0x0/0x0/2344)
>>>>>>> Savefile contains xl domain config
>>>>>>> xc: progress: Reloading memory pages: 53248/1048575    5%
>>>>>>> xc: progress: Reloading memory pages: 105472/1048575   10%
>>>>>>> libxl: error: libxl_dm.c:1280:device_model_spawn_outcome: domain 12
>>>>>>> device
>>>>>>> model: spawn failed (rc=-3)
>>>>>>> libxl: error: libxl_create.c:1091:domcreate_devmodel_started: device
>>>>>>> model
>>>>>>> did not start: -3
>>>>>>> libxl: error: libxl_dm.c:1311:libxl__destroy_device_model: Device Model
>>>>>>> already exited
>>>>>>> migration target: Domain creation failed (code -3).
>>>>>>> libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream
>>>>>>> truncated
>>>>>>> reading ready message from migration receiver stream
>>>>>>> libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration
>>>>>>> target process [10934] exited with error status 3
>>>>>>> Migration failed, resuming at sender.
>>>>>>> xc: error: Cannot resume uncooperative HVM guests: Internal error
>>>>>>> libxl: error: libxl.c:404:libxl__domain_resume: xc_domain_resume
>>>>>>> failed for
>>>>>>> domain 11: Success
>>>>>> Aha -- I managed to reproduce this one as well.
>>>>>> 
>>>>>> Your problem is the "vncunused=0" -- that's instructing qemu "You must
>>>>>> use this exact port for the vnc server".  But when you do the migrate,
>>>>>> that port is still in use by the "from" domain; so the qemu for the
>>>>>> "to" domain can't get it, and fails.
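>>>>>> 
>>>>>> The immediate workaround is one line in the guest config (a sketch --
>>>>>> the rest of your xl.conf stays as it is):
>>>>>> 
>>>>>>     vnc = 1
>>>>>>     vncunused = 1   # let qemu pick any free port, so the incoming
>>>>>>                     # domain's qemu doesn't race the outgoing one
>>>>>>     # vncunused = 0 pins the port, which is what breaks localhost
>>>>>>     # migration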
>>>>>> 
>>>>>> Obviously this should fail a lot more gracefully, but that's a bit of
>>>>>> a lower-priority bug I think.
>>>>>> 
>>>>>> -George
>>>>> Yes, I managed to get to the bottom of it too and got VMs migrating on
>>>>> localhost on our end.
>>>>> 
>>>>> I can confirm I did get the stuck-clock problem while doing a localhost
>>>>> migrate.
>>>> 
>>>> Does the script I posted earlier "work" for you (i.e., does it fail after 
>>>> some number of migrations)?
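>>>> 
>>>> (For anyone without that earlier mail, the shape of it was roughly the
>>>> loop below -- a sketch, not necessarily the exact script; the name/IP
>>>> arguments are illustrative:)
>>>> 
>>>>     #!/bin/sh
>>>>     # Localhost-migrate a guest repeatedly until it stops answering
>>>>     # pings.  Use the domain *name*: the numeric domid changes on
>>>>     # every migration, but the name survives the rename dance.
>>>>     name=$1 ip=$2 i=0
>>>>     while ping -c 1 -W 5 "$ip" >/dev/null 2>&1; do
>>>>         i=$((i+1))
>>>>         echo "migration $i"
>>>>         xl migrate "$name" localhost || break
>>>>     done
>>>>     echo "stopped after $i migrations"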
>>> 
>>> I left your script running throughout the night and it seems that it does 
>>> not always catch the problem. I see the following:
>>> 
>>> 1. The VM has a stuck clock.
>>> 2. The script is still running, since the VM is still ping-able.
>>> 3. Migration fails because the VM does not ack the suspend request
>>> (see below).
>> 
>> So I wrote a script to run "date", sleep for 2 seconds, and run "date" a 
>> second time -- and eventually the *sleep* hung.
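>> 
>> Roughly this, run inside the guest (a sketch):
>> 
>>     while true; do
>>         date
>>         sleep 2      # this sleep is what eventually hangs
>>         date; echo ---
>>     done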
>> 
>> The VM is still responsive, and I can log in; if I type "date" manually 
>> successive times then I get an advancing clock, but if I type "sleep 1" it 
>> just hangs.
>> 
>> If you run "dmesg" in the guest, do you see the following line?
>> 
>> CE: Reprogramming failure. Giving up
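>> 
>> A quick way to check:
>> 
>>     dmesg | grep 'CE:'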
> 
> I do. It is preceded by:
> CE: xen increased min_delta_ns to 4000000 nsec
> 

It seems that it always gets stuck once min_delta_ns has been increased to
4000000 nsec. Could this be it? An overflow, perhaps?
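
(For reference: both messages come from clockevents_increase_min_delta() in
kernel/time/clockevents.c. Paraphrased below from the 3.x-era source -- check
the exact tree before trusting it. min_delta_ns is grown by 50% on each failed
reprogram and capped at one jiffie, and NSEC_PER_SEC/HZ is exactly 4000000
nsec at HZ=250 -- so the 4000000 figure looks like that ceiling being hit,
rather than an arithmetic overflow:)

    /* Paraphrased from kernel/time/clockevents.c (3.x era, with
     * CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST) -- not verbatim. */
    #define MIN_DELTA_LIMIT  (NSEC_PER_SEC / HZ)  /* one jiffie: 4 ms at HZ=250 */

    static int clockevents_increase_min_delta(struct clock_event_device *dev)
    {
            /* Already at the one-jiffie cap: stop reprogramming. */
            if (dev->min_delta_ns >= MIN_DELTA_LIMIT) {
                    printk(KERN_WARNING
                           "CE: Reprogramming failure. Giving up\n");
                    dev->next_event.tv64 = KTIME_MAX;
                    return -ETIME;
            }

            if (dev->min_delta_ns < 5000)
                    dev->min_delta_ns = 5000;
            else
                    dev->min_delta_ns += dev->min_delta_ns >> 1;  /* +50% */

            if (dev->min_delta_ns > MIN_DELTA_LIMIT)
                    dev->min_delta_ns = MIN_DELTA_LIMIT;

            printk(KERN_WARNING
                   "CE: %s increased min_delta_ns to %llu nsec\n",
                   dev->name ? dev->name : "?",
                   (unsigned long long) dev->min_delta_ns);
            return 0;
    }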


>> -George
> 
> --
> Diana

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

