
Re: [Xen-devel] HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI



On 31/05/13 15:07, George Dunlap wrote:
> On 31/05/13 13:40, Ian Campbell wrote:
>> On Fri, 2013-05-31 at 12:57 +0100, Alex Bligh wrote:
>>> --On 31 May 2013 12:49:18 +0100 George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:
>>>
>>>> No -- Linux is asking, "Can you give me an alarm in 5ns?"  And Xen is
>>>> saying, "No".  So Linux is saying, "OK, how about 5us?  10us?  20us?"
>>>> By the time it reaches 4ms, Linux has had enough, and says, "If this timer
>>>> is so bad that it can't give me an event within 4ms it just won't use
>>>> timers at all, thank you very much."
>>>>
>>>> The problem appears to be that Linux thinks it's asking for something in
>>>> the future, but is actually asking for something in the past.  It must
>>>> look at its watch just before the final domain pause, and then asks for
>>>> the time just after the migration resumes on the other side.  So it
>>>> doesn't realize that 10ms (or something) has already passed, and that
>>>> it's actually asking for a timer in the past.  The Xen timer driver in
>>>> Linux specifically asks Xen for times set in the past to return an error.
>>>> Xen is returning an error because the time is in the past, Linux thinks
>>>> it's getting an error because the time is too close in the future and
>>>> tries asking a little further away.
>>>>
>>>> Unfortunately I think this is something which needs to be fixed on the
>>>> Linux side; I don't really see how we can work around it in Xen.
>>> I don't think fixing it only on the Linux side is a great idea, not least
>>> as it makes any current Linux image not live migrateable reliably. That's
>>> pretty horrible.
>> Ultimately though a guest bug is a guest bug, we don't really want to be
>> filling the hypervisor with lots of quirky exceptions to interfaces in
>> order to work around them, otherwise where does it end?
>>
>> A kernel side fix can be pushed to the distros fairly aggressively (it's
>> mostly just a case of getting an upstream stable backport then filing
>> bugs with the main ones, we've done it before) and for users upgrading
>> the kernel via the distros is really not so hard and mostly reuses the
>> process they must have in place for guest kernel security updates and
>> other important kernel bugs anyway.
> 
> In any case, it seems I was wrong -- Linux does "look at its watch"
> every time it asks.
> 
> The generic timer interface is "set me a timer N nanoseconds in the
> future"; the Xen timer implementation executes
> pvclock_clocksource_read() and adds the delta.  So it may well actually
> be a bug in Xen.
> 
> Stand by for further investigation...

I've also seen this on FreeBSD PVHVM when doing live migration, which
also uses the single-shot timer. It seems that the values in
vcpu_info->time are not updated as often as they should be after the
migration. I've implemented a back-off mechanism to cope with that, but
this clearly looks like a bug in Xen.
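
A back-off of this kind can be sketched roughly as follows. This is a
simplified standalone model, not the FreeBSD or Linux code: the names
model_set_singleshot() and set_timer_with_backoff() are hypothetical, and
the hypervisor is reduced to a function that rejects any deadline that is
not in the future. The guest computes its deadline from a possibly stale
clock reading and doubles the delta until the request is accepted or a
limit is reached.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of the hypervisor side: a one-shot timer request
 * fails with -ETIME when the absolute deadline is not in the future. */
static int model_set_singleshot(uint64_t deadline_ns, uint64_t host_now_ns)
{
    return (deadline_ns > host_now_ns) ? 0 : -ETIME;
}

/* Guest side (illustrative only): guest_now_ns is the guest's view of
 * the current time, which may lag host_now_ns after a migration.
 * The delta is doubled on every rejection, up to max_delta_ns.
 * On success the accepted delta is stored in *used_delta_ns. */
static int set_timer_with_backoff(uint64_t guest_now_ns, uint64_t host_now_ns,
                                  uint64_t delta_ns, uint64_t max_delta_ns,
                                  uint64_t *used_delta_ns)
{
    assert(delta_ns > 0);
    assert(max_delta_ns <= UINT64_MAX / 2); /* keep the doubling from wrapping */

    while (delta_ns <= max_delta_ns) {
        if (model_set_singleshot(guest_now_ns + delta_ns, host_now_ns) == 0) {
            *used_delta_ns = delta_ns;
            return 0;
        }
        delta_ns *= 2; /* deadline was already in the past: ask further out */
    }
    return -ETIME; /* gave up: the clock looks too stale for this limit */
}
```

With a guest clock that is 10ms behind and an initial delta of 5us, the
loop in this model only succeeds once the delta has grown past 10ms,
which is the same pattern of escalating requests described earlier in
the thread. It only copes with the symptom; it does not explain why the
time values are stale.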

