
Re: [Xen-devel] HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI

On Mon, 3 Jun 2013, Roger Pau Monné wrote:
> On 03/06/13 12:05, Stefano Stabellini wrote:
> > On Mon, 3 Jun 2013, Roger Pau Monné wrote:
> >> On 31/05/13 17:10, Roger Pau Monné wrote:
> >>> On 31/05/13 15:07, George Dunlap wrote:
> >>>> On 31/05/13 13:40, Ian Campbell wrote:
> >>>>> On Fri, 2013-05-31 at 12:57 +0100, Alex Bligh wrote:
> >>>>>> --On 31 May 2013 12:49:18 +0100 George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>>> No -- Linux is asking, "Can you give me an alarm in 5ns?"  And Xen is
> >>>>>>> saying, "No".  So Linux is saying, "OK, how about 5us?  10us?
> >>>>>>> 20us?"  By the time it reaches 4ms, Linux has had enough, and says,
> >>>>>>> "If this timer is so bad that it can't give me an event within 4ms it
> >>>>>>> just won't use timers at all, thank you very much."
> >>>>>>>
> >>>>>>> The problem appears to be that Linux thinks it's asking for
> >>>>>>> something in the future, but is actually asking for something in
> >>>>>>> the past.  It must look at its watch just before the final domain
> >>>>>>> pause, and then request the timer just after the migration resumes
> >>>>>>> on the other side.  So it doesn't realize that 10ms (or something)
> >>>>>>> has already passed, and that it's actually asking for a timer in the
> >>>>>>> past.  The Xen timer driver in Linux specifically asks Xen to return
> >>>>>>> an error for times set in the past.  Xen is returning an error
> >>>>>>> because the time is in the past; Linux thinks it's getting an error
> >>>>>>> because the time is too close in the future, and tries asking a
> >>>>>>> little further away.
> >>>>>>>
> >>>>>>> Unfortunately I think this is something which needs to be fixed on the
> >>>>>>> Linux side; I don't really see how we can work around it in Xen.
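The retry behaviour George describes can be modelled in a few lines of C. This is a simplified simulation, not the Linux clockevents code: the function names, the error constant and the growth rule (5 ns, then 5 us, then doubling, following the sequence quoted above) are assumptions, and the 4 ms give-up limit assumes a kernel built with HZ=250. The hypervisor side rejects any deadline earlier than its own clock; `guest_lag_ns` is a hypothetical amount by which the guest's idea of "now" trails Xen's after migration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETIME_ERR          (-62)     /* stand-in for -ETIME */
#define MIN_DELTA_LIMIT_NS 4000000LL /* assumed: 4 ms, NSEC_PER_SEC / HZ with HZ=250 */

/* Hypervisor side of the model: a deadline already in the past is refused. */
static int hv_set_singleshot(int64_t deadline_ns, int64_t hv_now_ns)
{
    return deadline_ns < hv_now_ns ? ETIME_ERR : 0;
}

/*
 * Guest side of the model: ask for an event delta_ns from "now" and, on
 * failure, retry with a larger delta until the timer is accepted or the
 * delta reaches the limit.  Returns the number of attempts and sets
 * *gave_up to 1 if the guest stopped trying to use the timer.
 */
static int guest_request_timer(int64_t hv_now_ns, int64_t guest_lag_ns,
                               int *gave_up)
{
    int64_t guest_now_ns = hv_now_ns - guest_lag_ns; /* guest's (stale) clock */
    int64_t delta_ns = 5;
    int attempts = 0;

    assert(gave_up != NULL);
    *gave_up = 0;
    for (;;) {
        attempts++;
        if (hv_set_singleshot(guest_now_ns + delta_ns, hv_now_ns) == 0)
            return attempts;                /* timer accepted */
        if (delta_ns >= MIN_DELTA_LIMIT_NS) {
            *gave_up = 1;                   /* "won't use timers at all" */
            return attempts;
        }
        /* 5 ns -> 5 us, then double: 10 us, 20 us, ... */
        delta_ns = delta_ns < 5000 ? 5000 : delta_ns * 2;
    }
}
```

With a lag of 10 ms every request up to the limit still lands in the past, so the model gives up after twelve attempts; with no lag the first request succeeds, and with a 3 ms lag the request eventually succeeds once the delta exceeds the lag.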
> >>>>>> I don't think fixing it only on the Linux side is a great idea, not
> >>>>>> least as it leaves every current Linux image unable to live migrate
> >>>>>> reliably.  That's pretty horrible.
> >>>>> Ultimately, though, a guest bug is a guest bug.  We don't really want
> >>>>> to be filling the hypervisor with lots of quirky exceptions to
> >>>>> interfaces in order to work around them; otherwise, where does it end?
> >>>>>
> >>>>> A kernel-side fix can be pushed to the distros fairly aggressively
> >>>>> (it's mostly just a case of getting an upstream stable backport and
> >>>>> then filing bugs with the main ones; we've done it before).  For
> >>>>> users, upgrading the kernel via the distros is really not so hard, and
> >>>>> mostly reuses the process they must already have in place for guest
> >>>>> kernel security updates and other important kernel bugs anyway.
> >>>>
> >>>> In any case, it seems I was wrong -- Linux does "look at its watch"
> >>>> every time it asks.
> >>>>
> >>>> The generic timer interface is "set me a timer N nanoseconds in the
> >>>> future"; the Xen timer implementation executes
> >>>> pvclock_clocksource_read() and adds the delta.  So it may well actually
> >>>> be a bug in Xen.
> >>>>
> >>>> Stand by for further investigation...
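George's description of the guest side can be sketched as follows. The structure mirrors the shape of Xen's public `vcpu_set_singleshot_timer` argument (an absolute timeout plus a flags word); the `SSHOTTMR_FUTURE` bit is a hypothetical stand-in for the flag asking Xen to fail deadlines that are already past, and `read_guest_clock()` is a hypothetical stand-in for `pvclock_clocksource_read()`. The point it illustrates is that the deadline is expressed in the guest's own time base, whatever offsets Xen applied when publishing that time.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as Xen's public struct vcpu_set_singleshot_timer. */
struct singleshot_timer {
    uint64_t timeout_abs_ns; /* absolute deadline, in guest system time */
    uint32_t flags;
};

/* Hypothetical flag: "return an error if the deadline is already past". */
#define SSHOTTMR_FUTURE (1u << 0)

/* Hypothetical stand-in for pvclock_clocksource_read(): the guest's view
 * of system time as published by the hypervisor. */
static uint64_t guest_clock_ns;
static uint64_t read_guest_clock(void) { return guest_clock_ns; }

/* Convert the generic "fire delta_ns from now" request into an absolute
 * deadline, reading the clock at the moment of the request. */
static struct singleshot_timer make_singleshot(uint64_t delta_ns)
{
    struct singleshot_timer t;

    assert(delta_ns > 0);           /* sanity check on the request */
    t.timeout_abs_ns = read_guest_clock() + delta_ns;
    t.flags = SSHOTTMR_FUTURE;
    return t;
}
```

Because the clock is read on every request, a stale reading cannot be the cause; if the hypercall still fails, the guest's time base and the hypervisor's comparison must disagree.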
> >>
> >> I've been investigating further over the weekend, and although I'm not
> >> familiar with the timer code in Xen, I think the problem is that in
> >> __update_vcpu_system_time, when Xen detects that the guest is using
> >> vtsc, it adds offsets to the time passed to the guest, while in
> >> VCPUOP_set_singleshot_timer Xen compares the time passed from the
> >> guest against NOW(), which is just the Xen uptime, without taking any
> >> of those offsets into account.
> >>
> >> This only happens after migration because Xen automatically switches to
> >> vtsc when it detects that the guest has been migrated. I'm currently
> >> setting up a Linux PVHVM guest on shared storage to do some testing, but
> >> one possible solution might be to add tsc_mode="native_paravirt" to the
> >> PVHVM config file; another would be to fix VCPUOP_set_singleshot_timer
> >> so that it takes the vtsc offsets into account and correctly translates
> >> the time passed from the guest.
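Roger's diagnosis and the proposed fix can be illustrated with a minimal model. It collapses the offsets Xen applies under vtsc into a single hypothetical `stime_offset_ns` per domain and assumes it is negative (the guest's clock trails Xen's uptime), which is the case that produces spurious errors; the real hypervisor fields, their signs and any eventual patch may differ. `check_deadline_raw()` compares the guest deadline against raw `NOW()` as described above, while `check_deadline_translated()` first maps the deadline back into Xen's time base.

```c
#include <assert.h>
#include <stdint.h>

#define ETIME_ERR (-62)            /* stand-in for -ETIME */
#define SSHOTTMR_FUTURE (1u << 0)  /* hypothetical "must be in the future" flag */

/* Hypothetical per-domain state: under vtsc, the time Xen publishes to
 * the guest is its own uptime shifted by this offset. */
struct dom_time {
    int64_t stime_offset_ns;
};

/* Time as seen by the guest: Xen uptime plus the domain's offset. */
static int64_t guest_now(const struct dom_time *d, int64_t xen_now)
{
    return xen_now + d->stime_offset_ns;
}

/* Behaviour described above: guest deadline compared with raw NOW(). */
static int check_deadline_raw(int64_t deadline, uint32_t flags, int64_t xen_now)
{
    if ((flags & SSHOTTMR_FUTURE) && deadline < xen_now)
        return ETIME_ERR;
    return 0;
}

/* Proposed fix: translate the guest deadline into Xen's time base first. */
static int check_deadline_translated(const struct dom_time *d, int64_t deadline,
                                     uint32_t flags, int64_t xen_now)
{
    int64_t xen_deadline = deadline - d->stime_offset_ns;

    if ((flags & SSHOTTMR_FUTURE) && xen_deadline < xen_now)
        return ETIME_ERR;
    return 0;
}
```

With an offset of -10 ms, a deadline 1 ms in the guest's future is refused by the raw check but accepted once translated, while a deadline genuinely in the past is still refused.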
> > 
> > Good analysis!
> > I think that the right solution would be to fix
> > VCPUOP_set_singleshot_timer.
> As a band-aid, I can confirm that adding tsc_mode="native_paravirt" seems
> to be working fine (in the tests I've done so far), but it requires
> the admin to know whether a certain HVM guest will be using the PV timer
> or not, which I guess is not possible in every case.

It needs to work out of the box.

> Xen could also force the TSC mode to native_paravirt when it detects
> that an HVM guest is using the PV timer, but I don't think that's the
> right approach. Is this something we aim to fix before the 4.3 release?

I think it should be fixed and probably backported anywhere we claim
Xen-devel mailing list


