Xen project Mailing List

Re: [Xen-devel] HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI

On 03/06/13 12:05, Stefano Stabellini wrote:
> On Mon, 3 Jun 2013, Roger Pau Monné wrote:
>> On 31/05/13 17:10, Roger Pau Monné wrote:
>>> On 31/05/13 15:07, George Dunlap wrote:
>>>> On 31/05/13 13:40, Ian Campbell wrote:
>>>>> On Fri, 2013-05-31 at 12:57 +0100, Alex Bligh wrote:
>>>>>> --On 31 May 2013 12:49:18 +0100 George Dunlap
>>>>>> <george.dunlap@xxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> No -- Linux is asking, "Can you give me an alarm in 5ns?" And Xen is
>>>>>>> saying, "No". So Linux is saying, "OK, how about 5us? 10us? 20us?" By
>>>>>>> the time it reaches 4ms, Linux has had enough, and says, "If this timer
>>>>>>> is so bad that it can't give me an event within 4ms it just won't use
>>>>>>> timers at all, thank you very much."
>>>>>>>
>>>>>>> The problem appears to be that Linux thinks it's asking for something in
>>>>>>> the future, but is actually asking for something in the past. It must
>>>>>>> look at its watch just before the final domain pause, and then asks for
>>>>>>> the time just after the migration resumes on the other side. So it
>>>>>>> doesn't realize that 10ms (or something) has already passed, and that
>>>>>>> it's actually asking for a timer in the past. The Xen timer driver in
>>>>>>> Linux specifically asks Xen for times set in the past to return an
>>>>>>> error. Xen is returning an error because the time is in the past, Linux
>>>>>>> thinks it's getting an error because the time is too close in the future
>>>>>>> and tries asking a little further away.
>>>>>>>
>>>>>>> Unfortunately I think this is something which needs to be fixed on the
>>>>>>> Linux side; I don't really see how we can work around it in Xen.
>>>>>>
>>>>>> I don't think fixing it only on the Linux side is a great idea, not least
>>>>>> as it makes any current Linux image not live migrateable reliably. That's
>>>>>> pretty horrible.
>>>>> Ultimately though a guest bug is a guest bug, we don't really want to be
>>>>> filling the hypervisor with lots of quirky exceptions to interfaces in
>>>>> order to work around them, otherwise where does it end?
>>>>>
>>>>> A kernel side fix can be pushed to the distros fairly aggressively (it's
>>>>> mostly just a case of getting an upstream stable backport then filing
>>>>> bugs with the main ones, we've done it before) and for users upgrading
>>>>> the kernel via the distros is really not so hard and mostly reuses the
>>>>> process they must have in place for guest kernel security updates and
>>>>> other important kernel bugs anyway.
>>>>
>>>> In any case, it seems I was wrong -- Linux does "look at its watch"
>>>> every time it asks.
>>>>
>>>> The generic timer interface is "set me a timer N nanoseconds in the
>>>> future"; the Xen timer implementation executes
>>>> pvclock_clocksource_read() and adds the delta. So it may well actually
>>>> be a bug in Xen.
>>>>
>>>> Stand by for further investigation...
>>
>> I've been investigating further during the weekend, and although I'm not
>> familiar with the timer code in Xen, I think the problem comes from the
>> fact that in __update_vcpu_system_time when Xen detects that the guest
>> is using a vtsc it adds offsets to the time passed to the guest, while
>> in VCPUOP_set_singleshot_timer Xen compares the time passed from the
>> guest using NOW(), which is just the Xen uptime, without taking into
>> account any offsets.
>>
>> This only happens after migration because Xen automatically switches to
>> vtsc when it detects that the guest has been migrated. I'm currently
>> setting up a Linux PVHVM on shared storage to perform some testing, but
>> one possible solution might be to add tsc_mode="native_paravirt" to the
>> PVHVM config file, and another one would be fixing
>> VCPUOP_set_singleshot_timer to take into account the vtsc offsets and
>> correctly translate the time passed from the guest.
>
> Good analysis!
> I think that the right solution would be to fix
> VCPUOP_set_singleshot_timer.

As a band-aid I can confirm that adding tsc_mode="native_paravirt" seems
to be working fine (with the tests I've done so far), but it requires
the admin to know whether a certain HVM will be using the PV timer or
not, which I guess is not possible in every case.

Xen could also force the TSC mode to native_paravirt when it detects
that an HVM guest is using the PV timer, but I don't think that's the
right approach. Is this something we aim to fix before the 4.3 release?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
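
The mismatch discussed in the thread above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Xen or Linux code: the function names, the fixed -10ms offset, the doubling retry policy and the frozen clock are all hypothetical simplifications. It assumes the guest-visible time is Xen's uptime plus a per-domain offset, that the guest computes absolute timeouts from that guest-visible time, and that the hypervisor-side check compares the timeout against the raw uptime.

```python
# Toy model only: all names and numbers below are illustrative assumptions,
# not the real Xen hypercall or Linux clockevents code.

ETIME = 62  # errno value for "timer expired" on Linux, used here as a marker


def guest_clock_ns(xen_now_ns, offset_ns):
    """Guest-visible system time: Xen uptime plus a per-domain offset."""
    return xen_now_ns + offset_ns


def set_singleshot_timer(timeout_abs_ns, xen_now_ns, offset_ns, translate):
    """Stand-in for the hypervisor's 'is this timeout in the past?' check."""
    if translate:
        # Possible fix discussed in the thread: convert the guest's absolute
        # time back into Xen uptime before comparing.
        timeout_abs_ns -= offset_ns
    return -ETIME if timeout_abs_ns < xen_now_ns else 0


def program_with_retries(xen_now_ns, offset_ns, translate,
                         first_delta_ns=5_000, limit_ns=4_000_000):
    """Guest side: retry with a doubled delta until accepted or the limit is hit.

    Time is frozen in this model; a real guest re-reads its clock each time.
    """
    delta = first_delta_ns
    while delta <= limit_ns:
        timeout = guest_clock_ns(xen_now_ns, offset_ns) + delta
        if set_singleshot_timer(timeout, xen_now_ns, offset_ns, translate) == 0:
            return delta
        delta *= 2
    return None  # every request looked "in the past"; the guest gives up


if __name__ == "__main__":
    uptime = 1_000_000_000_000   # 1000 s of host uptime (made-up value)
    offset = -10_000_000         # guest clock 10 ms behind uptime (made-up value)
    print(program_with_retries(uptime, offset, translate=False))  # None
    print(program_with_retries(uptime, offset, translate=True))   # 5000
```

In this model, any offset that puts the guest clock more than 4ms behind the uptime makes every retry fail, which matches the symptom described in the thread. Whether the real offsets have this sign and size after a migration is not something the model shows.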

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.