Xen project Mailing List

Re: [Xen-devel] VPMU interrupt unreliability

To: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>

Date: Thu, 19 Oct 2017 08:09:34 -0700

Cc: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxx, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Robert O'Callahan <robert@xxxxxxxxxxxxx>

Delivery-date: Thu, 19 Oct 2017 15:09:38 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Wed, Oct 11, 2017 at 7:09 AM, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx> wrote:
> On 10/10/2017 12:54 PM, Kyle Huey wrote:
>> On Mon, Jul 24, 2017 at 9:54 AM, Kyle Huey <me@xxxxxxxxxxxx> wrote:
>>> On Mon, Jul 24, 2017 at 8:07 AM, Boris Ostrovsky
>>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>>> One thing I noticed is that the workaround doesn't appear to be
>>>>>> complete: it is only checking PMC0 status and not other counters (fixed
>>>>>> or architectural). Of course, without knowing what the actual problem
>>>>>> was it's hard to say whether this was intentional.
>>>>> handle_pmc_quirk appears to loop through all the counters ...
>>>> Right, I didn't notice that it is shifting MSR_CORE_PERF_GLOBAL_STATUS
>>>> value one by one and so it is looking at all bits.
>>>>
>>>>>>> 2. Intercepting MSR loads for counters that have the workaround
>>>>>>> applied and giving the guest the correct counter value.
>>>>>> We'd have to keep track of whether the counter has been reset (by the
>>>>>> quirk) since the last MSR write.
>>>>> Yes.
>>>>>
>>>>>>> 3. Or perhaps even changing the workaround to disable the PMI on that
>>>>>>> counter until the guest acks via GLOBAL_OVF_CTRL, assuming that works
>>>>>>> on the relevant hardware.
>>>>>> MSR_CORE_PERF_GLOBAL_OVF_CTRL is written immediately after the quirk
>>>>>> runs (in core2_vpmu_do_interrupt()) so we already do this, don't we?
>>>>> I'm suggesting waiting until the *guest* writes to the (virtualized)
>>>>> GLOBAL_OVF_CTRL.
>>>> Wouldn't it be better to wait until the counter is reloaded?
>>> Maybe! I haven't thought through it a lot. It's still not clear to
>>> me whether MSR_CORE_PERF_GLOBAL_OVF_CTRL actually controls the
>>> interrupt in any way or whether it just resets the bits in
>>> MSR_CORE_PERF_GLOBAL_STATUS and acking the interrupt on the APIC is
>>> all that's required to reenable it.
>>>
>>> - Kyle
>> I wonder if it would be reasonable to just remove the workaround
>> entirely at some point. The set of people using 1) several year old
>> hardware, 2) an up to date Xen, and 3) the off-by-default performance
>> counters is probably rather small.
>
> We'd probably want to only enable this for affected processors, not
> remove it outright. But the problem is that we still don't know for sure
> whether this issue affects NHM only, do we?
>
> (https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg02242.html
> is the original message)

Yes, the basic problem is that we don't know where to draw the line.

- Kyle

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
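
The bit scan described in the quoted exchange (shifting the MSR_CORE_PERF_GLOBAL_STATUS value one bit at a time so that every counter is examined, not just PMC0) can be sketched roughly as below. This is a minimal Python model and not the Xen source. It assumes the usual Intel architectural layout, with general-purpose counter overflow bits starting at bit 0 and fixed counter overflow bits starting at bit 32. The function name, the constant name, and the default counter counts are hypothetical and chosen only for illustration.

```python
# Simplified model of scanning a MSR_CORE_PERF_GLOBAL_STATUS value bit by bit.
# Assumption: general-purpose overflow bits start at bit 0 and fixed-counter
# overflow bits start at bit 32. Names and default counts are hypothetical.

FIXED_SHIFT = 32  # first fixed-counter overflow bit (assumed layout)


def overflowed_counters(global_status, num_gp=4, num_fixed=3):
    """Return (gp_indices, fixed_indices) of counters whose overflow bit is set."""
    gp = []
    val = global_status
    for i in range(num_gp):
        if val & 1:          # lowest bit belongs to general-purpose counter i
            gp.append(i)
        val >>= 1            # shift so the next counter's bit becomes bit 0

    fixed = []
    val = global_status >> FIXED_SHIFT
    for i in range(num_fixed):
        if val & 1:          # lowest bit belongs to fixed counter i
            fixed.append(i)
        val >>= 1

    return gp, fixed


# Overflow on general-purpose counters 0 and 2 and on fixed counter 1.
status = 0b0101 | (1 << 33)
print(overflowed_counters(status))
```

Because the value is shifted on every iteration, testing only the lowest bit is enough to cover all counters, which is the point Boris concedes above.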
