Re: [Xen-devel] VPMU interrupt unreliability
On 10/10/2017 12:54 PM, Kyle Huey wrote:
> On Mon, Jul 24, 2017 at 9:54 AM, Kyle Huey <me@xxxxxxxxxxxx> wrote:
>> On Mon, Jul 24, 2017 at 8:07 AM, Boris Ostrovsky
>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>> One thing I noticed is that the workaround doesn't appear to be
>>>>> complete: it is only checking PMC0 status and not other counters (fixed
>>>>> or architectural). Of course, without knowing what the actual problem
>>>>> was it's hard to say whether this was intentional.
>>>> handle_pmc_quirk appears to loop through all the counters ...
>>> Right, I didn't notice that it is shifting MSR_CORE_PERF_GLOBAL_STATUS
>>> value one by one and so it is looking at all bits.
>>>
>>>>>> 2. Intercepting MSR loads for counters that have the workaround
>>>>>> applied and giving the guest the correct counter value.
>>>>> We'd have to keep track of whether the counter has been reset (by the
>>>>> quirk) since the last MSR write.
>>>> Yes.
>>>>
>>>>>> 3. Or perhaps even changing the workaround to disable the PMI on that
>>>>>> counter until the guest acks via GLOBAL_OVF_CTRL, assuming that works
>>>>>> on the relevant hardware.
>>>>> MSR_CORE_PERF_GLOBAL_OVF_CTRL is written immediately after the quirk
>>>>> runs (in core2_vpmu_do_interrupt()) so we already do this, don't we?
>>>> I'm suggesting waiting until the *guest* writes to the (virtualized)
>>>> GLOBAL_OVF_CTRL.
>>> Wouldn't it be better to wait until the counter is reloaded?
>> Maybe! I haven't thought through it a lot. It's still not clear to
>> me whether MSR_CORE_PERF_GLOBAL_OVF_CTRL actually controls the
>> interrupt in any way or whether it just resets the bits in
>> MSR_CORE_PERF_GLOBAL_STATUS and acking the interrupt on the APIC is
>> all that's required to reenable it.
>>
>> - Kyle
> I wonder if it would be reasonable to just remove the workaround
> entirely at some point. The set of people using 1) several year old
> hardware, 2) an up to date Xen, and 3) the off-by-default performance
> counters is probably rather small.

We'd probably want to only enable this for affected processors, not
remove it outright. But the problem is that we still don't know for sure
whether this issue affects NHM only, do we?

(https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg02242.html
is the original message)

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
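
For readers following the point about shifting the MSR_CORE_PERF_GLOBAL_STATUS value "one by one", a minimal model of that kind of scan is sketched below. It is not Xen's handle_pmc_quirk code. The function name and the default counter counts (4 general-purpose and 3 fixed counters, as on Nehalem) are assumptions made for illustration; the bit layout (general-purpose overflow bits starting at bit 0, fixed-counter overflow bits starting at bit 32) follows the architectural definition of the register.

```python
# Illustrative model only: hypothetical helper, not taken from Xen.
FIXED_OVF_SHIFT = 32  # fixed-counter overflow bits start at bit 32


def overflowed_counters(global_status, num_arch=4, num_fixed=3):
    """Return (arch, fixed): indices of counters whose overflow bit is set.

    The status value is shifted right one bit per iteration, so every
    counter's bit is examined, not just bit 0 (PMC0).
    """
    arch = []
    val = global_status
    for i in range(num_arch):
        if val & 1:
            arch.append(i)
        val >>= 1

    fixed = []
    val = global_status >> FIXED_OVF_SHIFT
    for i in range(num_fixed):
        if val & 1:
            fixed.append(i)
        val >>= 1

    return arch, fixed


if __name__ == "__main__":
    # PMC1 and fixed counter 1 have overflowed.
    status = (1 << 33) | 0b10
    print(overflowed_counters(status))
```

The same loop shape would be the natural place to record which counters the quirk touched, which is the bookkeeping option 2 in the thread would need; that part is left out here because the thread does not settle how it should work.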