Re: [Xen-devel] VPMU interrupt unreliability
On Mon, Jul 24, 2017 at 7:08 AM, Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx> wrote:
> On 07/22/2017 04:16 PM, Kyle Huey wrote:
>> Last year I reported[0] seeing occasional instability in performance
>> counter values when running rr[1], which depends on completely
>> deterministic counts of retired conditional branches of userspace
>> programs.
>>
>> I recently identified the cause of this problem. Xen's VPMU code
>> contains a workaround for an alleged Nehalem bug that was added in
>> 2010[2]. Supposedly, if a hardware performance counter reaches 0
>> exactly during a PMI, another PMI is generated, potentially causing
>> an endless loop. The workaround is to set the counter to 1. In 2013
>> the original bug was believed to affect more than just Nehalem, and
>> the workaround was enabled for all family 6 CPUs.[3] This workaround
>> unfortunately disturbs the counter value in non-deterministic ways
>> (since the value the counter has in the irq handler depends on
>> interrupt latency), which is fatal to rr.
>>
>> I've verified that the discrepancies we see in the counted values
>> are entirely accounted for by the number of times the workaround is
>> applied in any given run. Furthermore, patching Xen not to use this
>> workaround makes the discrepancies in the counts vanish. I've added
>> code[4] to rr that reliably detects this problem from guest
>> userspace.
>>
>> Even with the workaround removed in Xen I see some additional issues
>> (but not disturbed counter values) with the PMI, such as interrupts
>> occasionally not being delivered to the guest. I haven't done much
>> work to track these down, but my working theory is that interrupts
>> that "skid" out of the guest that requested them, and into Xen
>> itself or perhaps even another guest, are not being delivered.
>>
>> Our current plan is to stop depending on the PMI during rr's
>> recording phase (which we use for timeslicing tracees primarily
>> because it's convenient) so that we can produce correct recordings
>> in Xen guests. Accurate replay will not be possible under
>> virtualization because of the PMI issues; that will require
>> transferring the recording to another machine. But that will be
>> sufficient to enable the use cases we care about (e.g. record an
>> automated process on a cloud computing provider and have an engineer
>> download and replay a failing recording later to debug it).
>>
>> I can think of several possible ways to fix the overcount problem,
>> including:
>> 1. Restricting the workaround to apply only to older CPUs, not to
>> all family 6 Intel CPUs forever.
>
> IIRC the question of which processors this workaround is applicable
> to was raised and the Intel folks (copied here) couldn't find an
> answer.
>
> One thing I noticed is that the workaround doesn't appear to be
> complete: it only checks PMC0 status and not the other counters
> (fixed or architectural). Of course, without knowing what the actual
> problem was it's hard to say whether this was intentional.

handle_pmc_quirk appears to loop through all the counters ...
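For reference, the logic in question has roughly this shape (a
simplified from-memory sketch, not the verbatim Xen source;
NUM_GP_COUNTERS stands in for however the real loop is sized, and
rdmsrl/wrmsrl are Xen's MSR accessor macros):

/* Sketch of the quirk: on a PMI, any overflowed general-purpose
 * counter that reads back as exactly 0 is rewritten to 1. */
static void pmc_quirk_sketch(uint64_t global_status)
{
    unsigned int i;
    uint64_t val;

    for ( i = 0; i < NUM_GP_COUNTERS; i++ )
    {
        if ( !(global_status & (1ULL << i)) )  /* overflow bit for PMCi */
            continue;

        rdmsrl(MSR_IA32_PERFCTR0 + i, val);
        if ( val == 0 )                        /* reached 0 during the PMI */
            wrmsrl(MSR_IA32_PERFCTR0 + i, 1);  /* the 0 -> 1 bump */
    }
    /* ... and similarly for the fixed-function counters ... */
}

That wrmsrl of 1 is exactly the disturbance rr observes, and whether
it fires at all depends on whether the counter has already moved past
0 by the time the handler reads it, i.e. on interrupt latency.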
>> 2. Intercepting MSR loads for counters that have the workaround
>> applied and giving the guest the correct counter value.
>
> We'd have to keep track of whether the counter has been reset (by the
> quirk) since the last MSR write.

Yes.
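Something with roughly this shape could work (a hypothetical sketch;
the struct and function names here are invented for illustration and
are not Xen's actual vpmu code):

#include <stdint.h>

/* Hypothetical per-counter bookkeeping for idea 2. */
struct counter_fixup {
    uint64_t quirk_bumps;  /* times the quirk reset this counter to 1
                            * since the guest last wrote the MSR */
};

/* Called from the PMI handler whenever the quirk rewrites a counter. */
static void quirk_applied(struct counter_fixup *f)
{
    f->quirk_bumps++;
}

/* Intercepted guest RDMSR: each quirk application left the hardware
 * value exactly 1 higher than the guest's programming would produce,
 * so subtract the accumulated bumps. */
static uint64_t guest_read_counter(const struct counter_fixup *f,
                                   uint64_t hw_value)
{
    return hw_value - f->quirk_bumps;
}

/* Intercepted guest WRMSR: the guest's own value is the baseline
 * again. */
static void guest_write_counter(struct counter_fixup *f)
{
    f->quirk_bumps = 0;
}

Since each application of the quirk inflates the count by exactly 1
(that is what the discrepancies above decompose into), subtracting the
number of applications since the guest's last write recovers the value
the guest expects.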
>> 3. Or perhaps even changing the workaround to disable the PMI on
>> that counter until the guest acks via GLOBAL_OVF_CTRL, assuming that
>> works on the relevant hardware.
>
> MSR_CORE_PERF_GLOBAL_OVF_CTRL is written immediately after the quirk
> runs (in core2_vpmu_do_interrupt()) so we already do this, don't we?

I'm suggesting waiting until the *guest* writes to the (virtualized)
GLOBAL_OVF_CTRL.

> Thanks for looking into this. Would also be interesting to
> see/confirm how some interrupts are (possibly) lost.

Indeed. Unfortunately it's not a high priority for me at the moment.

- Kyle

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel