Re: [Xen-devel] VPMU interrupt unreliability
On 19/10/17 19:24, Kyle Huey wrote:
> On Thu, Oct 19, 2017 at 11:20 AM, Meng Xu <xumengpanda@xxxxxxxxx> wrote:
>> On Thu, Oct 19, 2017 at 11:40 AM, Andrew Cooper
>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 19/10/17 16:09, Kyle Huey wrote:
>>>> On Wed, Oct 11, 2017 at 7:09 AM, Boris Ostrovsky
>>>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>> On 10/10/2017 12:54 PM, Kyle Huey wrote:
>>>>>> On Mon, Jul 24, 2017 at 9:54 AM, Kyle Huey <me@xxxxxxxxxxxx> wrote:
>>>>>>> On Mon, Jul 24, 2017 at 8:07 AM, Boris Ostrovsky
>>>>>>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>>>>>>> One thing I noticed is that the workaround doesn't appear to be
>>>>>>>>>> complete: it is only checking PMC0 status and not other counters
>>>>>>>>>> (fixed or architectural). Of course, without knowing what the
>>>>>>>>>> actual problem was it's hard to say whether this was intentional.
>>>>>>>>> handle_pmc_quirk appears to loop through all the counters ...
>>>>>>>> Right, I didn't notice that it is shifting MSR_CORE_PERF_GLOBAL_STATUS
>>>>>>>> value one by one and so it is looking at all bits.
>>>>>>>>
>>>>>>>>>>> 2. Intercepting MSR loads for counters that have the workaround
>>>>>>>>>>> applied and giving the guest the correct counter value.
>>>>>>>>>> We'd have to keep track of whether the counter has been reset (by the
>>>>>>>>>> quirk) since the last MSR write.
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>>> 3. Or perhaps even changing the workaround to disable the PMI on
>>>>>>>>>>> that counter until the guest acks via GLOBAL_OVF_CTRL, assuming
>>>>>>>>>>> that works on the relevant hardware.
>>>>>>>>>> MSR_CORE_PERF_GLOBAL_OVF_CTRL is written immediately after the quirk
>>>>>>>>>> runs (in core2_vpmu_do_interrupt()) so we already do this, don't we?
>>>>>>>>> I'm suggesting waiting until the *guest* writes to the (virtualized)
>>>>>>>>> GLOBAL_OVF_CTRL.
>>>>>>>> Wouldn't it be better to wait until the counter is reloaded?
>>>>>>> Maybe! I haven't thought through it a lot. It's still not clear to
>>>>>>> me whether MSR_CORE_PERF_GLOBAL_OVF_CTRL actually controls the
>>>>>>> interrupt in any way or whether it just resets the bits in
>>>>>>> MSR_CORE_PERF_GLOBAL_STATUS and acking the interrupt on the APIC is
>>>>>>> all that's required to reenable it.
>>>>>>>
>>>>>>> - Kyle
>>>>>> I wonder if it would be reasonable to just remove the workaround
>>>>>> entirely at some point. The set of people using 1) several year old
>>>>>> hardware, 2) an up to date Xen, and 3) the off-by-default performance
>>>>>> counters is probably rather small.
>>>>> We'd probably want to only enable this for affected processors, not
>>>>> remove it outright. But the problem is that we still don't know for sure
>>>>> whether this issue affects NHM only, do we?
>>>>>
>>>>> (https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg02242.html
>>>>> is the original message)
>>>> Yes, the basic problem is that we don't know where to draw the line.
>>> vPMU is disabled by default for security reasons,
>>
>> Is there any document about the possible attack via the vPMU? The
>> document I found (such as [1] and XSA-163) just briefly say that the
>> vPMU should be disabled due to security concern.
>>
>>
>> [1] https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html
> Cross-guest information leaks, presumably.

Plenty of "not context switching things properly".

Off the top of my head, there was also a straight DoS by blindly passing
guest values into an unchecked wrmsr(), and privilege escalation via
letting the guest choose where ds_store dumped its data.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
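
[Editor's illustration] The quoted remark that handle_pmc_quirk shifts the MSR_CORE_PERF_GLOBAL_STATUS value "one by one" describes a common bit-scanning pattern. The sketch below shows only that pattern, on a plain integer snapshot. It is not Xen's code: the function name scan_overflow_bits, its parameters, and the hits array are hypothetical, and no MSR is read or written.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption for illustration, not Xen's
 * handle_pmc_quirk()): walk a snapshot of a global-status bitmask one
 * bit at a time and record the index of every bit that is set.
 *
 * status  - snapshot of the status value (assumed already read)
 * nr_bits - how many low-order bits to examine (at most 64)
 * hits    - caller-provided array with room for nr_bits entries
 *
 * Returns the number of set bits found.
 */
static unsigned int scan_overflow_bits(uint64_t status, unsigned int nr_bits,
                                       unsigned int *hits)
{
    unsigned int i, n = 0;

    assert(nr_bits <= 64);

    for (i = 0; i < nr_bits; i++) {
        if (status & 1)
            hits[n++] = i;   /* bit i was set in the original value */
        status >>= 1;        /* bring the next bit down into bit 0 */
    }

    return n;
}
```

Because the value is shifted right on every iteration, testing bit 0 each time covers every bit position in turn, which is why such a loop looks at all counters rather than only PMC0.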