[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code

>>> On 03.05.16 at 16:10, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 03/05/16 14:58, Jan Beulich wrote:
>>>>> On 17.03.16 at 09:03,  wrote:
>>> Since such guests' kernel code runs in ring 1, their memory accesses,
>>> at the paging layer, are supervisor mode ones, and hence subject to
>>> SMAP/SMEP checks. Such guests cannot be expected to be aware of those
>>> two features though (and so far we also don't expose the respective
>>> feature flags), and hence may suffer page faults they cannot deal with.
>>> While the placement of the re-enabling slightly weakens the intended
>>> protection, it was selected such that 64-bit paths would remain
>>> unaffected where possible. At the expense of a further performance hit
>>> the re-enabling could be put right next to the CLACs.
>>> Note that this introduces a number of extra TLB flushes - CR4.SMEP
>>> transitioning from 0 to 1 always causes a flush, and it transitioning
>>> from 1 to 0 may also do.
>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> So I think we need to take some decision here, and I'm afraid
>> Andrew and I won't be able to settle this between us. He's
>> validly concerned about the performance impact this got proven
>> to have (for 32-bit PV guests), yet I continue to think that correct
>> behavior is more relevant than performance. Hence I think we
>> should bite the bullet and take the change. For those who value
>> performance more than security, they can always disable the use
>> of SMEP and SMAP via command line option.
>> Of course I'm also concerned that Intel, who did introduce the
>> functional regression in the first place, so far didn't participate at
>> all in finding an acceptable solution to the problem at hand...
> What I am even more concerned with is that the supposedly-optimised
> version which tried to omit cr4 writes whenever possible caused a higher
> performance overhead than the version which rewrite cr4 all the time.
> As is stands, v3 of the series is even worse for performance than v2. 
> That alone means that v3 is not in a suitable state to be accepted, as
> there is some hidden bug in it.

For one, we could take v2 (albeit that would feel odd).

And then, from v3 showing worse performance (which only you
have seen the numbers for, and [hopefully] know what deviations
in them really mean) it does not follow that v3 is buggy. That's one
possibility, sure (albeit hard to explain, as functionally everything
is working fine for both you and me, afaik), but not the only one.
Another may be that hardware processes CR4 writes faster when
they happen in close succession to CR4 reads. And I'm sure more
could be come up with.

And finally, I'm not tied to this patch set. I'd be fine with some
other fix to restore correct behavior. What I'm not fine with is the
limbo state this is being left in, with me all the time having to revive
the discussion.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.