[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Ping: [PATCH v3 2/4] x86: suppress SMEP and SMAP while running 32-bit PV guest code

>>> On 04.05.16 at 12:03, <andrew.cooper3@xxxxxxxxxx> wrote:
> v3 has logic intended to omit cr4 changes while in the content of 32bit
> PV guest userspace.  This in turn was intended to increase performance
> by reducing the quantity of TLB flushes.
> Even if hardware has logic to speed up CR4 writes when following reads,
> the complete omission of the writes in the first place should
> necessarily be faster still.
> v3 performs ~20% worse compared to v2, which indicates that one of the
> logical steps is wrong.

Yet no-one can spot it?

May I suggest to re-measure v2 and v3 on an otherwise identical
baseline (which I'm relatively sure was not the case for the previous
measurements)? Furthermore the delta between v2 and v3 was
larger than the exit-to-guest-user-mode adjustment. Hence another
thing to consider is kind of bisecting between the two versions, at
least at the granularity of the three distinct changes done.

And then, looking at the interdiff, I note that it is quite important to
know whether you measured a debug or a release build. If it was a
debug one, the change to the debugging portion of cr4_pv32_restore
makes it so that the CR4 read is not only not avoided there, but that
read now happens in close succession to a prior write, which afaik
may incur further stalls.

>> And finally, I'm not tied to this patch set. I'd be fine with some
>> other fix to restore correct behavior. What I'm not fine with is the
>> limbo state this is being left in, with me all the time having to revive
>> the discussion.
> I agree that context switching cr4 is the only way of allowing 32bit PV
> guests to function without impacting other functionality.

See below - what makes you assume this would come at a lower

>  After all,
> the PV guest is doing precisely what SMAP is intended to catch.

I don't understand this part, especially when - from the previous
sentence - the context is "32-bit PV guest".

> I also accept that this will come with a performance overhead, but
> frankly, the performance regression on these specific patches is
> massive.

Yet it has to be assumed that any other approach (e.g. the
context switching one you mention) wouldn't be any better.

>  I am very concerned that the first thing I will have to do in
> XenServer is revert them.

That shouldn't be a primary reason to not make forward progress
in the community project. Or else I could equally well say it has to
go in because otherwise we will have to carry the patch locally in
our product trees.

> I don't like this situation, but blindly ploughing ahead given these
> issues is also a problem.

Re-emphasizing again that there is no known or apparent functional
bug with this patch, figuring out the performance aspects (beyond
what I've suggested above) is where I think we would need Intel's
help with. Yet you've seen Feng's reply, which was a result of us
pushing for their input. And once again - as long as there's no
functional bug, I think we either need to make progress in identifying
the reason (which I don't see much I can do for), or - also taking into
account for how long the thing has been broken in the tree - simply
commit what is there and deal with the performance fallout later.
Leaving functionally broken code in the tree for such an extended
(and otherwise unbounded) period of time is a no-go imo.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.