[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/5] xen/domain: Remove function pointers from domain pause helpers



On 12/11/2021 09:36, Julien Grall wrote:
On 11/11/2021 17:57, Andrew Cooper wrote:
Retpolines are expensive, and all these do are select between the sync and nosync helpers.  Pass a boolean instead, and use direct calls everywhere.

To be honest, I much prefer to read the old code. I am totally not against the change but I can see how I would be ready to introduce new function pointers use in the future.

Really?  The only reason there are function points to begin with was because of a far more naive (and far pre-spectre) me fixing a reference counting mess in 2014 by consolidating the logic.  My mistake was not spotting that the function pointers weren't actually necessary in the first place.

So I think we need some guidelines on when to use function pointers in Xen.

It's easy.  If you are in any way unsure, they're probably the wrong answer.  (Ok - I'm being a little facetious)

There are concrete security improvements from not using function pointers, demonstrated by fact that JOP/COP attacks are so pervasive that all major hardware and software vendors are working on techniques (both hardware and software) to prevent forward-edge control flow integrity violations.  (The mandate from the NSA to make this happen certainly helped spur things on, too.)

There are also concrete performance improvements too.  All competitive processors in the market today can cope with direct branches more efficiently than indirect branches, and a key principle behind profile-guided-optimsiation is to rearrange your `ptr()` function pointer call into `if ( ptr == a ) a(); else if ( ptr == b ) b(); else ptr()` based on the frequency of destinations, because this really does make orders of magnitude improvements in some cases.

We have some shockingly inappropriate uses of function pointers in Xen right now (patches 4 and 5 in particular, and "x86/hvm: Remove callback from paging->flush_tlb() hook" posted today).  While this specific example doesn't fall into shockingly inappropriate in my books, it is firmly in the "not appropriate" category.

The more...


Pause/unpause operations on behalf of dom0 are not fastpaths, so avoid
exposing the __domain_pause_by_systemcontroller() internal.

This actually compiles smaller than before:

... the code doesn't really compile smaller on Arm:

42sh>  ../scripts/bloat-o-meter xen-syms-old xen-syms

add/remove: 4/2 grow/shrink: 0/6 up/down: 272/-252 (20)
Function old     new   delta
_domain_pause                                  -     136    +136
_domain_pause_by_systemcontroller              -     120    +120
domain_pause_by_systemcontroller_nosync        -       8      +8
domain_pause_by_systemcontroller               -       8      +8
domain_resume                                136     132      -4
domain_pause_nosync                           12       8      -4
domain_pause                                  12       8      -4
domain_pause_except_self                     188     180      -8
do_domctl                                   5480    5472      -8
domain_kill                                  372     356     -16
do_domain_pause                               88       -     -88
__domain_pause_by_systemcontroller           120       -    -120
Total: Before=966919, After=966939, chg +0.00%


ARM, like x86, compiles for speed, not size.  "it got a bit larger" is generally not as interesting as "it got smaller, despite everything the compiler would normally do in the opposite direction".  Obviously, if the build of Xen got 10x larger then we'd have a problem, but it hasn't.

Conversely, if we were compiling for size not speed, then "it got smaller" is the uninteresting direction.

The truth is that Xen (both x86 and ARM) is a couple of megabytes in RAM and this literally doesn't matter these days where RAM is measured in GB and TB.  We care to an extent for efficient use of the cache/etc, but noone would bat an eyelid at several MB more for a want-to-have feature.


Here, you're saying that for an added 5 instructions, totalling 0.00% delta in the size of the hypervisor, you've got some reasonably frequent codepaths which will execute faster, and cannot be hijacked with a JOP/COP attack.

That's a clear all-around improvement on ARM, just like it is on x86.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.