
Re: [Xen-devel] Ongoing/future speculative mitigation work

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
From: Wei Liu <wei.liu2@xxxxxxxxxx>
Date: Fri, 7 Dec 2018 18:40:52 +0000
Cc: Martin Pohlack <mpohlack@xxxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Joao Martins <joao.m.martins@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Daniel Kiper <daniel.kiper@xxxxxxxxxx>, Marek Marczykowski <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Anthony Liguori <aliguori@xxxxxxxxxx>, "Dannowski, Uwe" <uwed@xxxxxxxxx>, Lars Kurth <lars.kurth@xxxxxxxxxx>, Konrad Wilk <konrad.wilk@xxxxxxxxxx>, Ross Philipson <ross.philipson@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Matt Wilson <msw@xxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Juergen Gross <JGross@xxxxxxxx>, Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Xen-devel List <xen-devel@xxxxxxxxxxxxx>, Mihai Donțu <mdontu@xxxxxxxxxxxxxxx>, "Woodhouse, David" <dwmw@xxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
Delivery-date: Fri, 07 Dec 2018 18:41:12 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
> Hello,
> 
> This is an accumulation and summary of various tasks which have been
> discussed since the revelation of the speculative security issues in
> January, and also an invitation to discuss alternative ideas.  They are
> x86 specific, but a lot of the principles are architecture-agnostic.
> 
> 1) A secrets-free hypervisor.
> 
> Basically every hypercall can be (ab)used by a guest, and used as an
> arbitrary cache-load gadget.  Logically, this is the first half of a
> Spectre SP1 gadget, and is usually the first stepping stone to
> exploiting one of the speculative sidechannels.
> 
> Short of compiling Xen with LLVM's Speculative Load Hardening (which is
> still experimental, and comes with a ~30% perf hit in the common case),
> this is unavoidable.  Furthermore, throwing a few array_index_nospec()
> into the code isn't a viable solution to the problem.
> 
> An alternative option is to have less data mapped into Xen's virtual
> address space - if a piece of memory isn't mapped, it can't be loaded
> into the cache.
> 
> An easy first step here is to remove Xen's directmap, which will mean
> that guests' general RAM isn't mapped by default into Xen's address
> space.  This will come with some performance hit, as the
> map_domain_page() infrastructure will now have to actually
> create/destroy mappings, but removing the directmap will cause an
> improvement for non-speculative security as well (No possibility of
> ret2dir as an exploit technique).
> 
> Beyond the directmap, there are plenty of other interesting secrets in
> the Xen heap and other mappings, such as the stacks of the other pcpus. 
> Fixing this requires moving Xen to having a non-uniform memory layout,
> and this is much harder to change.  I already experimented with this as
> a meltdown mitigation around about a year ago, and posted the resulting
> series on Jan 4th,
> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html,
> some trivial bits of which have already found their way upstream.
> 
> To have a non-uniform memory layout, Xen may not share L4 pagetables. 
> i.e. Xen must never have two pcpus which reference the same pagetable in
> %cr3.
> 
> This property already holds for 32bit PV guests, and all HVM guests, but
> 64bit PV guests are the sticking point.  Because Linux has a flat memory
> layout, when a 64bit PV guest schedules two threads from the same
> process on separate vcpus, those two vcpus have the same virtual %cr3,
> and currently, Xen programs the same real %cr3 into hardware.
> 
> If we want Xen to have a non-uniform layout, our two options are:
> * Fix Linux to have the same non-uniform layout that Xen wants
> (Backwards compatibility for older 64bit PV guests can be achieved with
> xen-shim).
> * Make use of the XPTI algorithm (specifically, the pagetable sync/copy
> part) forever more in the future.
> 
> Option 2 isn't great (especially for perf on fixed hardware), but does
> keep all the necessary changes in Xen.  Option 1 looks to be the better
> option longterm.
> 
> As an interesting point to note: the 32bit PV ABI prohibits sharing of
> L3 pagetables, because back in the 32bit hypervisor days, we used to
> have linear mappings in the Xen virtual range.  This check is stale
> (from a functionality point of view), but still present in Xen.  A
> consequence of this is that 32bit PV guests definitely don't share
> top-level pagetables across vcpus.

Correction: the 32bit PV ABI prohibits sharing of L2 pagetables, but L3
pagetables can be shared. So guests will schedule the same top-level
pagetables across vcpus.

However, 64bit Xen creates a monitor table for a 32bit PAE guest and
puts the CR3 provided by the guest into its first slot, so pcpus don't
share the same L4 pagetables. The property we want still holds.
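
To illustrate why the property still holds, here is a toy model in C. It
is only a sketch and not Xen code: the names (toy_l4, monitor_table,
toy_load_guest_cr3) are made up for illustration, the table is assumed to
be per-vcpu, and real pagetable entries carry flag bits that are ignored
here.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_VCPUS   4
#define TOY_L4_ENTRIES 512

/* Hypothetical stand-in for a Xen-owned top-level (L4) table. */
struct toy_l4 {
    uint64_t slot[TOY_L4_ENTRIES];
};

/* Assumption for this sketch: one monitor table per vcpu. */
static struct toy_l4 monitor_table[TOY_NR_VCPUS];

/*
 * Place the guest-provided CR3 (the guest's PAE top-level table) into
 * slot 0 of this vcpu's own L4, and return the L4 that would be loaded
 * into the real %cr3.  Returns NULL for an out-of-range vcpu id.
 */
static const struct toy_l4 *toy_load_guest_cr3(unsigned int vcpu,
                                               uint64_t guest_cr3)
{
    if (vcpu >= TOY_NR_VCPUS)
        return 0;

    monitor_table[vcpu].slot[0] = guest_cr3;
    return &monitor_table[vcpu];
}
```

Two vcpus that are handed the same guest CR3 end up with identical slot 0
contents but distinct L4 tables, so nothing at the top level is shared
between pcpus.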

> 
> Juergen/Boris: Do you have any idea if/how easy this infrastructure
> would be to implement for 64bit PV guests as well?  If a PV guest can
> advertise via Elfnote that it won't share top-level pagetables, then we
> can audit this trivially in Xen.
> 

After reading the Linux kernel code, I think it is not going to be
trivial, because threads in Linux currently share one pagetable (as
they should).

In order to give each thread its own pagetable while still maintaining
the illusion of one address space, there needs to be synchronisation
under the hood.

There is code in Linux to synchronise vmalloc mappings, but that only
covers the kernel portion. The infrastructure to synchronise the
userspace portion is missing.

One idea is to follow the same model as vmalloc -- maintain a reference
pagetable in struct mm and a list of pagetables for threads, then
synchronise the pagetables in the page fault handler. But this is
probably a bit hard to sell to Linux maintainers because it will touch a
lot of the non-Xen code, increase complexity and decrease performance.
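
A rough userspace sketch of that model, in C, might look like the
following. None of it is Linux code: toy_mm, toy_pgd, toy_map and
toy_fault are hypothetical names, a "pagetable" is reduced to a small
array of top-level entries, and an entry value of 0 is assumed to mean
"not present".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_ENTRIES  8
#define TOY_MAX_THREADS 4

/* Hypothetical top-level table: 0 means "not present". */
struct toy_pgd {
    uint64_t entry[TOY_NR_ENTRIES];
};

/* One reference table per address space, plus one table per thread. */
struct toy_mm {
    struct toy_pgd reference;
    struct toy_pgd thread_pgd[TOY_MAX_THREADS];
};

static void toy_mm_init(struct toy_mm *mm)
{
    memset(mm, 0, sizeof(*mm));
}

/* New mappings are only ever written to the reference table. */
static void toy_map(struct toy_mm *mm, unsigned int idx, uint64_t val)
{
    if (idx < TOY_NR_ENTRIES)
        mm->reference.entry[idx] = val;
}

/*
 * Fault-time synchronisation: if the reference table has an entry that
 * the faulting thread's table lacks (or has stale), copy it over and
 * report the fault as resolved.  Otherwise it is a genuine fault.
 */
static bool toy_fault(struct toy_mm *mm, unsigned int tid, unsigned int idx)
{
    uint64_t ref;

    if (tid >= TOY_MAX_THREADS || idx >= TOY_NR_ENTRIES)
        return false;

    ref = mm->reference.entry[idx];
    if (ref == 0 || mm->thread_pgd[tid].entry[idx] == ref)
        return false;

    mm->thread_pgd[tid].entry[idx] = ref;
    return true;
}
```

Note that lazy synchronisation like this only covers newly added
entries. Removing a mapping could not wait for a fault, since a stale
per-thread copy would simply keep working, so it would have to be
propagated to every per-thread table eagerly.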

Thoughts?

Wei.
