Re: [Xen-devel] Ongoing/future speculative mitigation work
On 12/7/18 6:40 PM, Wei Liu wrote:
> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>> Hello,
>>
>> This is an accumulation and summary of various tasks which have been
>> discussed since the revelation of the speculative security issues in
>> January, and also an invitation to discuss alternative ideas.  They
>> are x86 specific, but a lot of the principles are
>> architecture-agnostic.
>>
>> 1) A secrets-free hypervisor.
>>
>> Basically every hypercall can be (ab)used by a guest, and used as an
>> arbitrary cache-load gadget.  Logically, this is the first half of a
>> Spectre SP1 gadget, and is usually the first stepping stone to
>> exploiting one of the speculative sidechannels.
>>
>> Short of compiling Xen with LLVM's Speculative Load Hardening (which
>> is still experimental, and comes with a ~30% perf hit in the common
>> case), this is unavoidable.  Furthermore, throwing a few
>> array_index_nospec() into the code isn't a viable solution to the
>> problem.
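[ For illustration -- a minimal sketch of the kind of clamp
  array_index_nospec() provides, in a made-up hypercall-style handler
  (do_example_op and table are not real Xen code; the macro itself lives
  in xen/include/xen/nospec.h):

      #include <xen/errno.h>
      #include <xen/nospec.h>

      #define TABLE_SIZE 16
      static unsigned long table[TABLE_SIZE];

      long do_example_op(unsigned int idx)
      {
          if ( idx >= TABLE_SIZE )
              return -EINVAL;

          /*
           * The bounds check above can be bypassed speculatively; the
           * clamp below forces idx to 0 under mis-speculation, so
           * table[idx] cannot become an attacker-controlled cache load.
           */
          idx = array_index_nospec(idx, TABLE_SIZE);

          return table[idx];
      }

  The point being made above is that sprinkling such clamps over every
  hypercall path doesn't scale, hence the interest in removing the data
  itself from Xen's address space. ]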
>> An alternative option is to have less data mapped into Xen's virtual
>> address space - if a piece of memory isn't mapped, it can't be loaded
>> into the cache.
>>
>> An easy first step here is to remove Xen's directmap, which will mean
>> that guests' general RAM isn't mapped by default into Xen's address
>> space.  This will come with some performance hit, as the
>> map_domain_page() infrastructure will now have to actually
>> create/destroy mappings, but removing the directmap will cause an
>> improvement for non-speculative security as well (no possibility of
>> ret2dir as an exploit technique).
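[ For illustration -- a minimal sketch of what "actually create/destroy
  mappings" means once the directmap is gone.  read_guest_word() is a
  made-up helper, not existing Xen code; map_domain_page() and
  unmap_domain_page() are the interfaces declared in
  xen/include/xen/domain_page.h:

      #include <xen/types.h>
      #include <xen/mm.h>
      #include <xen/domain_page.h>

      static uint64_t read_guest_word(mfn_t mfn, unsigned int offset)
      {
          /* With no directmap, this has to build a real mapping ... */
          uint64_t *va = map_domain_page(mfn);
          uint64_t val = va[offset / sizeof(uint64_t)];

          /* ... and tear it down again, which is where the perf hit is. */
          unmap_domain_page(va);

          return val;
      }

  Today, with the directmap present, the 64bit map_domain_page() is close
  to free because the page is already mapped. ]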
>> Beyond the directmap, there are plenty of other interesting secrets
>> in the Xen heap and other mappings, such as the stacks of the other
>> pcpus.  Fixing this requires moving Xen to having a non-uniform
>> memory layout, and this is much harder to change.  I already
>> experimented with this as a Meltdown mitigation around about a year
>> ago, and posted the resulting series on Jan 4th,
>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html,
>> some trivial bits of which have already found their way upstream.
>>
>> To have a non-uniform memory layout, Xen may not share L4 pagetables,
>> i.e. Xen must never have two pcpus which reference the same pagetable
>> in %cr3.
>>
>> This property already holds for 32bit PV guests, and all HVM guests,
>> but 64bit PV guests are the sticking point.  Because Linux has a flat
>> memory layout, when a 64bit PV guest schedules two threads from the
>> same process on separate vcpus, those two vcpus have the same virtual
>> %cr3, and currently, Xen programs the same real %cr3 into hardware.
>>
>> If we want Xen to have a non-uniform layout, our two options are:
>>  * Fix Linux to have the same non-uniform layout that Xen wants
>>    (backwards compatibility for older 64bit PV guests can be achieved
>>    with xen-shim).
>>  * Make use of the XPTI algorithm (specifically, the pagetable
>>    sync/copy part) forever more in the future.
>>
>> Option 2 isn't great (especially for perf on fixed hardware), but
>> does keep all the necessary changes in Xen.  Option 1 looks to be the
>> better option longterm.
>>
>> As an interesting point to note, the 32bit PV ABI prohibits sharing
>> of L3 pagetables, because back in the 32bit hypervisor days, we used
>> to have linear mappings in the Xen virtual range.  This check is
>> stale (from a functionality point of view), but still present in Xen.
>> A consequence of this is that 32bit PV guests definitely don't share
>> top-level pagetables across vcpus.
>
> Correction: the 32bit PV ABI prohibits sharing of L2 pagetables, but
> L3 pagetables can be shared.  So guests will schedule the same
> top-level pagetables across vcpus.
>
> But 64bit Xen creates a monitor table for a 32bit PAE guest and puts
> the CR3 provided by the guest into the first slot, so pcpus don't
> share the same L4 pagetables.  The property we want still holds.

Ah, right -- but Xen can get away with this because in PAE mode, "L3"
is just 4 entries that are loaded on a CR3 switch and not automatically
kept in sync by the hardware; i.e., the OS already needs to do its own
"manual syncing" if it updates any of the L3 entries, so it's the same
for Xen.

>> Juergen/Boris: Do you have any idea if/how easy this infrastructure
>> would be to implement for 64bit PV guests as well?  If a PV guest can
>> advertise via Elfnote that it won't share top-level pagetables, then
>> we can audit this trivially in Xen.
>
> After reading the Linux kernel code, I think it is not going to be
> trivial, as threads in Linux currently share one pagetable (as they
> should).
>
> In order to make each thread have its own pagetable while still
> maintaining the illusion of one address space, there needs to be
> synchronisation under the hood.
>
> There is code in Linux to synchronise vmalloc, but that's only for the
> kernel portion.  The infrastructure to synchronise the userspace
> portion is missing.
>
> One idea is to follow the same model as vmalloc -- maintain a
> reference pagetable in struct mm and a list of pagetables for threads,
> then synchronise the pagetables in the page fault handler.  But this
> is probably a bit hard to sell to Linux maintainers because it will
> touch a lot of non-Xen code, increase complexity and decrease
> performance.
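[ For illustration -- a rough sketch of the "synchronise in the page
  fault handler" model being described, loosely modelled on the way x86
  Linux repairs vmalloc-area faults (vmalloc_fault()): copy a missing
  top-level entry from a reference pagetable into the faulting thread's
  pagetable on demand.  sync_top_level_entry() is made up for the sake
  of the example:

      #include <linux/mm.h>
      #include <linux/sched.h>
      #include <asm/pgtable.h>

      /* Propagate one top-level (L4/PGD) entry from a reference table. */
      static int sync_top_level_entry(pgd_t *ref_pgd, unsigned long addr)
      {
          pgd_t *pgd     = pgd_offset(current->mm, addr);
          pgd_t *pgd_ref = ref_pgd + pgd_index(addr);

          if (pgd_none(*pgd_ref))
              return -1;              /* nothing to propagate */

          if (pgd_none(*pgd))
              set_pgd(pgd, *pgd_ref); /* fill in the entry on demand */

          return 0;
      }

  Doing this for the userspace half of the address space, for every
  thread in a process, is the part that doesn't exist today. ]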
Sorry -- what do you mean "synchronize vmalloc"?  If every thread has a
different view of the kernel's vmalloc area, then every thread must
have a different L4 table, right?  And if every thread has a different
L4 table, then we've already got the main thing we need from Linux,
don't we?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel