
Re: [Xen-devel] further post-Meltdown-bad-aid performance thoughts



On 01/22/2018 01:30 PM, Jan Beulich wrote:
>>>> On 22.01.18 at 13:33, <george.dunlap@xxxxxxxxxx> wrote:
>> On 01/22/2018 09:25 AM, Jan Beulich wrote:
>>>>>> On 19.01.18 at 18:00, <george.dunlap@xxxxxxxxxx> wrote:
>>>> On 01/19/2018 04:36 PM, Jan Beulich wrote:
>>>>>>>> On 19.01.18 at 16:43, <george.dunlap@xxxxxxxxxx> wrote:
>>>>>> So what if instead of trying to close the "windows", we made it so that
>>>>>> there was nothing through the windows to see?  If no matter what the
>>>>>> hypervisor speculatively executed, nothing sensitive was visible except
>>>>>> what a vcpu was already allowed to see,
>>>>>
>>>>> I think you didn't finish your sentence here, but I also think I
>>>>> can guess the missing part. There's a price to pay for such an
>>>>> approach though - iterating over domains, or vCPU-s of a
>>>>> domain (just as an example) wouldn't be simple list walks
>>>>> anymore. There are certainly other things. IOW - yes, an
>>>>> approach like this seems possible, but with all the lost
>>>>> performance I think we shouldn't go overboard with further
>>>>> hiding.
>>>>
>>>> Right, so the next question: what information *from other guests* are
>>>> sensitive?
>>>>
>>>> Obviously the guest registers are sensitive.  But how much of the
>>>> information in vcpu struct that we actually need to have "to hand" is
>>>> actually sensitive information that we need to hide from other VMs?
>>>
>>> None, I think. But that's not the main aspect here. struct vcpu
>>> instances come and go, which would mean we'd have to
>>> constantly update what is or is not being exposed in the page
>>> tables used. This, while solvable, is going to be a significant
>>> burden in terms of synchronizing page tables (if we continue to
>>> use per-CPU ones) and/or TLB shootdown. Whereas if only the
>>> running vCPU's structure (and its struct domain) are exposed,
>>> no such synchronization is needed (things would simply be
>>> updated during context switch).
>>
>> I'm not sure we're actually communicating.
>>
>> Correct me if I'm wrong; at the moment, under XPTI, hypercalls running
>> under Xen still have access to all of host memory.  To protect against
>> SP3, we remove almost all Xen memory from the address space before
>> switching to the guest.
>>
>> What I'm proposing is something like this:
>>
>> * We have a "global" region of Xen memory that is mapped by all
>> processors.  This will contain everything we consider non-sensitive,
>> including Xen text segments and most domain and vcpu data.  But it will
>> *not* map all of host memory, nor have access to sensitive data, such as
>> vcpu register state.
>>
>> * We have per-cpu "local" regions.  In these regions we will map,
>> on demand, guest memory which is needed to perform current operations.
>> (We can consider how strictly we need to unmap memory after using it.)
>> We will also map the current vcpu's registers.
>>
>> * On entry to a 64-bit PV guest, we don't change the mapping at all.
>>
>> Now, no matter what the speculative attack -- SP1, SP2, or SP3 -- a vcpu
>> can only access its own RAM and registers.  There's no extra overhead to
>> context switching into or out of the hypervisor.
> 
> And we would open back up the SP3 variant of guest user mode
> attacking its own kernel by going through the Xen mappings. I
> can't exclude that variants of SP1 (less likely SP2) allowing indirect
> guest-user -> guest-kernel attacks could be found.

How?  Xen doesn't have the guest kernel memory mapped when it's not
using it.
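
To make that concrete: under the scheme above, the only remapping happens
at context switch, and it is strictly local to the CPU doing the switch.
Something like the sketch below is what I have in mind -- note that
percpu_local_map()/percpu_local_unmap() and the ->arch.priv field are
invented names for the per-cpu "local" region plumbing, not anything that
exists today:

  /*
   * Swap which vCPU's sensitive state is visible in this CPU's "local"
   * region.  Everything else lives in the "global" region and contains
   * nothing worth leaking.
   */
  static void switch_local_region(struct vcpu *prev, struct vcpu *next)
  {
      /* Drop the outgoing vCPU's register area from this CPU's view. */
      percpu_local_unmap(prev->arch.priv);   /* hypothetical helper */

      /* Map the incoming vCPU's register area (and nothing else). */
      percpu_local_map(next->arch.priv);     /* hypothetical helper */

      /*
       * No global page-table update and no remote TLB shootdown: the
       * change only touches this CPU's page tables, so it can happen on
       * every context switch without synchronising with anyone.
       */
  }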

>> Given that, I don't understand what the following comments mean:
>>
>> "There's a price to pay for such an approach though - iterating over
>> domains, or vCPU-s of a domain (just as an example) wouldn't be simple
>> list walks anymore."
>>
>> If we remove sensitive information from the domain and vcpu structs,
>> then any bit of hypervisor code can iterate over domain and vcpu structs
>> at will; only if they actually need to read or write sensitive data will
>> they have to perform an expensive map/unmap operation.  But in general,
>> to read another vcpu's registers you already need to do a vcpu_pause() /
>> vcpu_unpause(), which involves at least two IPIs (with one
>> spin-and-wait), so it doesn't seem like that should add a lot of extra
>> overhead.
> 
> Reading another vCPU's registers can't be compared with e.g.
> wanting to deliver an interrupt to other than the currently running
> vCPU.

I'm not sure what this has to do with what I said.  Your original claim
was that "iterating over domains wouldn't be simple list walks anymore",
and I said it would be.

If you want to make some other claim about the cost of delivering an
interrupt to another vcpu then please actually make a claim and justify it.
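
To be concrete about what "simple list walks" means here: with the
sensitive state split out, the common iteration patterns stay exactly as
they are today, and only code which genuinely needs some particular
vCPU's registers pays for the pause plus a map/unmap.  Roughly -- where
map_vcpu_regs()/unmap_vcpu_regs() are hypothetical accessors into the
per-cpu "local" region:

  struct domain *d;
  struct vcpu *v;
  struct cpu_user_regs *regs;
  unsigned int running = 0;

  /* Plain list walk over non-sensitive fields: no mapping needed. */
  for_each_domain ( d )
      for_each_vcpu ( d, v )
          if ( v->is_running )
              running++;

  /* Only when we actually want a given vCPU v's register state: */
  vcpu_pause(v);                   /* already costs IPIs today */
  regs = map_vcpu_regs(v);         /* hypothetical map into the local region */
  /* ... read or modify regs ... */
  unmap_vcpu_regs(regs);           /* hypothetical */
  vcpu_unpause(v);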

>> "struct vcpu instances come and go, which would mean we'd have to
>> constantly update what is or is not being exposed in the page tables
>> used. This, while solvable, is going to be a significant burden in terms
>> of synchronizing page tables (if we continue to use per-CPU ones) and/or
>> TLB shootdown."
>>
>> I don't understand what this is referring to in my proposed plan above.
> 
> I had specifically said these were just examples (ones coming to
> mind immediately).

And what I'm saying is that I haven't been able to infer any examples
here.  I can't tell whether there's some misunderstanding of yours I can
correct, or if there's some misunderstanding of mine that I can take on
(either to solve or dissuade me from pursuing this idea further),
because I don't know what you're talking about.

> Of course splitting such structures into two parts
> is an option, but I'm not sure it's a reasonable one (which perhaps
> depends on the details of how you would envision the implementation).
> If the split off piece(s) was/were being referred to by pointers out
> of the main structure, there would be a meaningful risk of some
> perhaps rarely executed piece of code de-referencing it in the
> wrong context. Otoh entirely independent structures (without
> pointers in either direction) would need careful management of
> their life times, so one doesn't go away without the other.

Well, the obvious thing to do would be to change all accesses of
"sensitive" data to go through an accessor function.  The accessor
function could determine if the data was already mapped or if it needed
to be mapped before returning it.
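
As a sketch -- vcpu_priv(), the curr_priv per-cpu slot, priv_mfn and
percpu_local_map() are all invented names, the point is just the shape:
the sensitive half becomes a separate allocation that is never reachable
through the "global" mappings, and struct vcpu keeps only an opaque
handle to it, so stray code can't dereference it by accident:

  /* Sensitive half: register state and whatever else we decide to hide. */
  struct vcpu_priv {
      struct cpu_user_regs regs;
      /* ... other sensitive per-vCPU state ... */
  };

  struct vcpu_priv *vcpu_priv(const struct vcpu *v)
  {
      /*
       * The current vCPU's private part was mapped into this CPU's
       * "local" region at context switch; anything else gets mapped on
       * demand, and the caller unmaps it again when done.
       */
      if ( v == current )
          return this_cpu(curr_priv);          /* hypothetical per-cpu slot */

      return percpu_local_map(v->priv_mfn);    /* hypothetical on-demand map */
  }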

> You mention the possibility of on demand mapping - if data
> structures aren't used frequently, that's certainly an option.
> In the end there's a lot of uncertainty here as to whether the
> outline, nice in theory, could actually live up to the requirements
> of an actual implementation. Yet the (presumably) fundamental
> re-structuring of data which would be required here calls for at
> least some of this uncertainty to be addressed before actually
> making an attempt to switch over to such a model.

Of course, and that's what I'm proposing we do -- explore the
possibility of a "panopticon"* Xen.  Working out exactly which bits of
hypervisor state we should consider 'sensitive' is needed both for your
question (short-term XPTI performance improvements) and for mine
(long-term restructuring to potentially mitigate all information leaks).

 -George

* ...where Xen assumes that its mapped memory is observed by a running
vcpu at all times, a la [ https://en.wikipedia.org/wiki/Panopticon ]
