Re: [Xen-devel] [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply

On 11/09/2014 19:39, Andres Lagar Cavilla wrote:
On Thu, Sep 11, 2014 at 11:09 AM, Tamas K Lengyel <tamas.lengyel@xxxxxxxxxxxx> wrote:

On Thu, Sep 11, 2014 at 6:42 PM, Andres Lagar Cavilla <andres@xxxxxxxxxxxxxxxx> wrote:
On Thu, Sep 11, 2014 at 7:40 AM, Tamas K Lengyel <tamas.lengyel@xxxxxxxxxxxx> wrote:
I've removed the CC's as I'm going a bit off-topic here.
In an ideal world, the emulation of the instruction should raise all relevant new mem events. We don't know a priori what the consumer might learn throughout the execution of this specific instruction. Does it read from or write to new gfns which have mem access masks set? TTBOMK, because the emulation does not go through the EPT fault handler, no mem access events will be generated, even if they should.

This is a long-standing shortcoming of mem event in security frameworks, in that mem access is only defined as raising events through EPT faults. One could conceivably craft an attack by having an instruction that, through its emulation, reads/writes a massive buffer extending into other gfns. Likewise, "virtual DMA", i.e. qemu accesses via map_foreign_pages and grant accesses from backends, doesn't raise mem access events. So there are many (conceptual) holes.

Could you provide an example instruction that is trapped-and-emulated by Xen which may be used in such a fashion? Also, is there any technical reason why we couldn't hook such emulations into the mem_event system?

I think it's safe to assume Razvan's dom0 application is powerful enough to emulate the entire trapping instruction and not be victimized.

For the sake of argument, what I'm getting at is that after the mem_event has been handled and control is passed to hvm_emulate_one, Xen will start resolving the gfn->mfn translations needed by the instruction emulation by internally walking the p2m (read: EPT) table with get_page_from_gfn. This will not invoke p2m_mem_access_check (which only happens for actual hw faults), so an instruction that reads or writes across pages will not have a mem event generated for the other pages. A rep stos across page boundaries would do that (key: the rep stos is emulated in Xen, and the eip is then moved silently forward, so the hardware never actually gets to execute the instruction).
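
As a rough illustration of the gap described above, here is a toy Python model. It is not Xen code: ToyP2M, hw_access, lookup and emulate_rep_stos are hypothetical names invented for this sketch. It assumes only what the paragraph states: the hardware fault path checks access permissions and raises an event, while the emulation path resolves translations without looking at them.

```python
PAGE_SIZE = 4096

class ToyP2M:
    """Toy gfn -> (mfn, allowed access) table. Not Xen's real p2m."""

    def __init__(self):
        self.entries = {}  # gfn -> (mfn, set of allowed access kinds)
        self.events = []   # mem events delivered to the listener

    def set_entry(self, gfn, mfn, allowed):
        self.entries[gfn] = (mfn, set(allowed))

    def lookup(self, gfn):
        """Stand-in for a translation-only lookup: access bits are ignored."""
        return self.entries[gfn][0]

    def hw_access(self, gfn, kind):
        """Stand-in for the hardware fault path: a violation raises an event."""
        mfn, allowed = self.entries[gfn]
        if kind not in allowed:
            self.events.append((gfn, kind))
            return None  # access is blocked until the listener replies
        return mfn

def emulate_rep_stos(p2m, start_addr, length):
    """Emulated store: resolves every page in the range via lookup() only."""
    first = start_addr // PAGE_SIZE
    last = (start_addr + length - 1) // PAGE_SIZE
    return [p2m.lookup(gfn) for gfn in range(first, last + 1)]

p2m = ToyP2M()
p2m.set_entry(0x10, 0x100, "r")  # write-protected page
p2m.set_entry(0x11, 0x101, "r")  # write-protected neighbour

# The hardware attempts the first write, which faults and raises one event.
p2m.hw_access(0x10, "w")

# The listener asks for emulation; the 16-byte store starts 8 bytes before
# the end of gfn 0x10, so it spills into gfn 0x11.
start = 0x10 * PAGE_SIZE + PAGE_SIZE - 8
touched = emulate_rep_stos(p2m, start, 16)
```

In this model the listener sees a single event for gfn 0x10, even though the emulated store also resolved (and would have written to) the write-protected gfn 0x11.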

A harder to catch example is a qemu-based driver, which grabs guest pages via the mapcache buckets using xc_map_foreign_bulk. This resolves to MMU_NORMAL_PT_UPDATE, which will grab the target page with ... get_page_from_gfn. Basically, every page qemu reads/writes to/from will not result in a mem event. This is akin to an unrestricted DMA engine that can bypass the hardware PTE protection bits and do things behind the OS back.

Grant mapping also uses get_page_from_gfn ... no mem access checks.

The way to fix it is very laborious, which is why it hasn't happened. The root cause is that p2m->get_entry does not check any of the access bits. It could, and then you would be generating mem events from everywhere. But that brings two problems. First, repeated events, as the same gfn may be read multiple times -- I don't think anybody wants that. Second, you have to be able to sleep on a wait queue when the event ring fills up (unless you are comfortable dropping events). Sleeping on a wait queue pretty much means stopping everything you are doing, carefully unrolling your stack until you hold no spinlocks, going into the wait queue, and, when you wake up, diving back into business.
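
The two problems can be seen in a small toy model, again in Python and again not Xen code: ToyRing and checked_get_entry are hypothetical names, and the ring capacity of 4 is an arbitrary assumption. The sketch makes the lookup itself check access bits and post events to a bounded ring, dropping events when the ring is full instead of sleeping.

```python
from collections import deque

class ToyRing:
    """Bounded event ring that drops on overflow (a real one might sleep)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = deque()
        self.dropped = 0

    def put(self, event):
        if len(self.slots) >= self.capacity:
            # This is the point where real code would either have to sleep
            # on a wait queue or accept losing the event.
            self.dropped += 1
            return False
        self.slots.append(event)
        return True

def checked_get_entry(table, ring, gfn, kind):
    """Hypothetical lookup that checks access bits on every translation."""
    mfn, allowed = table[gfn]
    if kind not in allowed:
        ring.put((gfn, kind))
    return mfn

table = {0x20: (0x200, {"r"})}  # gfn 0x20 is write-protected
ring = ToyRing(capacity=4)

# The same gfn is resolved six times for writing, e.g. by repeated lookups
# during a single emulated operation.
for _ in range(6):
    checked_get_entry(table, ring, 0x20, "w")
```

Here the ring ends up holding four identical events for gfn 0x20 (the repeated-events problem) and two further events are dropped (the ring-full problem); nothing in the sketch attempts to solve either.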


Thanks for the in-depth explanation, it certainly sheds some light on the limitations of the mem_access system. I understand that any memory access to mfn's via mechanisms that don't use the trapped EPT (a pv domain or the hypervisor itself) or have a mapping of the same pages via different EPTs won't trigger the mem_event traps. For the emulation part my question was rather if you are aware of any emulation that currently takes place (outside this patch series) which may be used in this fashion?

Uhm. Examples I can think of: MMIO access. The OS reads values from LAPIC or HPET pages, and those get emulated (although there are LAPIC fast paths out there). If the buffers in regular RAM fall in pages that have mem access permission trapping set, then no event will be generated (by that MMIO instruction).

And all your PV driver frontend needs. Qemu does the RTC (IIRC), so RTC reads also escape mem access.

Xen does all forms of timer and interrupt emulation (so off the top of my head, RTC, PIT, HPET, PMTimer, PIC, IOAPIC and LAPIC) but all other legacy devices are handled by Qemu.  There is now a fastpath for anything emulated by Xen, for performance/scalability reasons with many-vcpu guests.
