Xen project Mailing List

Hi all,

please find below a brief write-up on using Xen altp2m for stealthy monitoring. While the write-up may be a bit technical and only of interest to a small(er) audience, it also gives some insight into what this new Xen feature is about and what it can be used for. Let me know if you think this would be appropriate for the Xen blog or if you have any other comments!

Thanks,

Tamas

Stealthy monitoring with Xen altp2m

One of the core features that differentiates Xen from other open-source hypervisors is its native support for stealthy and secure monitoring of guest internals (a.k.a. virtual machine introspection). The latest release of Xen last summer introduced several new features that improve this subsystem, a cleaned-up, optimized API and ARM support being just some of the biggest items on the list. As part of this release, a team from Intel also added a new and unique feature, named altp2m, that makes stealthy monitoring on Xen even better. In this blog entry we will take a look at what it's all about.

In Xen's terminology, p2m stands for the guest memory management layer that handles the translation from guest [p]hysical memory to [m]achine physical memory. There are several implementations of this, including hardware support via Intel Extended Page Tables (EPT) available to HVM (and PVH) guests, called Hardware Assisted Paging (hap) in Xen. In this implementation the hypervisor maintains a second pagetable, similar to the one in 64-bit operating systems, dedicated to running the p2m translation. All (open-source) hypervisors that use this hardware assisted paging method use a single EPT per virtual machine to handle this translation, regardless of how many vCPUs the guest may have.
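To make the translation step concrete, here is a minimal toy model of a p2m lookup. It is only a sketch: the names P2MEntry and translate, the frame numbers, and the use of PermissionError to stand in for an EPT violation are all assumptions made for illustration, not Xen code.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # 4 KiB pages


@dataclass
class P2MEntry:
    mfn: int      # machine frame number backing the guest frame
    access: str   # allowed access kinds, a subset of "rwx"


def translate(p2m, gpa, kind):
    """Translate a guest physical address to a machine address.

    Raises PermissionError (our stand-in for an EPT violation) when the
    requested access kind is not allowed for the guest frame.
    """
    gfn = gpa >> PAGE_SHIFT
    offset = gpa & ((1 << PAGE_SHIFT) - 1)
    entry = p2m[gfn]
    if kind not in entry.access:
        raise PermissionError(f"EPT violation: {kind} on gfn {gfn:#x}")
    return (entry.mfn << PAGE_SHIFT) | offset


# Hypothetical table: one table for the whole guest, whatever the vCPU count.
p2m = {
    0x10: P2MEntry(mfn=0x8A3, access="rwx"),
    0x11: P2MEntry(mfn=0x2F0, access="rx"),   # writes to this frame trap
}

machine_addr = translate(p2m, 0x10123, "r")
```

The offset within the page is carried over unchanged; only the frame number is swapped, and the permission check happens on the same lookup.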

Xen altp2m is the first implementation which changes this setup by allowing Xen to create more than one EPT for each guest. Interestingly, although the Intel hardware has been capable of maintaining up to 512 EPT pointers in the VMCS since the first introduction of EPT, no one made use of these extra tables until now. In Xen 4.6, Xen allows the creation of up to 10 EPTs per guest. The primary reason for this extension is of course the new #VE and VMFUNC extensions that were released in the Skylake generation of CPUs (which is worth a whole blog entry on its own), but it can also be used by external monitoring applications via the Xen vm_event system.

This feature is a game-changer for applications performing purely external monitoring because it simplifies the monitoring of multi-vCPU guests. As mentioned earlier, before altp2m was introduced the EPT of the VM was always shared across all vCPUs of the guest. This introduced serious issues when the guest's memory accesses were monitored by restricting EPT permissions. While the method allowed for stealthy tracing of R/W/X memory accesses of the guest (EPT violations trap to the hypervisor), the memory permissions had to be relaxed in order to allow the guest to continue execution. This in turn creates a race condition if the guest has multiple vCPUs, as a vCPU that is still active could perform the memory access unnoticed while the permissions are relaxed. One could pause all vCPUs while the one violating the access is singlestepped, but this introduces heavy overhead just to avoid a race condition that may rarely occur. Alternatively, one could emulate the instruction that violated the EPT permission without relaxing the EPT access permissions, since Xen's built-in emulator ignores these. However, this solution (while supported) is not ideal, as it still creates significant performance overhead. Moreover, emulation in Xen is known to be buggy, so creating security tools that ultimately rely on an emulator is just asking for trouble.
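The race can be illustrated with a small deterministic simulation. This is a toy model only: the SharedEPT class and its guest_access method are hypothetical names invented for this sketch and do not correspond to any Xen interface.

```python
class SharedEPT:
    """One permission table shared by every vCPU of the guest."""

    def __init__(self):
        self.access = {}   # gfn -> allowed access kinds, e.g. "rx"
        self.events = []   # (vcpu, gfn, kind) tuples the monitor gets to see

    def guest_access(self, vcpu, gfn, kind):
        if kind in self.access.get(gfn, "rwx"):
            return True                        # allowed: the monitor sees nothing
        self.events.append((vcpu, gfn, kind))  # violation: trapped and reported
        return False


GFN = 0x42
ept = SharedEPT()
ept.access[GFN] = "rx"                 # monitor wants to see every write

ept.guest_access(0, GFN, "w")          # vCPU 0 writes and traps
ept.access[GFN] = "rwx"                # monitor relaxes the shared table
missed = ept.guest_access(1, GFN, "w") # vCPU 1 writes inside the window
ept.guest_access(0, GFN, "w")          # vCPU 0 retries and succeeds
ept.access[GFN] = "rx"                 # monitor restores the restriction

print(ept.events)
```

Only the write from vCPU 0 ends up in the event list; the write from vCPU 1 went through while the shared permissions were relaxed and was never reported.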

Xen's altp2m system changes this problem quite significantly. By having multiple EPTs we can define differing access permissions in each table, and the tables can be swapped around at will. When the guest makes a memory access that is monitored, instead of having to relax the access permission, Xen can simply switch to an EPT (called a view) that allows the operation to continue. More importantly, this switch can be performed for each vCPU individually, without having to pause any of the other vCPUs or emulate the access, and without the guest noticing any of this switching at all. A truly simple and elegant solution.
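The per-vCPU switch can be sketched with a toy model as well. The class Altp2mModel and its methods are hypothetical names chosen for this illustration and are not the libxc or vm_event interface; view 0 is assumed to be the unrestricted view.

```python
class Altp2mModel:
    """Several permission tables (views); each vCPU runs on one of them."""

    def __init__(self, n_vcpus):
        self.views = {0: {}}            # view id -> {gfn: allowed access kinds}
        self.current = [0] * n_vcpus    # active view of each vCPU
        self.events = []                # (vcpu, gfn, kind) seen by the monitor

    def create_view(self, access):
        view_id = len(self.views)
        self.views[view_id] = dict(access)
        return view_id

    def switch(self, vcpu, view_id):
        self.current[vcpu] = view_id    # affects this vCPU only

    def guest_access(self, vcpu, gfn, kind):
        allowed = self.views[self.current[vcpu]].get(gfn, "rwx")
        if kind in allowed:
            return True
        self.events.append((vcpu, gfn, kind))
        return False


GFN = 0x42
vm = Altp2mModel(n_vcpus=2)
restricted = vm.create_view({GFN: "rx"})
for vcpu in range(2):
    vm.switch(vcpu, restricted)

vm.guest_access(0, GFN, "w")        # vCPU 0 writes and traps
vm.switch(0, 0)                     # only vCPU 0 moves to the permissive view
vm.guest_access(1, GFN, "w")        # vCPU 1 is still restricted: it traps too
retried = vm.guest_access(0, GFN, "w")  # vCPU 0 completes its singlestep
vm.switch(0, restricted)            # vCPU 0 returns to the restricted view

print(vm.events)
```

Both writes are reported, because the restricted view is never modified; only the vCPU that needs to make progress is moved to another view for a moment.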

Of course, EPT-based monitoring is not the only introspection technique used for stealthy monitoring. For example, the Xen-based DRAKVUF Dynamic Malware Analysis system uses it in combination with an additional technique to maximum effect. EPT-based monitoring is known to introduce significant overhead, even with the altp2m-based optimization described above, because the granularity of the monitoring is that of a memory page (4KB at least). This creates a lot of "false" events if you are really only interested in, for example, when a function entry point is called. Fortunately, this can be avoided by enabling the trapping of debug instructions into the hypervisor (a feature of the Intel CPU). This method is used in DRAKVUF, which writes breakpoint instructions into the guest's memory at code locations of interest. To hide the presence of the breakpoints from the guest, these pages are further protected by making them execute-only in the EPT. This allows DRAKVUF to hide the breakpoints from code-integrity checking mechanisms that may run in the guest, such as Windows PatchGuard. However, this technique has been similarly problematic for multi-vCPU guests, as the breakpoint had to be removed when it was hit (or when something was scanning the code), thus potentially leading to missed events.
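The difference in granularity can be shown with a few lines of arithmetic. The execution trace and the function address below are made up for illustration, and the two helper functions are hypothetical; they only count which fetches each method would report.

```python
PAGE_SHIFT = 12  # 4 KiB pages


def page_events(trace, watched_addr):
    """Fetches reported when the whole page holding watched_addr is trapped."""
    page = watched_addr >> PAGE_SHIFT
    return [addr for addr in trace if addr >> PAGE_SHIFT == page]


def breakpoint_events(trace, watched_addr):
    """Fetches reported when a breakpoint sits exactly at watched_addr."""
    return [addr for addr in trace if addr == watched_addr]


# Made-up instruction fetch addresses; the function of interest is at 0x401230.
trace = [0x401000, 0x401004, 0x401230, 0x402000, 0x401230, 0x401FF8]
entry = 0x401230

by_page = page_events(trace, entry)
by_breakpoint = breakpoint_events(trace, entry)
print(len(by_page), len(by_breakpoint))
```

In this trace the page-level trap reports every fetch that lands in the same 4KB page, while the breakpoint reports only the two calls to the function itself.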

Fortunately, altp2m has another neat feature that can be used to solve this problem. Besides allowing the memory permissions to differ between altp2m views, it also allows changing the mapping itself! The same guest physical memory can be set up to be backed by different pages in the different views, thus truly making the guest physical memory "virtual": where it is mapped really depends on which view the vCPU is running on. This feature allows us to hide the presence of the breakpoints in a brand new way. First, we create a complete shadow copy of the memory page where the breakpoint is going to be written and write the breakpoint only into this shadow copy. Now, using altp2m, we set up a view where the guest physical memory of the page gets mapped to our shadow copy. When the breakpoint is hit, or if something is trying to scan the code, we simply switch to the unaltered view for the duration of a singlestep, then switch back to the trapped view. This allows us to hide the presence of the breakpoints for each vCPU individually, all without having to pause any of the other vCPUs or having to emulate. The first open-source implementation of this tracing has already been merged into the DRAKVUF Malware Analysis System and is available as a reference implementation for those interested in more details.
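The shadow-copy remapping can be sketched with a toy model of two views that map the same guest frame to different machine frames. Everything here is an assumption for illustration: the frame numbers, the byte contents, and the helper read_byte are hypothetical, and the code is not taken from DRAKVUF or Xen.

```python
PAGE_SIZE = 4096
INT3 = 0xCC            # x86 breakpoint opcode
PUSH_RBP = 0x55        # made-up original first byte of the function

GFN = 0x42             # guest frame holding the code of interest
ORIG_MFN, SHADOW_MFN = 100, 200
BP_OFFSET = 0x10       # offset of the function entry within the page

# Machine memory: mfn -> page contents.
frames = {ORIG_MFN: bytearray(PAGE_SIZE)}
frames[ORIG_MFN][BP_OFFSET] = PUSH_RBP

# Step 1: make a complete shadow copy and write the breakpoint only there.
frames[SHADOW_MFN] = bytearray(frames[ORIG_MFN])
frames[SHADOW_MFN][BP_OFFSET] = INT3

# Step 2: view 0 keeps the original mapping, view 1 points at the shadow copy.
views = {
    0: {GFN: ORIG_MFN},
    1: {GFN: SHADOW_MFN},
}


def read_byte(view_id, gfn, offset):
    """Return the byte a vCPU running on view_id finds at (gfn, offset)."""
    return frames[views[view_id][gfn]][offset]


trapped_view_byte = read_byte(1, GFN, BP_OFFSET)    # what gets executed
unaltered_view_byte = read_byte(0, GFN, BP_OFFSET)  # what a code scan sees
```

A vCPU running on view 1 fetches the breakpoint, while a vCPU that has been switched to view 0 for a singlestep sees the original byte. The original page is never modified, so no breakpoint has to be removed and restored.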

[Publicity] Stealthy monitoring with Xen altp2m