Re: Hypercall fault injection (Was [PATCH 0/3] xen/domain: More structured teardown)
On Tue, Dec 22, 2020 at 12:18 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 22/12/2020 15:47, Tamas K Lengyel wrote:
> >
> > On Tue, Dec 22, 2020 at 6:14 AM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>
>> On 22/12/2020 10:00, Jan Beulich wrote:
>> > On 21.12.2020 20:36, Andrew Cooper wrote:
>> >> Hello,
>> >>
>> >> We have some very complicated hypercalls; createdomain, with max_vcpus a
>> >> close second, has immense complexity and very hard-to-test error
>> >> handling.
>> >>
>> >> It is no surprise that the error handling is riddled with bugs.
>> >>
>> >> Random failures from core functions are one way, but I'm not sure that
>> >> will be especially helpful.  In particular, we'd need a way to exclude
>> >> "dom0 critical" operations so we've got a usable system to run testing on.
>> >>
>> >> As an alternative, how about adding a fault_ttl field into the hypercall?
>> >>
>> >> The exact paths taken in {domain,vcpu}_create() are sensitive to the
>> >> hardware, Xen Kconfig, and other parameters passed into the
>> >> hypercall(s).  The testing logic doesn't really want to care about what
>> >> failed; simply that the error was handled correctly.
>> >>
>> >> So a test for this might look like:
>> >>
>> >>   cfg = { ... };
>> >>   while ( xc_create_domain(xch, cfg) < 0 )
>> >>       cfg.fault_ttl++;
>> >>
>> >> The pros of this approach are that, for a specific build of Xen on a
>> >> piece of hardware, it ought to check every failure path in
>> >> domain_create(), until the ttl finally gets higher than the number of
>> >> fail-able actions required to construct a domain.  Also, the test
>> >> doesn't need changing as the complexity of domain_create() changes.
>> >>
>> >> The main con will most likely be the invasiveness of the code in Xen, but
>> >> I suppose any fault injection is going to be invasive to a certain extent.
>> > While I like the idea in principle, the innocent-looking
>> >
>> >   cfg = { ... };
>> >
>> > is quite a bit of a concern here as well: Depending on the precise
>> > settings, paths taken in the hypervisor may vary heavily, and hence
>> > such a test will only end up being useful if it covers a wide
>> > variety of settings.  Even if the number of tests to execute turned
>> > out to still be manageable today, it may quickly turn out not to be
>> > sufficiently scalable as we add new settings controllable right at
>> > domain creation (which I understand is the plan).
>>
>> Well - there are two aspects here.
>>
>> First, 99% of all VMs in practice are one of 3 or 4 configurations.  An
>> individual configuration is O(n) time complexity to test with fault_ttl,
>> depending on the size of Xen's logic, and we absolutely want to be able
>> to test these deterministically and to completion.
>>
>> For the plethora of other configurations, I agree that it is infeasible
>> to test them all.  However, a hypercall like this is easy to wire up
>> into a fuzzing harness.
>>
>> TBH, I was thinking of something like
>> https://github.com/intel/kernel-fuzzer-for-xen-project with a PVH Xen
>> and an XTF "dom0" poking just this hypercall.  All the other complicated
>> bits of wiring AFL up appear to have been done.
>>
>> Perhaps when we exhaust that as a source of bugs, we move on to fuzzing
>> the L0 Xen, because running on native will give it more paths to
>> explore.  We'd need some way of reporting path/trace data back to AFL in
>> dom0, which might require a bit of plumbing.
>
> This is a pretty cool idea; I would be very interested in trying this out.
> If running Xen nested in an HVM domain is possible (my experiments with
> nested setups using Xen have only worked on ancient hardware the last time I
> tried), then running the fuzzer would be entirely possible using VM forks.
> You don't even need a special "dom0"; you could just add the fuzzer's CPUID
> harness to Xen's hypercall handler, and the only thing needed from the
> nested dom0 would be to trigger the hypercall with a normal config.  The
> fuzzer would take it from there and replace the config with the fuzzed
> version directly in the VM forks.  What to report as a "crash" to AFL would
> still need to be defined manually for Xen, as the current sink points are
> Linux-specific
> (https://github.com/intel/kernel-fuzzer-for-xen-project/blob/master/src/sink.h),
> but that should be straightforward.
>
> Also, running the fuzzer with PVH guests hasn't been tested, but since all
> VM forking needs is EPT, it should work.
>
>
> Xen running inside Xen definitely works, and is even supported as far as
> PV-Shim goes (i.e. no nested virt).  That would limit testing to just the
> creation of PV guests at L1, which is plenty to get started with.
>
> Xen nested under Xen does work to a first approximation, and for the
> purposes of fuzzing areas other than the nested-virt logic, might even be
> ok.  (I use this configuration for a fair chunk of hvm development).

Sounds like there is no blocker to fuzzing any hypercall handler, then.  Just
add the harness, define which code path should be treated as a sink, and off
you go.  It should work with PT-based coverage, no problem.  If you need to
fuzz multiple hypercalls, that may require some tweaking, as the PT-based
coverage doesn't support a change in CR3 right now (it's just a limitation of
libxdc, which does the PT decoding).  You could always fall back to the
disassemble/breakpoint/singlestep coverage option, but you would need to add
the vmcall/vmenter instructions to the control-flow instruction list here:
https://github.com/intel/kernel-fuzzer-for-xen-project/blob/master/src/tracer.c#L46.

Tamas
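For illustration, here is a minimal, self-contained C sketch of the fault_ttl
semantics discussed above.  It is not Xen code: the struct, field, and
function names are hypothetical, and the toy create_domain() merely stands in
for the real hypercall path.  The idea it models is that a nonzero fault_ttl
forces the N-th fail-able step to fail, so the caller's retry loop exercises
each error-handling path exactly once before construction finally succeeds.

/* Toy model of the fault_ttl idea (illustrative only, not Xen code). */
#include <stdbool.h>
#include <stdio.h>

struct cfg {
    unsigned int fault_ttl;          /* 0 = no fault injection */
};

/* Decrement the ttl; when it hits zero, report "fail this step". */
static bool inject_fault(struct cfg *c)
{
    return c->fault_ttl && --c->fault_ttl == 0;
}

/* Stand-in for domain_create(): three fail-able construction steps. */
static int create_domain(struct cfg c)
{
    for ( unsigned int step = 1; step <= 3; step++ )
    {
        if ( inject_fault(&c) )
        {
            printf("  injected failure at step %u, unwinding\n", step);
            return -1;               /* exercises the error/teardown path */
        }
    }
    return 0;                        /* fully constructed */
}

int main(void)
{
    struct cfg cfg = { .fault_ttl = 1 };

    /* The loop from the mail: bump fault_ttl until creation finally
     * succeeds, covering each failure path exactly once on the way. */
    while ( create_domain(cfg) < 0 )
        cfg.fault_ttl++;

    printf("construction succeeded with fault_ttl = %u\n", cfg.fault_ttl);
    return 0;
}

Compiled standalone, this prints one injected failure per step and then the
ttl value at which construction finally succeeds, which is the coverage
property the proposed test loop relies on.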