
Hypercall fault injection (Was [PATCH 0/3] xen/domain: More structured teardown)


  • To: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Mon, 21 Dec 2020 19:36:49 +0000
  • Cc: Jan Beulich <JBeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>
  • Delivery-date: Mon, 21 Dec 2020 19:37:08 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hello,

We have some very complicated hypercalls (createdomain, with max_vcpus a
close second) with immense complexity and very hard-to-test error handling.

It is no surprise that the error handling is riddled with bugs.

Injecting random failures into core functions is one option, but I'm not
sure that it will be especially helpful.  In particular, we'd need a way to
exclude "dom0 critical" operations so that we still have a usable system to
run the testing on.

As an alternative, how about adding a fault_ttl field into the hypercall?

The exact paths taken in {domain,vcpu}_create() are sensitive to the
hardware, Xen Kconfig, and other parameters passed into the
hypercall(s).  The testing logic doesn't really want to care about what
failed; simply that the error was handled correctly.
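A minimal userspace sketch of how such a countdown could behave, assuming (hypothetically) that a fault_ttl of 0 disables injection and a value of N makes the Nth fail-able action fail. should_inject_fault() and simulate_create() are illustrative names only, not existing Xen functions; in Xen the check would sit immediately before each fail-able step of {domain,vcpu}_create().

```c
#include <stdbool.h>

/*
 * Hypothetical check, called just before each fail-able action.
 * Assumed semantics: ttl == 0 means "no injection"; ttl == N makes
 * the Nth call return true.  After firing, ttl is 0, so it fires once.
 */
static bool should_inject_fault(unsigned int *ttl)
{
    if ( *ttl == 0 )
        return false;           /* disabled, or the fault already fired */

    return --*ttl == 0;         /* true exactly once, on the Nth call */
}

/*
 * Simulate a construction path with nr_actions fail-able steps.
 * Returns the 0-based index of the step that was forced to fail,
 * or -1 if construction ran to completion.
 */
static int simulate_create(unsigned int fault_ttl, unsigned int nr_actions)
{
    unsigned int ttl = fault_ttl;   /* per-call copy of the config field */

    for ( unsigned int i = 0; i < nr_actions; i++ )
        if ( should_inject_fault(&ttl) )
            return (int)i;

    return -1;
}
```

Because the caller only supplies a number, it never needs to know which allocation or sub-operation sits at position N on a given build.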

So a test for this might look like:

cfg = { ... };
while ( xc_create_domain(xch, cfg) < 0 )
    cfg.fault_ttl++;


The advantage of this approach is that, for a specific build of Xen on a
piece of hardware, it ought to check every failure path in
domain_create(), until the ttl finally gets higher than the number of
fail-able actions required to construct a domain.  Also, the test
doesn't need changing as the complexity of domain_create() changes.
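A self-contained toy model of the test loop, as a sketch under stated assumptions: toy_create() stands in for domain_create() with four fail-able allocations, fault_ttl follows the hypothetical "0 disables, N fails the Nth action" rule (so the loop starts at 1), and a live-allocation counter checks that each error path unwinds cleanly. None of these names exist in Xen or libxc.

```c
#include <stdlib.h>

/* Hypothetical config; only the proposed fault_ttl field is modelled. */
struct create_cfg {
    unsigned int fault_ttl;      /* 0 = no injection, N = fail the Nth action */
};

struct toy_domain {
    void *res[4];                /* stand-ins for objects domain_create() builds */
};

static long outstanding;         /* live allocations, for leak detection */

static void *toy_alloc(unsigned int *ttl)
{
    void *p;

    if ( *ttl && --*ttl == 0 )   /* injected failure */
        return NULL;

    if ( (p = malloc(32)) != NULL )
        outstanding++;
    return p;
}

static void toy_free(void *p)
{
    if ( p )
    {
        free(p);
        outstanding--;
    }
}

static void toy_destroy(struct toy_domain *d)
{
    for ( unsigned int i = 0; i < 4; i++ )
    {
        toy_free(d->res[i]);
        d->res[i] = NULL;
    }
}

/* Returns 0 on success; on failure, everything already built is torn down. */
static int toy_create(struct toy_domain *d, const struct create_cfg *cfg)
{
    unsigned int ttl = cfg->fault_ttl;

    for ( unsigned int i = 0; i < 4; i++ )
        d->res[i] = NULL;

    for ( unsigned int i = 0; i < 4; i++ )
        if ( (d->res[i] = toy_alloc(&ttl)) == NULL )
        {
            toy_destroy(d);      /* the error path under test */
            return -1;
        }

    return 0;
}

/*
 * Bump fault_ttl until creation succeeds, checking after every
 * injected failure that nothing leaked.  Returns the number of
 * failure paths exercised, or -1 if an error path leaked.
 */
static int exercise_failure_paths(void)
{
    struct create_cfg cfg = { .fault_ttl = 1 };  /* fail the 1st action first */
    struct toy_domain d;
    int failures = 0;

    while ( toy_create(&d, &cfg) < 0 )
    {
        if ( outstanding != 0 )
            return -1;
        failures++;
        cfg.fault_ttl++;
    }

    toy_destroy(&d);
    return failures;
}
```

With four fail-able actions, the loop exercises four distinct failure paths before fault_ttl exceeds the number of actions and creation succeeds; adding a fifth allocation to toy_create() would be picked up without touching the loop.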

The main con will most likely be the invasiveness of the changes to Xen,
but I suppose any fault injection is going to be invasive to some extent.

Fault injection like this would also want pairing with some other plans
I had for dalloc() and friends, which wrap the current allocators and
count (in struct domain) the number and/or size of domain allocations,
so that we can a) check for leaks, and b) report how much memory a domain
object (and all its descendant objects) actually takes (we currently
don't know this value at all).
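A userspace sketch of what such accounting wrappers might look like, assuming hypothetical dalloc()/dfree() signatures that take the owning domain, and hypothetical nr_allocs/alloc_bytes fields in struct domain (Xen's real allocators are xmalloc()/xfree(); malloc()/free() stand in for them here). A small header records each allocation's size so dfree() can undo the byte count.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical accounting fields; not existing struct domain members. */
struct domain {
    unsigned long nr_allocs;     /* live allocations attributed to the domain */
    size_t alloc_bytes;          /* live bytes attributed to the domain */
};

/* Stored in front of each allocation; the union keeps the payload aligned. */
union dalloc_hdr {
    size_t size;
    max_align_t align;
};

static void *dalloc(struct domain *d, size_t size)
{
    union dalloc_hdr *h = malloc(sizeof(*h) + size);

    if ( !h )
        return NULL;

    h->size = size;
    d->nr_allocs++;
    d->alloc_bytes += size;
    return h + 1;                /* caller sees memory just past the header */
}

static void dfree(struct domain *d, void *ptr)
{
    union dalloc_hdr *h;

    if ( !ptr )
        return;

    h = (union dalloc_hdr *)ptr - 1;
    d->nr_allocs--;
    d->alloc_bytes -= h->size;
    free(h);
}
```

After teardown, a non-zero nr_allocs would flag a leak (point a), and alloc_bytes sampled on a live domain would give the memory figure (point b).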

Thoughts?

~Andrew
