Re: Proposal - lightweight auto test binaries
On 12/01/2021 17:35, Ian Jackson wrote:
> Andy and I were discussing some new tests he's written which (will)
> live in xen.git and which we would like to wire into osstest.
>
> We came up with the following proposal (I have refined some of the
> details beyond what we discussed on IRC):
>
> * Tests will live in tools/tests/ ideally, but they might
>   live in other places in the xen.git source tree.

I've submitted
https://lore.kernel.org/xen-devel/20210113123455.23209-1-andrew.cooper3@xxxxxxxxxx/T/#u
to clean up the not-really-tests from the tests dir.  This leaves us with
cpu-policy, depriv, mce-test, vhpet, vpci, x86_emulator and xenstore.

Of those, cpu-policy, vpci and x86_emulator are real unit tests, involving
varying quantities of gymnastics to compile.  They can in principle be run
on the build machine, but x86_emulator is sensitive to the CPU instructions
available, and would definitely benefit from running on as wide a set of
hardware as possible.

vhpet is also a real unit test, but isn't even compiled by default, seems
to need some manual code hacking to compile, and takes a complicated array
of command line arguments.

xenstore needs a running xenstored to talk to, so needs to run on the test
system, but is otherwise self-contained and easy to use.

depriv I'm struggling to follow.  It's presumably doing things with the
various ioctl restrictions, and therefore wants to be on the test system,
but it does take a complicated array of command line options.

mce-test is a complicated mess of test case scripts, which look like they
might have been pulled out of a preexisting test framework elsewhere.  The
binary xen-mceinj uses the MCE interposer functionality to try to inject
real-looking errors into the system.  I haven't had much luck with this in
the past, and it has been bitrotting for most of a decade.  I don't know
whether there is anything salvageable here.

I've had a quick look through the rest of the tree and I can't spot
anything else which is obviously test code.
I reckon for a first stab, trying to convert xenstore and cpu-policy to the
"simple" API would be a good start.  They both have trivial build
requirements, and would be trivial to convert to an exit-code-based
protocol, which would get us content to run for both x86 and ARM in
OSSTest.

CC Juergen, who is the owner of the xenstore test.  Are you happy with this
being tweaked in the hope that it gets run automatically by OSSTest?

The rest of the tests are going to be varying quantities of complicated to
convert.

> * When appropriately configured, the xen.git build system
>   will ship them into dist/install/usr/local/....
>
> * There will be a specific directory containing tests which
>   are suitable for running automatically:
>     dist/install/usr/local/lib/xen/autotests
>
> * The pass/success reporting API will be encoded into the
>   filename.  For now we define only one API:
>     dist/install/usr/local/lib/xen/autotests/simple/*

Whatever we go for, it ought to be an easy path, and honestly, that's a
mouthful to get to.  These tests need to be just as easy for developers to
use.

How about opt/xen-autotest/* ?  It is much easier in a deb/rpm build to
filter out the things which clearly shouldn't be in a production system.
It's also invariant of libdir, which is an advantage for downstreams who
test on more than one distro.  I certainly have plans to reuse this
infrastructure in XenRT.  I expect other downstreams would as well.

> The "simple" test API is as follows:
>
> * Every executable file (or symlink to one) in
>   dist/install/usr/local/lib/xen/autotests/simple/
>   will be run with no arguments.
>
> * Only files whose names contain only [-0-9a-z] will be run.
>   Others will be ignored.  NB _ is not allowed.

I'm happy not using _, but why the restriction?
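For concreteness, a runner consuming this directory contract might behave
something like the sketch below.  This is purely illustrative, not part of
the proposal: the `AUTOTEST_DIR` override, the log-file naming and the
output wording are my inventions; the exit-code mapping follows the XTF
statuses discussed later in the thread.

```shell
#!/bin/sh
# Sketch of a "simple"-API runner.  AUTOTEST_DIR defaults to the
# proposed install location; everything else here is illustrative.
AUTOTEST_DIR="${AUTOTEST_DIR:-/usr/local/lib/xen/autotests/simple}"

run_one() {
    name=$(basename "$1")

    # Only names consisting entirely of [-0-9a-z] are eligible;
    # note that '_' is deliberately excluded by the proposal.
    case "$name" in
        *[!-0-9a-z]*) echo "IGNORE $name"; return 0 ;;
    esac

    # Log stdout+stderr together; act only on the exit code.
    "$1" >"$1.log" 2>&1
    case $? in
        0)     echo "PASS  $name" ;;  # XTF SUCCESS
        3)     echo "SKIP  $name" ;;  # XTF SKIP
        4|5|6) echo "FAIL  $name" ;;  # XTF ERROR/FAILURE/CRASH
        *)     echo "CRASH $name" ;;  # any other termination
    esac
}

run_all() {
    for t in "$1"/*; do
        [ -f "$t" ] && [ -x "$t" ] && run_one "$t"
    done
    return 0
}

if [ -d "$AUTOTEST_DIR" ]; then
    run_all "$AUTOTEST_DIR"
fi
```

Note the runner never passes arguments and never inspects the tests'
output; the filename filter and the exit code are the whole interface.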
> * A test may exit with one of the XTF exit statuses:
>      0 # XTF SUCCESS
>      3 # XTF SKIP
>      4 # XTF ERROR
>      5 # XTF FAILURE
>      6 # XTF CRASH
>   (osstest will treat anything other than 0 and 3 as "fail")
>   Any other termination should be treated as CRASH.
>
> * If a test exits nonzero it should print some information about
>   the situation to stderr.
>
> * Tests may print other information to stderr, which will be captured
>   and logged.
>
> * Tests *must not* print anything to stdout.  Test runners *must*
>   ignore stdout (though they may log it, possibly mixed in with
>   stderr).  stdout is reserved for future extension.

I disagree.  It is far more important that the tests are pleasant for
humans to use, both in terms of developing them to begin with, and
maintaining them if they go wrong.  In practice, this means human-useful
stuff on stdout.

The test automation does not need to care: all it should do is log
stdout/stderr, and act upon the exit code.  (Perhaps this is what you
meant, but it's not clear IMO.)

> * Tests should exit with XTF ERROR if they are passed any arguments.

./test-foo --debug-this-harder would be a perfectly reasonable thing for a
human to use to investigate problems, and it is what we use in the fuzzing
harnesses for investigating crashes.

The automation support shouldn't get in the way of humans, and in this
case it really doesn't need to.  Nothing on the automation side of things
ought to try passing parameters, and we absolutely aren't going to put in
an additional test to fuzz the command line and check that ERROR is
reported in all cases.

> * Tests will be run sequentially on a Xen dom0.  If the environment
>   (eg Xen configuration) or hardware is not suitable they should
>   report SKIP.
>
> * If a test program reports SUCCESS or SKIP, it *must* leave the
>   host in a good state suitable for running more tests.  It
>   *must not* leak resources (domains, xenstore entries, files).
>   (It may of course advance the domid counter.)
I'd relax this slightly (s/program reports/is/ ?), because one possibility
is a false SUCCESS/SKIP exit caused by a bug in the test binary itself.
This would in practice be a FAIL/ERROR, and the host might be in a bad
state as a consequence.

> * If a test reports any other status, the host may not be in a good
>   state any more.
>
> * Tests should support being run from their installed location
>     /usr/local/lib/xen/autotests/
>   If they need any additional files or data or anything, that
>   *must* be provided somewhere in dist/install/ for installation
>   on the host.
>
> * Tests may also support being run from the build tree, or from
>   dist/install/ without copying into /usr/local.  They may require
>   an environment variable to be set to find ancillary files, or
>   perhaps look at their argv[0], or something.

I'd perhaps strengthen this to "Wherever possible, tests SHOULD support",
because we want to actively encourage this.

> * The protocol document (basically, derived from this email) will
>   live in xen.git (and might be shipped as
>     dist/install/usr/local/lib/xen/autotests/README.md
>   or something)
>
> Future directions:
>
> We think we may invent a new multiple-test API where the test program
> can report the outcome of multiple tests.  That will have its own
> directory in dist/install/usr/local/lib/xen/autotests/.
>
> We may invent extensions which involve passing environment variables,
> or command line arguments, to "simple" test programs.  The former will
> be ignored by older test programs and the latter rejected.  The
> precise details are beyond the scope of this mail.
>
> Extension may also involve defining a meaning for the test's stdout.

So I think the "simple" protocol will be fine for a lot of cases,
including one of the two concrete cases I had (the fault_ttl test for
exercising error paths in the hypervisor).
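As a concrete sketch of what a "simple"-protocol test might look like once
converted: everything below is illustrative rather than specified by the
proposal; the helper names, the `XEN_AUTOTEST_DATA` variable and the
particular xenstore probes are my assumptions.

```shell
#!/bin/sh
# Illustrative "simple"-API test skeleton using the XTF statuses.
set -u

XTF_SUCCESS=0; XTF_SKIP=3; XTF_ERROR=4; XTF_FAILURE=5

run_test() {
    # The proposal says tests are invoked with no arguments.
    [ "$#" -eq 0 ] ||
        { echo "unexpected arguments: $*" >&2; return "$XTF_ERROR"; }

    # Ancillary files: hypothetical env var override, else relative to
    # argv[0], so the test works installed or from dist/install/.
    datadir="${XEN_AUTOTEST_DATA:-$(dirname "$0")/data}"

    # Environment check: e.g. the xenstore test needs a live xenstored.
    command -v xenstore-read >/dev/null 2>&1 ||
        { echo "no xenstore tools; skipping" >&2; return "$XTF_SKIP"; }

    # ... real checks go here; diagnose failures on stderr ...
    xenstore-read domid >/dev/null 2>&1 ||
        { echo "cannot talk to xenstored" >&2; return "$XTF_FAILURE"; }

    return "$XTF_SUCCESS"
}

run_test "$@"
rc=$?
# A real test would now `exit $rc`.
```

The human-friendly debug extras (`--debug-this-harder` style options,
chatty stdout) can coexist with this: the automation only ever sees the
no-argument invocation and the exit code.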
The resource mapping test is more complicated, and this complexity
overlaps an area which is already a problem for XTF test selection (and
was encountered when OpenXT got XTF tests into a runnable state).

For resource mapping, I want to have a test which says "for these
configurations, create a domain, and poke the mapping API to check for
expected success/failure cases".  The problem is "for these
configurations".  Even at the totally basic level:

* PV and Shadow are conditional on Xen's Kconfig.
* PV32 is further conditional on command line settings, and/or whether the
  hardware supports CET-SS.
* HVM is dependent on Kconfig, hardware, firmware and command line
  settings.  HAP similarly.

`xl create` doesn't cleanly handle missing CONFIG_SHADOW, or PV32 being
disabled, despite having suitable information available in `xl info`.
While this can (and should) be fixed, it's not helpful for the more
general testing case.

At the moment, some of this information can be obtained from Xen, but the
details are scattered through various stable and unstable hypercalls, and
in at least one case, in a stringly-typed data structure.  None are
pleasant to obtain at the shell level.

Instead, I was wondering if we should come up with some stable hypfs
layout to report Xen platform capabilities, e.g.:

  platform/pv/         (dir)
  platform/pv/abis     = "xen-3.0-x86_64 xen-3.0-x86_32p"
                         (string, but maybe we drop this now as obsolete)
  platform/pv/32       (bool or dir)
  platform/hvm/        (dir)
  platform/hvm/shadow  (bool)
  platform/hvm/hap     (bool)

Fundamentally, we care about *what* Xen can/cannot do in this
configuration on this hardware, far more than we care about specifically
*why* Xen can't do something[0].

These can be retrieved in a reasonably convenient manner on the shell with
$(xenhypfs cat /path), and we could definitely do something more useful
for scripting with exit codes, etc.
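If such a layout existed, shell-level capability probing could reduce to
something like the following.  This is entirely hypothetical: the
`platform/...` paths are the proposal above, not an existing interface,
and the `XENHYPFS` variable is just an override hook I've invented so the
tool location isn't hard-coded.

```shell
#!/bin/sh
# Probe the *proposed* (not yet existing) hypfs capability layout.
# XENHYPFS allows substituting the tool path; defaults to xenhypfs.
XENHYPFS="${XENHYPFS:-xenhypfs}"

has_cap() {
    # Rely on the exit code: nonzero if the node is absent/unreadable.
    "$XENHYPFS" cat "/platform/$1" >/dev/null 2>&1
}

# A test selecting configurations might then do:
if has_cap hvm/hap; then
    echo "HVM with HAP is available"
elif has_cap hvm/shadow; then
    echo "HVM with shadow paging is available"
fi
```

The point of the exit-code convention is exactly this: "is this capability
present" becomes a one-line `if` in shell, with no string parsing.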
When it comes back to testing, we will have a binary which has a range of
configurations built in, and need some mechanism to reliably calculate the
subset of configurations applicable in the current situation.

However, again in the interest of making the tests usable for humans, I
might go as far as to suggest that we have two separate binaries: a
`$FOO-select` which prints out a list of suitable `$FOO --config-$X`
invocations to run and treat as independent "simple" tests.

Thoughts?  I really am making some of this up as I go along, but it is
also an area for which I've been trying to come up with a good solution
for ages.

~Andrew

[0] There is a further goal here in the long run.  A test lab has an
inventory of hardware with a known/expected configuration.  One part of
testing should be "does the orchestration framework's idea of what Xen
ought to be capable of match what Xen thinks it is capable of?".
Mismatches here could be a regression in Xen's boot/feature detection, an
unexpected Kconfig for the test, or a change in firmware and/or settings.
Failures of this kind have a habit of causing weird indirect failures,
rather than clear-cut failures.

These details are important when it comes to (usefully) testing a feature,
e.g. TSC handling across migrate.  There are hundreds of moving parts, all
of which overlap in complicated ways.  The sane way to test this is to
first confirm that Xen thinks the TSCs are synchronised, then that dom0
agrees, then that we can exercise the various vTSC= options for domains
(using an XTF test rather than a full kernel), as well as the paravirt
enlightenments, and then to test a full kernel, first on boot, then on
localhost live migrate (i.e. the underlying TSC frequency definitely
doesn't change), and then with a remote live migration.

At the moment, debugging "the kernel decided that time went wonky after
migrate" is utterly miserable.  The most recent time I played this game,
it really was a regression in a firmware update.
Frankly, I would love it if all areas of functionality had tests broken
down in a manner similar to the above, because life is far too short to
reverse-engineer "what went wrong first" manually when a robot can answer
that question for me.
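The `$FOO-select` idea above could work along the following lines.  This
is a sketch only: the one-command-per-line output format and the helper
name are my assumptions, not anything agreed in the thread.

```shell
#!/bin/sh
# Sketch of the two-binary pattern: "$FOO-select" prints one runnable
# "$FOO --config-$X" command per line; each line is then treated as an
# independent "simple" test, judged purely on its exit code.

run_selected() {
    "$1-select" | while read -r cmd; do
        # Deliberately unquoted: each line is a command plus arguments.
        $cmd
        echo "$cmd -> exit $?"
    done
}
```

A human can run `$FOO-select` by hand to see which configurations apply on
the current host, then invoke any single `$FOO --config-$X` directly when
investigating a failure, so the automation support costs the human nothing.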