Xen project Mailing List

[Xen-devel] [xen-unstable test] 24354: regressions - trouble: broken/fail/pass

From: xen.org <ian.jackson@xxxxxxxxxxxxx>

Date: Sat, 11 Jan 2014 12:34:43 +0000

Delivery-date: Sat, 11 Jan 2014 12:35:29 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

flight 24354 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/24354/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xend-qemut-winxpsp3       3 host-install(3)        broken REGR. vs. 24334
 test-amd64-amd64-xl-qemut-win7-amd64      7 windows-install        fail REGR. vs. 24334
 test-amd64-i386-xl-win7-amd64            12 guest-localmigrate/x10 fail REGR. vs. 24320

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl                       9 guest-start            fail never pass
 test-amd64-amd64-xl-pcipt-intel           9 guest-start            fail never pass
 test-amd64-i386-xend-winxpsp3            16 leak-check/check       fail never pass
 test-amd64-i386-xl-qemut-win7-amd64      13 guest-stop             fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64     13 guest-stop             fail never pass
 test-amd64-amd64-xl-win7-amd64           13 guest-stop             fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3       13 guest-stop             fail never pass
 test-amd64-amd64-xl-winxpsp3             13 guest-stop             fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 13 guest-stop             fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1       13 guest-stop             fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3       13 guest-stop             fail never pass

version targeted for testing:
 xen                  4fad2dc72a8607f50c3783e1cbcb3fb25e3af932
baseline version:
 xen                  2d03be65d5c50053fec4a5fa1d691972e5d953c9

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  Daniel Kiper <daniel.kiper@xxxxxxxxxx>
  David Scott <dave.scott@xxxxxxxxxxxxx>
  David Vrabel <david.vrabel@xxxxxxxxxx>
  Don Slutz <dslutz@xxxxxxxxxxx>
  Ian Campbell <ian.campbell@xxxxxxxxxx>
  Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
  Jan Beulich <jbeulich@xxxxxxxx>
  Julien Grall <julien.grall@xxxxxxxxxx>
  Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>
  Rob Hoes <rob.hoes@xxxxxxxxxx>
------------------------------------------------------------

jobs:
 build-amd64-xend                          pass
 build-i386-xend                           pass
 build-amd64                               pass
 build-armhf                               pass
 build-i386                                pass
 build-amd64-oldkern                       pass
 build-i386-oldkern                        pass
 build-amd64-pvops                         pass
 build-armhf-pvops                         pass
 build-i386-pvops                          pass
 test-amd64-amd64-xl                       pass
 test-armhf-armhf-xl                       fail
 test-amd64-i386-xl                        pass
 test-amd64-i386-rhel6hvm-amd              pass
 test-amd64-i386-qemut-rhel6hvm-amd        pass
 test-amd64-i386-qemuu-rhel6hvm-amd        pass
 test-amd64-i386-freebsd10-amd64           pass
 test-amd64-amd64-xl-qemut-win7-amd64      fail
 test-amd64-i386-xl-qemut-win7-amd64       fail
 test-amd64-amd64-xl-qemuu-win7-amd64      fail
 test-amd64-amd64-xl-win7-amd64            fail
 test-amd64-i386-xl-win7-amd64             fail
 test-amd64-i386-xl-credit2                pass
 test-amd64-i386-freebsd10-i386            pass
 test-amd64-amd64-xl-pcipt-intel           fail
 test-amd64-i386-rhel6hvm-intel            pass
 test-amd64-i386-qemut-rhel6hvm-intel      pass
 test-amd64-i386-qemuu-rhel6hvm-intel      pass
 test-amd64-i386-xl-multivcpu              pass
 test-amd64-amd64-pair                     pass
 test-amd64-i386-pair                      pass
 test-amd64-amd64-xl-sedf-pin              pass
 test-amd64-amd64-pv                       pass
 test-amd64-i386-pv                        pass
 test-amd64-amd64-xl-sedf                  pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  fail
 test-amd64-i386-xl-winxpsp3-vcpus1        fail
 test-amd64-i386-xend-qemut-winxpsp3       broken
 test-amd64-amd64-xl-qemut-winxpsp3        fail
 test-amd64-amd64-xl-qemuu-winxpsp3        fail
 test-amd64-i386-xend-winxpsp3             fail
 test-amd64-amd64-xl-winxpsp3              fail

------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
    http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
    http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary

Not pushing.

------------------------------------------------------------
commit 4fad2dc72a8607f50c3783e1cbcb3fb25e3af932
Author: Ian Campbell <ian.campbell@xxxxxxxxxx>
Date:   Tue Jan 7 15:52:29 2014 +0000

    Revert "tools: libxc: flush data cache after loading images into guest memory"

    This reverts commit a0035ecc0d82c1d4dcd5e429e2fcc3192d89747a.

    Even with this fix there is a period between the flush and the unmap where processor may speculate data into the cache. The solution is to map this region uncached or to use the HCR.DC bit to mark all guest accesses cached. 89eb02c2204a "xen: arm: force guest memory accesses to cacheable when MMU is disabled" has arranged to do the latter.

    Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>

commit 89eb02c2204a0b42a0aa169f107bc346a3fef802
Author: Ian Campbell <ian.campbell@xxxxxxxxxx>
Date:   Wed Jan 8 14:09:01 2014 +0000

    xen: arm: force guest memory accesses to cacheable when MMU is disabled

    On ARM guest OSes are started with MMU and Caches disables (as they are on native) however caching is enabled in the domain running the builder and therefore we must ensure cache consistency.

    The existing solution to this problem (a0035ecc0d82 "tools: libxc: flush data cache after loading images into guest memory") is to flush the caches after loading the various blobs into guest RAM. However this approach has two short comings:

     - The cache flush primitives available to userspace on arm32 are not sufficient for our needs.
     - There is a race between the cache flush and the unmap of the guest page where the processor might speculatively dirty the cache line again.

    (of these the second is the more fundamental)

    This patch makes use of the the hardware functionality to force all accesses made from guest mode to be cached (the HCR.DC == default cached bit). This means that we don't need to worry about the domain builder's writes being cached because the guests "uncached" accesses will actually be cached.

    Unfortunately the use of HCR.DC is incompatible with the guest enabling its MMU (SCTLR.M bit). Therefore we must trap accesses to the SCTLR so that we can detect when this happens and disable HCR.DC.
    This is done with the HCR.TVM (trap virtual memory controls) bit which also causes various other registers to be trapped, all of which can be passed straight through to the underlying register. Once the guest has enabled its MMU we no longer need to trap so there is no ongoing overhead.

    In my tests Linux makes about half a dozen accesses to these registers before the MMU is enabled, I would expect other OSes to behave similarly (the sequence of writes needed to setup the MMU is pretty obvious).

    Apart from this unfortunate need to trap these accesses this approach is incompatible with guests which attempt to do DMA operations with their MMU disabled. In practice this means guests with passthrough which we do not yet support. Since a typical guest (including dom0) does not access devices which require DMA until after it is fully up and running with paging enabled the main risk is to in-guest firmware which does DMA i.e. running EFI in a guest, with a disk passed through and booting from that disk. Since we know that dom0 is not using any such firmware and we do not support device passthrough to guests yet we can live with this restriction. Once passthrough is implemented this will need to be revisited.

    The patch includes a couple of seemingly unrelated but necessary changes:

     - HSR_SYSREG_CRN_MASK was incorrectly defined, which happened to be benign with the existing set of system register we handled, but broke with the new ones introduced here.
     - The defines used to decode the HSR system register fields were named the same as the register. This breaks the accessor macros. This had gone unnoticed because the handling of the existing trapped registers did not require accessing the underlying hardware register. Rename those constants with an HSR_SYSREG prefix (in line with HSR_CP32/64 for 32-bit registers).

    This patch has survived thousands of boot loops on a Midway system.

    Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
    Acked-by: Julien Grall <julien.grall@xxxxxxxxxx>

commit ca6bf20d4157b3b0b270e384e47c1e351964be16
Author: Julien Grall <julien.grall@xxxxxxxxxx>
Date:   Fri Jan 10 03:27:55 2014 +0000

    xen/arm: Scrub heap pages during boot

    Scrub heap pages was disabled because it was slow on the models. Now that Xen supports real hardware, it's possible to enable by default scrubbing.

    Signed-off-by: Julien Grall <julien.grall@xxxxxxxxxx>
    Acked-by: Ian Campbell <ian.campbell@xxxxxxxxxx>

commit 8aba7e1ce9e26cdf9d2b002ed87b4bd75fce4af3
Author: Rob Hoes <rob.hoes@xxxxxxxxxx>
Date:   Fri Jan 10 13:52:04 2014 +0000

    libxl: ocaml: use 'for_app_registration' in osevent callbacks

    This allows the application to pass a token to libxl in the fd/timeout registration callbacks, which it receives back in modification or deregistration callbacks.

    It turns out that this is essential for timeout handling, in order to identify which timeout to change on a modify event.

    Signed-off-by: Rob Hoes <rob.hoes@xxxxxxxxxx>
    Acked-by: David Scott <dave.scott@xxxxxxxxxxxxx>
    Acked-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

commit 0896bd8bea84526b00e00d2d076f4f953a3d73cb
Author: David Vrabel <david.vrabel@xxxxxxxxxx>
Date:   Fri Jan 10 17:46:33 2014 +0100

    x86: map portion of kexec crash area that is within the direct map area

    Commit 7113a45451a9f656deeff070e47672043ed83664 (kexec/x86: do not map crash kernel area) causes fatal page faults when loading a crash image. The attempt to zero the first control page allocated from the crash region will fault as the VA return by map_domain_page() has no mapping. The fault will occur on non-debug builds of Xen when the crash area is below 5 TiB (which will be most systems).

    The assumption that the crash area mapping was not used is incorrect.
    map_domain_page() is used when loading an image and building the image's page tables to temporarily map the crash area, thus the mapping is required if the crash area is in the direct map area.

    Reintroduce the mapping, but only the portions of the crash area that are within the direct map area.

    Reported-by: Don Slutz <dslutz@xxxxxxxxxxx>
    Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
    Tested-by: Don Slutz <dslutz@xxxxxxxxxxx>
    Reviewed-by: Daniel Kiper <daniel.kiper@xxxxxxxxxx>
    Tested-by: Daniel Kiper <daniel.kiper@xxxxxxxxxx>

    This is really just a band aid - kexec shouldn't rely on the crash area being always mapped when in the direct mapping range (and it didn't use to in its previous form). That's primarily because map_domain_page() (needed when the area is outside the direct mapping range) may be unusable when wanting to kexec due to a crash, but also because in the case of PFN compression the kexec range (if specified on the command line) could fall into a hole between used memory ranges (while we're currently only ignoring memory at the top of the physical address space, it's pretty clear that sooner or later we will want that selection to become more sophisticated in order to maximize the memory made use of).

    Acked-by: Jan Beulich <jbeulich@xxxxxxxx>

commit 3dbab7a8bf4bef1bb2967cb3a8c7ed2146482ab3
Author: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Date:   Fri Jan 10 17:45:01 2014 +0100

    dbg_rw_guest_mem: need to call put_gfn in error path

    Using a 1G hvm domU (in grub) and gdbsx:

    (gdb) set arch i8086
    warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration of GDB. Attempting to continue with the default i8086 settings.

    The target architecture is assumed to be i8086
    (gdb) target remote localhost:9999
    Remote debugging using localhost:9999
    Remote debugging from host 127.0.0.1
    0x0000d475 in ?? ()
    (gdb) x/1xh 0x6ae9168b

    Will reproduce this bug.

    With a debug=y build you will get:

    Assertion '!preempt_count()' failed at preempt.c:37

    For a debug=n build you will get a dom0 VCPU hung (at some point) in:

    [ffff82c4c0126eec] _write_lock+0x3c/0x50
     ffff82c4c01e43a0  __get_gfn_type_access+0x150/0x230
     ffff82c4c0158885  dbg_rw_mem+0x115/0x360
     ffff82c4c0158fc8  arch_do_domctl+0x4b8/0x22f0
     ffff82c4c01709ed  get_page+0x2d/0x100
     ffff82c4c01031aa  do_domctl+0x2ba/0x11e0
     ffff82c4c0179662  do_mmuext_op+0x8d2/0x1b20
     ffff82c4c0183598  __update_vcpu_system_time+0x288/0x340
     ffff82c4c015c719  continue_nonidle_domain+0x9/0x30
     ffff82c4c012938b  add_entry+0x4b/0xb0
     ffff82c4c02223f9  syscall_enter+0xa9/0xae

    And gdb output:

    (gdb) x/1xh 0x6ae9168b
    0x6ae9168b:     0x3024
    (gdb) x/1xh 0x6ae9168b
    0x6ae9168b:     Ignoring packet error, continuing...
    Reply contains invalid hex digit 116

    The 1st one worked because the p2m.lock is recursive and the PCPU had not yet changed.

    crash reports (for example):

    crash> mm_rwlock_t 0xffff83083f913010
    struct mm_rwlock_t {
      lock = {
        raw = {
          lock = 2147483647
        },
        debug = {<No data fields>}
      },
      unlock_level = 0,
      recurse_count = 1,
      locker = 1,
      locker_function = 0xffff82c4c022c640 <__func__.13514> "__get_gfn_type_access"
    }

    Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
    Signed-off-by: Don Slutz <dslutz@xxxxxxxxxxx>
    Acked-by: Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>

(qemu changes not included)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
