[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [xen-unstable test] 94442: regressions - FAIL
On 17/05/16 11:57, Jan Beulich wrote: >>>> On 16.05.16 at 11:24, <wei.liu2@xxxxxxxxxx> wrote: >> On Mon, May 16, 2016 at 02:57:13AM +0000, osstest service owner wrote: >>> flight 94442 xen-unstable real [real] >>> http://logs.test-lab.xenproject.org/osstest/logs/94442/ >> [...] >>> test-amd64-i386-qemuu-rhel6hvm-intel 9 redhat-install fail REGR. vs. >>> 94368 >> The changes in this flight shouldn't cause failure like this. See below. >> >> It is more likely to be caused by SMEP/SMAP fix, which are now in >> master. It seems that previous run didn't discover this. >> >> Log file at: >> >> http://logs.test-lab.xenproject.org/osstest/logs/94442/test-amd64-i386-qemuu-rhel >> >> 6hvm-intel/serial-italia0.log >> >> May 15 22:07:44.023500 (XEN) Xen BUG at entry.S:221 >> May 15 22:07:47.455549 (XEN) ----[ Xen-4.7.0-rc x86_64 debug=y Not >> tainted ]---- >> May 15 22:07:47.463500 (XEN) CPU: 0 >> May 15 22:07:47.463531 (XEN) RIP: e008:[<ffff82d0802411c7>] >> cr4_pv32_restore+0x37/0x40 >> May 15 22:07:47.463567 (XEN) RFLAGS: 0000000000010287 CONTEXT: hypervisor >> (d0v3) >> May 15 22:07:47.471503 (XEN) rax: 0000000000000000 rbx: 00000000cf195e50 >> rcx: 0000000000000001 >> May 15 22:07:47.479496 (XEN) rdx: ffff8300be907ff8 rsi: 0000000000007ff0 >> rdi: 000000000022287e >> May 15 22:07:47.487498 (XEN) rbp: 00007cff416f80c7 rsp: ffff8300be907f08 >> r8: ffff83023df8a000 >> May 15 22:07:47.495498 (XEN) r9: ffff83023df8a000 r10: 00000000deadbeef >> r11: 0000000000800000 >> May 15 22:07:47.503510 (XEN) r12: ffff8300bed32000 r13: ffff83023df8a000 >> r14: 0000000000000000 >> May 15 22:07:47.503549 (XEN) r15: ffff83023df72000 cr0: 0000000080050033 >> cr4: 00000000001526e0 >> May 15 22:07:47.511501 (XEN) cr3: 00000002383d7000 cr2: 00000000b71ff000 >> May 15 22:07:47.519493 (XEN) ds: 007b es: 007b fs: 00d8 gs: 0033 ss: >> 0000 cs: e008 >> May 15 22:07:47.527520 (XEN) Xen code around <ffff82d0802411c7> >> (cr4_pv32_restore+0x37/0x40): >> May 15 22:07:47.535491 (XEN) 3b 05 03 87 0a 00 74 02 <0f> 0b 5a 31 c0 c3 0f >> 1f 00 f6 42 04 01 0f 84 26 >> May 15 22:07:47.535531 (XEN) Xen stack trace from rsp=ffff8300be907f08: >> May 15 22:07:47.543502 (XEN) 0000000000000000 ffff82d080240f22 >> ffff83023df72000 0000000000000000 >> May 15 22:07:47.551559 (XEN) ffff83023df8a000 ffff8300bed32000 >> 00000000cf195e6c 00000000cf195e50 >> May 15 22:07:47.559494 (XEN) 0000000000800000 00000000deadbeef >> ffff83023df8a000 0000000000000206 >> May 15 22:07:47.567496 (XEN) 0000000000000001 0000000000000001 >> 0000000000000000 0000000000007ff0 >> May 15 22:07:47.575503 (XEN) 000000000022287e 0000010000000000 >> 00000000c1001027 0000000000000061 >> May 15 22:07:47.575543 (XEN) 0000000000000246 00000000cf195e44 >> 0000000000000069 000000000000beef >> May 15 22:07:47.583508 (XEN) 000000000000beef 000000000000beef >> 000000000000beef 0000000000000000 >> May 15 22:07:47.591503 (XEN) ffff8300bed30000 0000000000000000 >> 00000000001526e0 >> May 15 22:07:47.599493 (XEN) Xen call trace: >> May 15 22:07:47.599522 (XEN) [<ffff82d0802411c7>] >> cr4_pv32_restore+0x37/0x40 > I think I see the problem the introduction of caching in v3 introduced: > In compat_restore_all_guest we have (getting patched in by altinsn > patching): > > .Lcr4_alt: > testb $3,UREGS_cs(%rsp) > jpe .Lcr4_alt_end > mov CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp), %rax > and $~XEN_CR4_PV32_BITS, %rax > mov %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp) > mov %rax, %cr4 > .Lcr4_alt_end: > > If an NMI occurs between the updating og the cached value and the > actual CR4 write, the NMI handling will cause the cached value to get > SMEP+SMAP enabled again (in both cache and CR4), and once we > get back here, we will clear it in just CR4. > > We don't want to undo the caching, as that gave us performance back > at least for 64-bit PV guests. > > We also can't simply swap the two instructions: If we did, an NMI > between the two would itself trigger the BUG in cr4_pv32_restore > (as the check there assumes that CR4 always has no less of the > bits of interest set than the cached value). > > The options I see are: > > 1) Ditch the debug check altogether, for being false positive in > exactly one corner case. > > 2) Make the NMI handler recognize the single critical pair of > instructions. > > 3) Change the code sequence above to > > .Lcr4_alt: > testb $3,UREGS_cs(%rsp) > jpe .Lcr4_alt_end > mov CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp), %rax > and $~XEN_CR4_PV32_BITS, %rax > 1: > mov %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp) > mov %rax, %cr4 > /* (suitable comment goes here) */ > cmp %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp) > jne 1b > .Lcr4_alt_end: > > (assuming that an insane flood of NMIs not allowing this loop to > be exited would be sufficiently problematic in other ways). > > I dislike 1, and between 2 and 3 I think I'd prefer the latter, unless > someone else sees something wrong with such an approach. +1 for option 3. If we have a flood of NMIs, we have larger problems than this loop. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |