Re: [Xen-devel] [linux-linus bisection] complete test-amd64-amd64-xl-pvh-intel
On 20/02/2017 00:26, Andrew Cooper wrote:
> On 20/02/2017 00:20, Andrew Cooper wrote:
>> On 19/02/2017 23:20, osstest service owner wrote:
>>> branch xen-unstable
>>> xenbranch xen-unstable
>>> job test-amd64-amd64-xl-pvh-intel
>>> testid guest-start
>>>
>>> Tree: linux
>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>>> Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
>>> Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
>>> Tree: qemuu git://xenbits.xen.org/qemu-xen.git
>>> Tree: xen git://xenbits.xen.org/xen.git
>>>
>>> *** Found and reproduced problem changeset ***
>>>
>>> Bug is in tree: xen git://xenbits.xen.org/xen.git
>>> Bug introduced: ab914e04a62727b75782e401eaf2e8b72f717f61
>>> Bug not present: 2f4d2198a9b3ba94c959330b5c94fe95917c364c
>>> Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/105915/
>>>
>>>
>>> commit ab914e04a62727b75782e401eaf2e8b72f717f61
>>> Author: Jan Beulich <jbeulich@xxxxxxxx>
>>> Date: Fri Feb 17 15:51:03 2017 +0100
>>>
>>> x86: package up context switch hook pointers
>>>
>>> They're all solely dependent on guest type, so we don't need to repeat
>>> all the same three pointers in every vCPU control structure. Instead use
>>> static const structures, and store pointers to them in the domain
>>> control structure.
>>>
>>> Since touching it anyway, take the opportunity and expand
>>> schedule_tail() in the only two places invoking it, allowing the macro
>>> to be dropped.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
>>> Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>> Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
>> From
>> http://logs.test-lab.xenproject.org/osstest/logs/105917/test-amd64-amd64-xl-pvh-intel/serial-fiano0.log
>> around Feb 19 23:12:06.269706
>>
>> (XEN) ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
>> (XEN) CPU: 2
>> (XEN) RIP: e008:[<ffff82d08016795a>] domain.c#__context_switch+0x1a3/0x3e3
>> (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor (d1v0)
>> (XEN) rax: 0000000000000000 rbx: 0000000000000002 rcx: 0000000000000000
>> (XEN) rdx: 00000031fd44b600 rsi: 0000000000000003 rdi: ffff83007de27000
>> (XEN) rbp: ffff83027d78fdb0 rsp: ffff83027d78fd60 r8: 0000000000000000
>> (XEN) r9: 0000005716f6126f r10: 0000000000007ff0 r11: 0000000000000246
>> (XEN) r12: ffff83007de27000 r13: ffff83027fb74000 r14: ffff83007dafd000
>> (XEN) r15: ffff83027d7c8000 cr0: 000000008005003b cr4: 00000000001526e0
>> (XEN) cr3: 000000007dd05000 cr2: 0000000000000008
>> (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
>> (XEN) Xen code around <ffff82d08016795a> (domain.c#__context_switch+0x1a3/0x3e3):
>> (XEN) 85 68 07 00 00 4c 89 e7 <ff> 50 08 4c 89 ef e8 36 e1 02 00 41 80 bd 78 08
>> (XEN) Xen stack trace from rsp=ffff83027d78fd60:
>> (XEN) ffff83027d78ffff 0000000000000003 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 ffff83007de27000 ffff83007dafd000 ffff83027fb74000
>> (XEN) 0000000000000002 ffff83027d7c8000 ffff83027d78fe20 ffff82d08016bf1f
>> (XEN) ffff82d080131ae2 ffff83027d78fde0 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 ffff83027d78fe20 ffff83007dafd000
>> (XEN) ffff83007de27000 0000005716f5e5da ffff83027d796148 0000000000000001
>> (XEN) ffff83027d78feb0 ffff82d08012def9 ffff83027d7955a0 ffff83027d796160
>> (XEN) 0000000200000004 ffff83027d796140 ffff83027d78fe70 ffff82d08014af39
>> (XEN) ffff83027d78fe70 ffff83007de27000 0000000001c9c380 ffff82d0801bf800
>> (XEN) 000000107dafd000 ffff82d080322b80 ffff82d080322a80 ffffffffffffffff
>> (XEN) ffff83027d78ffff ffff83027d780000 ffff83027d78fee0 ffff82d08013128f
>> (XEN) ffff83027d78ffff ffff83007dd4c000 ffff83027d7c8000 00000000ffffffff
>> (XEN) ffff83027d78fef0 ffff82d0801312e4 ffff83027d78ff10 ffff82d080167582
>> (XEN) ffff82d0801312e4 ffff83007dafd000 ffff83027d78fdc8 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) ffffffff82374000 0000000000000000 0000000000000000 ffffffff81f59180
>> (XEN) 0000000000000000 0000000000000200 ffffffff82390000 0000000000000000
>> (XEN) 0000000000000000 02ffff8000000000 0000000000000000 0000000000000000
>> (XEN) Xen call trace:
>> (XEN) [<ffff82d08016795a>] domain.c#__context_switch+0x1a3/0x3e3
>> (XEN) [<ffff82d08016bf1f>] context_switch+0x147/0xf0d
>> (XEN) [<ffff82d08012def9>] schedule.c#schedule+0x5ba/0x615
>> (XEN) [<ffff82d08013128f>] softirq.c#__do_softirq+0x7f/0x8a
>> (XEN) [<ffff82d0801312e4>] do_softirq+0x13/0x15
>> (XEN) [<ffff82d080167582>] domain.c#idle_loop+0x55/0x62
>> (XEN)
>> (XEN) Pagetable walk from 0000000000000008:
>> (XEN) L4[0x000] = 000000027d7cd063 ffffffffffffffff
>> (XEN) L3[0x000] = 000000027d7cc063 ffffffffffffffff
>> (XEN) L2[0x000] = 000000027d7cb063 ffffffffffffffff
>> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 2:
>> (XEN) FATAL PAGE FAULT
>> (XEN) [error_code=0000]
>> (XEN) Faulting linear address: 0000000000000008
>> (XEN) ****************************************
>> (XEN)
>>
>> We have followed the ->to() hook on a domain with a NULL ctxt_switch
>> pointer (confirmed by the disassembly).
>> n is derived from current, which is d1v0, but that would mean we are
>> trying to schedule a vcpu before its domain structure has been fully
>> constructed.
>>
>> The problem is with hvm_domain_initialise():
>>
>> int hvm_domain_initialise(struct domain *d)
>> {
>>     ...
>>     if ( is_pvh_domain(d) )
>>     {
>>         register_portio_handler(d, 0, 0x10003, handle_pvh_io);
>>         return 0;
>>     }
>>     ...
>>     rc = hvm_funcs.domain_initialise(d);
>>     ...
>> }
>>
>> So PVH domains exit hvm_domain_initialise() before we call the
>> vendor-specific initialisation hooks.
>>
>> Rather than fixing this specific issue, can I suggest we properly kill
>> PVH v1 at this point? Given what else it skips in
>> hvm_domain_initialise(), it clearly hasn't functioned properly in the past.
> P.S. Ian: Why did this failure not block at the push gate?
>
> It is a completely repeatable host crash, yet master has been pulled up
> to match staging.

P.P.S. We have a cascade failure during the crash, which we should fix.

(XEN) Assertion 'current == idle_vcpu[smp_processor_id()]' failed at domain.c:2178
(XEN) ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 2
(XEN) RIP: e008:[<ffff82d08016cd3d>] __sync_local_execstate+0x44/0x67
(XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor (d1v0)
<snip>
(XEN) Xen call trace:
(XEN) [<ffff82d08016cd3d>] __sync_local_execstate+0x44/0x67
(XEN) [<ffff82d080196d8d>] invalidate_interrupt+0x40/0x7d
(XEN) [<ffff82d080176112>] do_IRQ+0x8c/0x60f
(XEN) [<ffff82d0802470f7>] common_interrupt+0x67/0x70
(XEN) [<ffff82d080196865>] machine_halt+0x1d/0x32
(XEN) [<ffff82d0801476c1>] panic+0x10b/0x115
(XEN) [<ffff82d0801a1955>] do_page_fault+0x424/0x4f8
(XEN) [<ffff82d0802471f8>] entry.o#handle_exception_saved+0x66/0xa4
(XEN) [<ffff82d08016795a>] domain.c#__context_switch+0x1a3/0x3e3
(XEN) [<ffff82d08016bf1f>] context_switch+0x147/0xf0d
(XEN) [<ffff82d08012def9>] schedule.c#schedule+0x5ba/0x615
(XEN) [<ffff82d08013128f>] softirq.c#__do_softirq+0x7f/0x8a
(XEN) [<ffff82d0801312e4>] do_softirq+0x13/0x15
(XEN) [<ffff82d080167582>] domain.c#idle_loop+0x55/0x62

We really shouldn't be enabling interrupts in machine_halt(), because
there is no guarantee that it is safe to do so.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel