Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0 crash
On 11/07/17 01:37 -0700, Jan Beulich wrote:
> >>> On 07.11.17 at 09:23, <xudong.hao@xxxxxxxxx> wrote:
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Tuesday, November 7, 2017 4:09 PM
> >> >>> On 07.11.17 at 02:37, <xudong.hao@xxxxxxxxx> wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> >> Sent: Monday, November 6, 2017 5:17 PM
> >> >> >>> On 03.11.17 at 09:29, <xudong.hao@xxxxxxxxx> wrote:
> >> >> > We figured out the problem, some corner scripts triggered the error
> >> >> > injection at the same page (pfn 0x180020) twice, i.e. "./xen-mceinj
> >> >> > -t 0" run over one time, which resulted in Dom0 crash.
> >> >>
> >> >> But isn't this a valid scenario, which shouldn't result in a kernel
> >> >> crash? What if two successive #MCs occurred for the same page?
> >> >> I.e. ...
> >> >>
> >> >
> >> > Yes, it's another valid scenario, the expect result is kernel crash.
> >>
> >> Kernel _crash_ or rather kernel _panic_? Of course without any kernel
> >> messages we can't tell one from the other, but to me this makes a
> >> difference nevertheless.
> >>
> > Exactly, Dom0 crash.
>
> I don't believe a crash is the expected outcome here.
>

This test case injects two errors into the same dom0 page. During the
first injection, offline_page() is called to set the PGC_broken flag of
that page. During the second injection, offline_page() detects that the
same broken page is touched again, and then tries to shut down the page
owner, i.e. dom0 in this case:

    /*
     * NB. When broken page belong to guest, usually hypervisor will
     * notify the guest to handle the broken page. However, hypervisor
     * need to prevent malicious guest access the broken page again.
     * Under such case, hypervisor shutdown guest, preventing recursive mce.
     */
    if ( (pg->count_info & PGC_broken) && (owner = page_get_owner(pg)) )
    {
        *status = PG_OFFLINE_AGAIN;
        domain_shutdown(owner, SHUTDOWN_crash);
        return 0;
    }

So I think the Dom0 crash and the following machine reboot are the
expected behaviors here. But it looks like an (unexpected) page fault
happens during the reboot. Xudong, can you check whether a normal reboot
on that machine triggers a page fault?

> > And I didn't see any "kernel panic" message from the log -- attach the
> > original log again.
>
> Well, as said - there _no_ kernel log message at all, and hence we
> can't tell whether it's a crash or a plain panic. Iirc Xen's "Hardware
> Dom0 crashed" can't distinguish the two cases.
>

The crash is triggered in offline_page() before Xen can inject the error
to Dom0, so there is no dom0 kernel log around the crash. This can be
confirmed by dumping the call trace when hwdom_shutdown(SHUTDOWN_crash)
is called. Xudong, can you do this?

Thanks,
Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
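
For reference, the double-injection sequence described in this message can
be modelled outside Xen. The following is a minimal standalone sketch, not
Xen code: every name prefixed with model_ or MODEL_ is a hypothetical
stand-in invented for illustration, the "broken" field only plays the role
of PGC_broken, and setting "crashed" only stands in for
domain_shutdown(owner, SHUTDOWN_crash). It models just the quoted branch
and nothing else that offline_page() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status values; not Xen's PG_OFFLINE_* flags. */
enum model_status { MODEL_OFFLINED, MODEL_OFFLINE_AGAIN };

/* Hypothetical stand-in for a domain. */
struct model_domain {
    bool crashed;                /* set when the model "shuts down" the domain */
};

/* Hypothetical stand-in for a page. */
struct model_page {
    bool broken;                 /* plays the role of PGC_broken */
    struct model_domain *owner;  /* NULL if the page has no owner */
};

/*
 * Models only the branch quoted in the message: a second offline request
 * on an already-broken page that has an owner crashes that owner.
 */
int model_offline_page(struct model_page *pg, enum model_status *status)
{
    assert(pg != NULL && status != NULL);

    if (pg->broken && pg->owner != NULL) {
        *status = MODEL_OFFLINE_AGAIN;
        pg->owner->crashed = true;  /* stands in for domain_shutdown() */
        return 0;
    }

    pg->broken = true;              /* first request: mark the page broken */
    *status = MODEL_OFFLINED;
    return 0;
}
```

Calling model_offline_page() twice on the same owned page corresponds to
running "./xen-mceinj -t 0" twice against the same pfn: the first call
only marks the page broken, and the second call reports
MODEL_OFFLINE_AGAIN and marks the owner as crashed.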