
Re: [Xen-devel] GPF in mcheck_init() when booting xen-unstable on VMware ESX 5.1



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: Friday, May 31, 2013 12:32 PM
> To: Aravindh Puthiyaparambil (aravindp)
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] GPF in mcheck_init() when booting xen-unstable on
> VMware ESX 5.1
> 
> On 31/05/13 20:19, Aravindh Puthiyaparambil (aravindp) wrote:
> > I am trying to boot xen-unstable
> > (9204bc654562976c7cdebf21c6b5013f6e3057b3) on VMware ESX 5.1 and
> > Workstation 9. I have enabled the "Virtualize Intel VT-x/EPT" option. I am
> > seeing the following GPF during boot:
> >
> > (XEN) mce_intel.c:717: MCA Capability: BCAST 1 SER 0 CMCI 0 firstbank 0 extended MCE MSR 0
> > (XEN) Intel machine check reporting enabled
> > (XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
> > (XEN) CPU:    0
> > (XEN) RIP:    e008:[<ffff82c4c01aa71e>] mcheck_init+0x38a/0x45d
> > (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000   rbx: ffff82c4c026ca80   rcx: 0000000000000000
> > (XEN) rdx: ffff83001d6b2fe0   rsi: bad0bad0bad0bad0   rdi: bad0bad0bad0bad0
> > (XEN) rbp: ffff82c4c02cfe08   rsp: ffff82c4c02cfde8   r8:  ffff8300000b8f00
> > (XEN) r9:  0000000000000010   r10: bad0bad0bad0bad0   r11: 0000000000000010
> > (XEN) r12: ffff83001ffd9fe0   r13: 0000000000000000   r14: ffff82c4c02c8000
> > (XEN) r15: ffff83000008efb0   cr0: 000000008005003b   cr4: 00000000000400f0
> > (XEN) cr3: 000000001fc7b000   cr2: 0000000000000000
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c4c02cfde8:
> > (XEN)    0000000000000000 ffff82c4c026ca80 0000000080000008 00000000ffffffff
> > (XEN)    ffff82c4c02cfe48 ffff82c4c01a7356 1fabfbff000206a7 0000000096ba2223
> > (XEN)    ffff83001ffd9820 0000000000000002 ffff83001ffd9820 ffff82c4c02c8000
> > (XEN)    ffff82c4c02cff08 ffff82c4c02a4536 0000000200000000 0000000000000000
> > (XEN)    ffff83000008ed90 00000000011fb000 0000000000100000 ffff83000008efb0
> > (XEN)    0000000000000000 ffff83000051bc90 ffff830000000010 ffff8300ffffff00
> > (XEN)    ffff83000008ef40 ffff82c400000001 0000000800000000 000000010000006e
> > (XEN)    0000000000000003 00000000000002f8 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 ffff82c4c01000b5 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    ffff83001d6b0000 0000000000000000 0000000000000000
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82c4c01aa71e>] mcheck_init+0x38a/0x45d
> > (XEN)    [<ffff82c4c01a7356>] identify_cpu+0x2b4/0x2d0
> > (XEN)    [<ffff82c4c02a4536>] __start_xen+0x26e9/0x2c98
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) GENERAL PROTECTION FAULT
> > (XEN) [error_code=0000]
> > (XEN) ****************************************
> > (XEN)
> >
> > I have narrowed it down to line 631 in set_poll_bankmask():
> >     bitmap_copy(mb->bank_map, mca_allbanks->bank_map, nr_mce_banks);
> >
> > What is happening is that mca_cap_init() sets nr_mce_banks to 0. As a
> > result, the allocation of bank_map returns ZERO_BLOCK_PTR, which is what
> > xzalloc_array()/_xmalloc() return for a zero-size allocation, and
> > bitmap_copy() then fails disastrously. Is it correct to disable MCE if
> > nr_mce_banks is 0? Or should this be treated as a quirk of the VMware
> > virtual platform and Xen be run with mce=0? Linux seems to be able to
> > handle this gracefully.
> >
> > Another question I have is that callers of xzalloc_array() and friends only
> > check for a NULL return as an error. So what about cases like the one above
> > which fell through the cracks because the return value is ZERO_BLOCK_PTR?
> > Should they all be checking for ZERO_BLOCK_PTR too or ensuring that no calls
> > are made with zero size allocations?
> >
> > Thanks,
> > Aravindh
> 
> ZERO_BLOCK_PTR is specifically distinguished from NULL (as the comment
> beside it says).
> 
> The real bug is calling **alloc() with 0 as a parameter.
> 
> I would say that nr_mce_banks of 0 should result in an implicit mce=0.
> You certainly can't sensibly use MCEs with 0 banks to play with.
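
To illustrate the ZERO_BLOCK_PTR point above, here is a small standalone C model of why a NULL-only check lets a zero-size allocation through. It is only a sketch and not Xen code: demo_zalloc_array(), DEMO_ZERO_BLOCK_PTR, and the helper names are hypothetical, and the sentinel value is chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for ZERO_BLOCK_PTR: a non-NULL sentinel that
 * must never be dereferenced. The value is illustrative only.
 */
#define DEMO_ZERO_BLOCK_PTR ((void *)(uintptr_t)-1)

/* Model allocator: a zero-size request yields the sentinel, not NULL. */
static void *demo_zalloc_array(size_t size, size_t num)
{
    if (size == 0 || num == 0)
        return DEMO_ZERO_BLOCK_PTR;
    return calloc(num, size);
}

/* The check most callers make: only NULL counts as an error. */
static int demo_null_check_only(const void *p)
{
    return p != NULL;
}

/* Stricter check: the pointer must refer to real, usable memory. */
static int demo_is_usable(const void *p)
{
    return p != NULL && p != DEMO_ZERO_BLOCK_PTR;
}

/* Store a value through the pointer; refuses NULL and the sentinel. */
static void demo_store_first(void *p, unsigned long value)
{
    assert(demo_is_usable(p));
    *(unsigned long *)p = value;
}

/* Free only real allocations; the sentinel was never allocated. */
static void demo_free(void *p)
{
    if (demo_is_usable(p))
        free(p);
}
```

With zero elements, demo_null_check_only() reports success even though the pointer cannot be dereferenced, which matches the failure mode described in the quoted report. Avoiding the zero-size call in the first place, as suggested above, removes the need for every caller to know about the sentinel.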

OK. I will submit a patch.
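
For reference, a rough standalone sketch of the kind of guard being discussed: treat a bank count of 0 as an implicit mce=0 before any bitmap is allocated or copied. This is not the actual patch; struct demo_banks, demo_mcheck_init(), and demo_mce_disabled are hypothetical names, and the code only models the logic in plain C.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a bank bitmap; not Xen's actual structure. */
struct demo_banks {
    unsigned int num;
    unsigned long *map;
};

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Models the effect of booting with mce=0. */
static bool demo_mce_disabled;

static size_t demo_bits_to_longs(unsigned int bits)
{
    return (bits + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;
}

/* Allocate a zeroed bitmap for nr banks; callers must pass nr > 0. */
static struct demo_banks *demo_banks_alloc(unsigned int nr)
{
    struct demo_banks *b;

    assert(nr > 0);
    b = calloc(1, sizeof(*b));
    if (b == NULL)
        return NULL;
    b->map = calloc(demo_bits_to_longs(nr), sizeof(unsigned long));
    if (b->map == NULL) {
        free(b);
        return NULL;
    }
    b->num = nr;
    return b;
}

static void demo_banks_free(struct demo_banks *b)
{
    if (b != NULL) {
        free(b->map);
        free(b);
    }
}

/* Returns 0 on success, -1 if MCE ends up disabled. */
static int demo_mcheck_init(unsigned int nr_banks)
{
    struct demo_banks *all, *poll;
    unsigned int i;

    /* The guard: no banks means nothing to monitor, so disable MCE. */
    if (nr_banks == 0) {
        demo_mce_disabled = true;
        return -1;
    }

    all = demo_banks_alloc(nr_banks);
    poll = demo_banks_alloc(nr_banks);
    if (all == NULL || poll == NULL) {
        demo_banks_free(all);
        demo_banks_free(poll);
        demo_mce_disabled = true;
        return -1;
    }

    /* Mark every bank present, then copy the map as the poll mask. */
    for (i = 0; i < nr_banks; i++)
        all->map[i / DEMO_BITS_PER_LONG] |= 1UL << (i % DEMO_BITS_PER_LONG);
    memcpy(poll->map, all->map,
           demo_bits_to_longs(nr_banks) * sizeof(unsigned long));

    /* A real implementation would keep these; the model frees them. */
    demo_banks_free(all);
    demo_banks_free(poll);
    demo_mce_disabled = false;
    return 0;
}
```

The point of putting the check at the top is that the copy is never reached with a zero bank count, so no caller further down has to cope with a zero-size allocation.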

Aravindh

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel