
Re: regression in recent pvops kernels, dom0 crashes early


  • To: Olaf Hering <olaf@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 13 May 2021 13:11:10 +0100
  • Delivery-date: Thu, 13 May 2021 12:11:37 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 13/05/2021 11:24, Olaf Hering wrote:
> Recent pvops dom0 kernels fail to boot on this particular ProLiant BL465c G5 
> box.
> It happens to work with every Xen and a 4.4 based sle12sp3 kernel, but fails 
> with every Xen and a 4.12 based sle12sp4 (and every newer) kernel.
>
> Any idea what is going on?
>
> ....
> (XEN) Freed 256kB init memory.
> (XEN) mm.c:1758:d0 Bad L1 flags 800000
> (XEN) traps.c:458:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 
> [ec=0000]
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08022a2a0 
> create_bounce_frame+0x133/0x143
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.4.20170405T152638.6bf0560e12-9.xen44  x86_64  debug=y  Not 
> tainted ]----
> ....
>
> ....
> (XEN) Freed 656kB init memory
> (XEN) mm.c:2165:d0v0 Bad L1 flags 800000
> (XEN) d0v0 Unhandled invalid opcode fault/trap [#6, ec=ffffffff]
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d04031a016 
> x86_64/entry.S#create_bounce_frame+0x15d/0x177
> (XEN) Domain 0 (vcpu#0) crashed on cpu#5:
> (XEN) ----[ Xen-4.15.20210504T145803.280d472f4f-6.xen415  x86_64  debug=y  
> Not tainted ]----
> ....
>
> I can probably cycle through all kernels between 4.4 and 4.12 to see where it 
> broke.

"Unhandled invalid opcode fault/trap" means "Xen tried to raise #UD with
the guest, and it hasn't set up a handler yet".  The "Bad L1 flags"
message earlier means there was an attempted edit to a pagetable which
was rejected by Xen.

These two things aren't obviously related by a single action in Xen, so
I expect the pagetable modification failed, and the guest fell into a
bad error path.


If I'm counting bits correctly, that is Xen rejecting the use of the NX
bit, which is suspicious.  Do you have the full Xen boot log on this
box?  I wonder if we have some problem clobbering the XD-disable bit.
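
For reference, the bit counting can be sketched as below.  This is only
an illustration and rests on an assumption about how Xen reports flags on
x86_64: that a 64-bit PTE is folded into a 24-bit value, with PTE bits
0-11 kept in place and PTE bits 52-63 moved down to bits 12-23.  The
helper names are made up for this example and are not Xen functions.

```python
# Assumption: Xen's x86_64 "flags" view of a PTE is 24 bits wide.
# PTE bits 0-11 stay at bits 0-11; PTE bits 52-63 land in bits 12-23.
# Function names here are hypothetical, chosen for illustration only.

def pte_flags(pte: int) -> int:
    """Fold a 64-bit PTE into the assumed 24-bit flags layout."""
    return ((pte >> 40) & 0xFFF000) | (pte & 0xFFF)


def flags_to_pte_bits(flags: int) -> list[int]:
    """Map each set flag bit back to its position in the 64-bit PTE."""
    bits = []
    for i in range(24):
        if flags & (1 << i):
            # Low 12 bits are unchanged; the upper 12 were shifted down by 40.
            bits.append(i if i < 12 else i + 40)
    return bits


PTE_NX = 1 << 63  # architectural no-execute bit in an x86_64 PTE

# Value taken from the "Bad L1 flags 800000" log line.
bad_flags = 0x800000
print(flags_to_pte_bits(bad_flags))
print(pte_flags(PTE_NX) == bad_flags)
```

Under that assumption, 0x800000 has only bit 23 set, which corresponds to
PTE bit 63, i.e. NX.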

~Andrew




 

