
Re: Lazily construct slab commit causes BSOD/freeze on xen 4.16


  • To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Paul Durrant <xadimgnik@xxxxxxxxx>
  • Date: Tue, 28 Feb 2023 09:51:43 +0000
  • Delivery-date: Tue, 28 Feb 2023 09:51:53 +0000
  • List-id: Developer list for the Windows PV Drivers subproject <win-pv-devel.lists.xenproject.org>

On 26/02/2023 19:42, Joel Upham wrote:
I was able to confirm that the freeze occurs in OpenXT. Interestingly, however, after 48 hours of waiting the guest eventually continues to work. I have some logs from this event. The Short Log is more concise: it shows the state directly before the messages of

Feb  1 04:05:02.952634 VM hypervisor: (d12) xenbus|RangeSetPop: fail2
Feb  1 04:05:02.953460 VM hypervisor: (d12) xenbus|RangeSetPop: fail1 (c000009a)
Feb  1 04:05:02.953835 VM hypervisor: (d12) GNTTAB: MAP XENMAPSPACE_grant_table[4] @ 00000001.22805000
Feb  1 04:05:02.954312 VM hypervisor: (d12) xenbus|GnttabExpand: added references [00000800 - 000009ff]
Feb  1 04:05:02.957648 VM hypervisor: (d12) xenbus|RangeSetPop: fail2
Feb  1 04:05:02.958151 VM hypervisor: (d12) xenbus|RangeSetPop: fail1 (c000009a)
Feb  1 04:05:02.960012 VM hypervisor: (d12) GNTTAB: MAP XENMAPSPACE_grant_table[5] @ 00000001.22806000
Feb  1 04:05:02.960651 VM hypervisor: (d12) xenbus|GnttabExpand: added references [00000a00 - 00000bff]
Feb  1 04:05:02.971874 VM hypervisor: (d12) xenbus|RangeSetPop: fail2
Feb  1 04:05:02.978007 VM hypervisor: (d12) xenbus|RangeSetPop: fail1 (c000009a)
Feb  1 04:05:02.979312 VM hypervisor: (d12) GNTTAB: MAP XENMAPSPACE_grant_table[6] @ 00000001.22807000
Feb  1 04:05:02.980254 VM hypervisor: (d12) xenbus|GnttabExpand: added references [00000c00 - 00000dff]

start in perpetuity.

206    if (__RangeSetIsEmpty(RangeSet))
207        goto fail2;

In the above messages that condition is hit and then the grant table is repeatedly expanded. In your BROKEN_MESSAGES attachment though you can also see:

Jan 31 21:31:59.214767 VM hypervisor: (d8) xenbus|GnttabExpand: fail1 (c000009a)
Jan 31 21:31:59.214779 VM hypervisor: (d8) xenbus|GnttabEntryCtor: fail1 (c000009a)
Jan 31 21:31:59.214790 VM hypervisor: (d8) xenbus|CacheGetObjectFromSlab: fail2
Jan 31 21:31:59.214802 VM hypervisor: (d8) xenbus|CacheGetObjectFromSlab: fail1 (c000009a)

That means you are out of grant table. What size of table have you given to the VM? IIRC the default size these days is 64 pages, but it looks like you may need more.

 I haven't gotten a BSOD yet, but if I reproduce it I will send that as well. The BROKEN_MESSAGES attachment shows that the guest froze at the beginning; I waited a long time for it to BSOD and eventually restarted it. Any insight as to what might be happening and why we are seeing this freeze? I got these results with the current xen tools obtained from https://xenproject.org/downloads/windows-pv-drivers/development-builds/windows-pv-master/ to ensure it was not caused by any patches that we might add. The host is running Xen version 4.16.4.


The guest may or may not crash. The drivers are written to tolerate a failure to allocate, but if, for example, a write-out to the page file persistently hit such a failure, then Windows would eventually BSOD.

  Paul

-Joel
------------------------------------------------------------------------
*From:* win-pv-devel <win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx> on behalf of Paul Durrant <xadimgnik@xxxxxxxxx>
*Sent:* Thursday, January 19, 2023 12:55 PM
*To:* win-pv-devel@xxxxxxxxxxxxxxxxxxxx <win-pv-devel@xxxxxxxxxxxxxxxxxxxx>
*Subject:* Re: Lazily construct slab commit causes BSOD/freeze on xen 4.16



On 19/01/2023 10:02, Owen Smith wrote:
I've not seen BSODs in this area with XenServer drivers which are based
on the same commit, though we are carrying a patch to this area.
Do you have any details about the crash (crashdumps, bugcheck IDs, etc)
that could help pinpoint the problem?

Attached is the patch XenServer's tools are currently carrying.


Owen, why did/do you need this patch? Did you try running with
CacheAudit() turned on (as it is by default in a debug build)?

    Paul






 

