
Re: GPF on 0xdead000000000100 in nvme_map_data - Linux 5.9.9



On 07.12.20 12:48, Marek Marczykowski-Górecki wrote:
On Mon, Dec 07, 2020 at 11:55:01AM +0100, Jürgen Groß wrote:
Marek,

On 06.12.20 17:47, Jason Andryuk wrote:
On Sat, Dec 5, 2020 at 3:29 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:

On Fri, Dec 04, 2020 at 01:20:54PM +0100, Marek Marczykowski-Górecki wrote:
On Fri, Dec 04, 2020 at 01:08:03PM +0100, Christoph Hellwig wrote:
On Fri, Dec 04, 2020 at 12:08:47PM +0100, Marek Marczykowski-Górecki wrote:
culprit:

commit 9e2369c06c8a181478039258a4598c1ddd2cadfa
Author: Roger Pau Monne <roger.pau@xxxxxxxxxx>
Date:   Tue Sep 1 10:33:26 2020 +0200

      xen: add helpers to allocate unpopulated memory

I'm adding relevant people and xen-devel to the thread.
For completeness, here is the original crash message:

That commit definitely adds a new ZONE_DEVICE user, so it does look
related.  But you are not running on Xen, are you?

I am. It is Xen dom0.

I'm afraid I'm on leave and won't be able to look into this until the
beginning of January. I would guess it's some kind of bad
interaction between blkback and NVMe drivers both using ZONE_DEVICE?

Maybe the best option is to revert this change, and I will look into it
when I get back, unless someone is willing to debug this further.

Looking at commit 9e2369c06c8a and xen-blkback's put_free_pages(), they
both use page->lru, which is part of the anonymous union shared with
*pgmap.  That matches Marek's suspicion that the ZONE_DEVICE memory is
being used as ZONE_NORMAL.

memmap_init_zone_device() says:
* ZONE_DEVICE pages union ->lru with a ->pgmap back pointer
* and zone_device_data.  It is a bug if a ZONE_DEVICE page is
* ever freed or placed on a driver-private list.
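
The aliasing described above can be sketched in user space. This is a
simplified model, not the kernel's struct page: all sketch_* names are
hypothetical, pointers are modelled as 64-bit integers, and only the
->next poisoning done by list_del() is reproduced. It assumes the x86_64
value of LIST_POISON1 (0x100 plus a poison delta of 0xdead000000000000),
which is the address in the subject line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86_64 value that list_del() stores in ->next. */
#define SKETCH_LIST_POISON1 UINT64_C(0xdead000000000100)

/* Pointers are modelled as 64-bit integers to keep the sketch portable. */
struct sketch_list_head {
    uint64_t next;
    uint64_t prev;
};

/* Hypothetical, heavily reduced stand-in for struct page. */
struct sketch_page {
    uint64_t flags;
    union {
        struct sketch_list_head lru;   /* used for normal pages */
        struct {                       /* used for ZONE_DEVICE pages */
            uint64_t pgmap;            /* models struct dev_pagemap * */
            uint64_t zone_device_data;
        };
    };
};

/* The two views share storage: lru.next and pgmap are the same word. */
static_assert(offsetof(struct sketch_page, lru) ==
              offsetof(struct sketch_page, pgmap),
              "lru and pgmap must alias in this model");

/* Models only the poisoning step of list_del(); the kernel also
 * unlinks the entry from its neighbours and poisons ->prev. */
static void sketch_list_del(struct sketch_list_head *entry)
{
    entry->next = SKETCH_LIST_POISON1;
}

/* Start with a valid pgmap, treat the page as a list member,
 * and report what a later reader of ->pgmap would see. */
static uint64_t sketch_pgmap_after_list_del(uint64_t pgmap)
{
    struct sketch_page page = { .flags = 0 };

    page.pgmap = pgmap;
    sketch_list_del(&page.lru);
    return page.pgmap;
}
```

In this model, any code that later follows ->pgmap on such a page would
dereference the poison value instead of a real dev_pagemap, which is
consistent with a fault on 0xdead000000000100.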

Second try, now even tested to work on a test system (without NVMe).

It doesn't work for me:

[  526.023340] xen-blkback: backend/vbd/1/51712: using 2 queues, protocol 1 (x86_64-abi) persistent grants
[  526.030550] xen-blkback: backend/vbd/1/51728: using 2 queues, protocol 1 (x86_64-abi) persistent grants
[  526.034810] BUG: kernel NULL pointer dereference, address: 0000000000000010

Oh, indeed. Silly bug. My test was with qdisk as backend :-(

3rd try...


Juergen

Attachment: 0001-xen-add-helpers-for-caching-grant-mapping-pages.patch
Description: Text Data

Attachment: 0002-xen-don-t-use-page-lru-for-ZONE_DEVICE-memory.patch
Description: Text Data



 

