Re: GPF on 0xdead000000000100 in nvme_map_data - Linux 5.9.9
On 07.12.20 12:48, Marek Marczykowski-Górecki wrote:
> On Mon, Dec 07, 2020 at 11:55:01AM +0100, Jürgen Groß wrote:
>> Marek,
>>
>> On 06.12.20 17:47, Jason Andryuk wrote:
>>> On Sat, Dec 5, 2020 at 3:29 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>>>> On Fri, Dec 04, 2020 at 01:20:54PM +0100, Marek Marczykowski-Górecki wrote:
>>>>> On Fri, Dec 04, 2020 at 01:08:03PM +0100, Christoph Hellwig wrote:
>>>>>> On Fri, Dec 04, 2020 at 12:08:47PM +0100, Marek Marczykowski-Górecki wrote:
>>>>>>> culprit:
>>>>>>>
>>>>>>> commit 9e2369c06c8a181478039258a4598c1ddd2cadfa
>>>>>>> Author: Roger Pau Monne <roger.pau@xxxxxxxxxx>
>>>>>>> Date:   Tue Sep 1 10:33:26 2020 +0200
>>>>>>>
>>>>>>>     xen: add helpers to allocate unpopulated memory
>>>>>>>
>>>>>>> I'm adding relevant people and xen-devel to the thread.
>>>>>>> For completeness, here is the original crash message:
>>>>>>
>>>>>> That commit definitely adds a new ZONE_DEVICE user, so it does look
>>>>>> related. But you are not running on Xen, are you?
>>>>>
>>>>> I am. It is Xen dom0.
>>>>
>>>> I'm afraid I'm on leave and won't be able to look into this until the
>>>> beginning of January. I would guess it's some kind of bad interaction
>>>> between the blkback and NVMe drivers both using ZONE_DEVICE?
>>>>
>>>> Maybe the best option is to revert this change and I will look into it
>>>> when I get back, unless someone is willing to debug this further.
>>>
>>> Looking at commit 9e2369c06c8a and xen-blkback put_free_pages(), they
>>> both use page->lru, which is part of the anonymous union shared with
>>> *pgmap. That matches Marek's suspicion that the ZONE_DEVICE memory is
>>> being used as ZONE_NORMAL.
>>>
>>> memmap_init_zone_device() says:
>>>
>>>     * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer
>>>     * and zone_device_data.  It is a bug if a ZONE_DEVICE page is
>>>     * ever freed or placed on a driver-private list.
>>
>> Second try, now even tested to work on a test system (without NVMe).
>
> It doesn't work for me:
>
> [ 526.023340] xen-blkback: backend/vbd/1/51712: using 2 queues, protocol 1 (x86_64-abi) persistent grants
> [ 526.030550] xen-blkback: backend/vbd/1/51728: using 2 queues, protocol 1 (x86_64-abi) persistent grants
> [ 526.034810] BUG: kernel NULL pointer dereference, address: 0000000000000010

Oh, indeed. Silly bug. My test was with qdisk as backend :-(

3rd try...


Juergen

Attachment: 0001-xen-add-helpers-for-caching-grant-mapping-pages.patch
Attachment: 0002-xen-don-t-use-page-lru-for-ZONE_DEVICE-memory.patch
Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Attachment: OpenPGP_signature