Re: GPF on 0xdead000000000100 in nvme_map_data - Linux 5.9.9



On Mon, Dec 07, 2020 at 01:00:14PM +0100, Jürgen Groß wrote:
> On 07.12.20 12:48, Marek Marczykowski-Górecki wrote:
> > On Mon, Dec 07, 2020 at 11:55:01AM +0100, Jürgen Groß wrote:
> > > Marek,
> > > 
> > > On 06.12.20 17:47, Jason Andryuk wrote:
> > > > On Sat, Dec 5, 2020 at 3:29 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> > > > > 
> > > > > > On Fri, Dec 04, 2020 at 01:20:54PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > On Fri, Dec 04, 2020 at 01:08:03PM +0100, Christoph Hellwig wrote:
> > > > > > > > On Fri, Dec 04, 2020 at 12:08:47PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > > > culprit:
> > > > > > > > 
> > > > > > > > commit 9e2369c06c8a181478039258a4598c1ddd2cadfa
> > > > > > > > Author: Roger Pau Monne <roger.pau@xxxxxxxxxx>
> > > > > > > > Date:   Tue Sep 1 10:33:26 2020 +0200
> > > > > > > > 
> > > > > > > >       xen: add helpers to allocate unpopulated memory
> > > > > > > > 
> > > > > > > > I'm adding relevant people and xen-devel to the thread.
> > > > > > > > For completeness, here is the original crash message:
> > > > > > > 
> > > > > > > > That commit definitively adds a new ZONE_DEVICE user, so it does look
> > > > > > > > related.  But you are not running on Xen, are you?
> > > > > > 
> > > > > > I am. It is Xen dom0.
> > > > > 
> > > > > I'm afraid I'm on leave and won't be able to look into this until the
> > > > > beginning of January. I would guess it's some kind of bad
> > > > > interaction between blkback and NVMe drivers both using ZONE_DEVICE?
> > > > > 
> > > > > Maybe the best is to revert this change and I will look into it when
> > > > > I get back, unless someone is willing to debug this further.
> > > > 
> > > > Looking at commit 9e2369c06c8a and xen-blkback put_free_pages(), they
> > > > both use page->lru which is part of the anonymous union shared with
> > > > *pgmap.  That matches Marek's suspicion that the ZONE_DEVICE memory is
> > > > being used as ZONE_NORMAL.
> > > > 
> > > > memmap_init_zone_device() says:
> > > > * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer
> > > > * and zone_device_data.  It is a bug if a ZONE_DEVICE page is
> > > > * ever freed or placed on a driver-private list.
> > > 
> > > Second try, now even tested to work on a test system (without NVMe).
> > 
> > It doesn't work for me:
> > 
> > [  526.023340] xen-blkback: backend/vbd/1/51712: using 2 queues, protocol 1 (x86_64-abi) persistent grants
> > [  526.030550] xen-blkback: backend/vbd/1/51728: using 2 queues, protocol 1 (x86_64-abi) persistent grants
> > [  526.034810] BUG: kernel NULL pointer dereference, address: 0000000000000010
> 
> Oh, indeed. Silly bug. My test was with qdisk as backend :-(
> 
> 3rd try...

Now it works :)
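
For anyone reading this thread later, the aliasing described in the quoted
analysis (page->lru sharing an anonymous union with *pgmap) can be sketched
in user-space C. This is only an illustrative model, not kernel code: the
model_* names and pgmap_after_list_use() are hypothetical, the struct is a
heavily reduced stand-in for struct page, and the poison constant is the
x86_64 LIST_POISON1 value, which matches the address in the subject line.
The real list_del() also poisons ->prev; that is omitted here.

```c
#include <stdint.h>

/* Simplified user-space model. Names mirror the kernel, but this is
 * NOT the kernel's struct page or list implementation. */
struct list_head { struct list_head *next, *prev; };
struct dev_pagemap;                     /* opaque in this sketch */

/* LIST_POISON1 on x86_64: 0x100 + 0xdead000000000000 */
#define MODEL_LIST_POISON1 ((uintptr_t)0xdead000000000100ULL)

struct model_page {
    union {
        struct list_head lru;           /* normal pages: list linkage */
        struct {                        /* ZONE_DEVICE pages */
            struct dev_pagemap *pgmap;  /* aliases lru.next */
            void *zone_device_data;     /* aliases lru.prev */
        };
    };
};

/* Insert entry right after head (same idea as the kernel's list_add). */
static void model_list_add(struct list_head *entry, struct list_head *head)
{
    head->next->prev = entry;
    entry->next = head->next;
    entry->prev = head;
    head->next = entry;
}

/* Unlink entry and poison its next pointer, as list_del() does. */
static void model_list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = (struct list_head *)MODEL_LIST_POISON1;
}

/* Put a page on a driver-private list, take it off again, and report
 * what the aliased pgmap back pointer reads as afterwards. */
static uintptr_t pgmap_after_list_use(struct model_page *page)
{
    struct list_head pool = { &pool, &pool };   /* empty private list */

    model_list_add(&page->lru, &pool);  /* overwrites pgmap and zone_device_data */
    model_list_del(&page->lru);         /* leaves poison where pgmap lives */
    return (uintptr_t)page->pgmap;
}
```

In this model, calling pgmap_after_list_use() on a page whose pgmap was set
beforehand returns the poison value rather than the original pointer, so any
later code that follows page->pgmap would fault on that address. That would
be consistent with the comment quoted from memmap_init_zone_device() above.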

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
