
Re: 5.10.40 dom0 kernel - nvme: Invalid SGL for payload:131072 nents:13


  • To: Andy Smith <andy@xxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 21 Jul 2021 10:10:13 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 21 Jul 2021 08:10:31 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 21.07.2021 00:32, Andy Smith wrote:
> I have a Debian 10 (buster/stable) dom0 running hypervisor 4.14.2.
> For almost 2 years it's been using the packaged Debian stable kernel
> which is 4.19.x.
> 
> Last night I upgraded the kernel to the buster-backports package
> which is based on 5.10.40 and about 4 hours later got this:
> 
> Jul 20 02:17:54 lamb kernel: [21061.388607] sg[0] 
> phys_addr:0x00000015eb803000 offset:0 length:4096 
> dma_address:0x000000209e7b7000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.389775] sg[1] 
> phys_addr:0x00000015eb7bc000 offset:0 length:4096 
> dma_address:0x000000209e7b8000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.390874] sg[2] 
> phys_addr:0x00000015eb809000 offset:0 length:4096 
> dma_address:0x000000209e7b9000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.391974] sg[3] 
> phys_addr:0x00000015eb766000 offset:0 length:4096 
> dma_address:0x000000209e7ba000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.393042] sg[4] 
> phys_addr:0x00000015eb7a3000 offset:0 length:4096 
> dma_address:0x000000209e7bb000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.394086] sg[5] 
> phys_addr:0x00000015eb7c6000 offset:0 length:4096 
> dma_address:0x000000209e7bc000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.395078] sg[6] 
> phys_addr:0x00000015eb7c2000 offset:0 length:4096 
> dma_address:0x000000209e7bd000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.396042] sg[7] 
> phys_addr:0x00000015eb7a9000 offset:0 length:4096 
> dma_address:0x000000209e7be000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.397004] sg[8] 
> phys_addr:0x00000015eb775000 offset:0 length:4096 
> dma_address:0x000000209e7bf000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.397971] sg[9] 
> phys_addr:0x00000015eb7c7000 offset:0 length:4096 
> dma_address:0x00000020ff520000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.398889] sg[10] 
> phys_addr:0x00000015eb7cb000 offset:0 length:4096 
> dma_address:0x00000020ff521000 dma_length:4096
> Jul 20 02:17:54 lamb kernel: [21061.399814] sg[11] 
> phys_addr:0x00000015eb7e3000 offset:0 length:61952 
> dma_address:0x00000020ff522000 dma_length:61952
> Jul 20 02:17:54 lamb kernel: [21061.400754] sg[12] 
> phys_addr:0x00000015eb7f2200 offset:512 length:24064 
> dma_address:0x00000020ff531200 dma_length:24064

The nvme driver, by calling blk_queue_virt_boundary(), announces that
it wants sg elements split only at 4k boundaries. The last two
segments, while contiguous in both physical and DMA space, have been
split at a boundary that's not in line with this driver restriction.
(Element 11 is [perhaps not just by coincidence] 11*11 sectors long.
Adding another request of 11 sectors would bump the total size of the
element past 64k, which iirc the block layer uses as an upper cap on
sg element size. Hence I expect the series of submitted requests to
have remained split at that position, rather than the bios making up
sg[12] getting merged into sg[11].)
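
Roughly, for illustration, the kind of setup I mean is along these
lines (a sketch only, not the exact nvme code; the functions are the
generic block layer ones, the values just the ones relevant here):

#include <linux/blkdev.h>

/* Announce that sg elements may be split only on 4k boundaries and
 * that a single element is capped at 64k.  BLK_MAX_SEGMENT_SIZE is
 * the block layer's 64k default cap; the explicit call is only for
 * illustration. */
static void example_set_queue_limits(struct request_queue *q)
{
        blk_queue_virt_boundary(q, 4096 - 1);
        blk_queue_max_segment_size(q, BLK_MAX_SEGMENT_SIZE); /* 64k */
}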

Since xen-blkback only talks in terms of bios, I don't think it is
the party responsible for honoring such driver restrictions. Instead
I'd expect the block layer's bio merging to be the place where this
needs to be observed. Perhaps it simply doesn't expect to be handed
requests in multiples of 11 sectors combined with the 64k cap (as
said - iirc) and a driver restriction on where splits may occur? And
as to earlier Linux versions working - perhaps the merging logic was
less aggressive back then?
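
To make the invariant concrete, here is a standalone sketch (not the
kernel's actual merging helper) of the check any split between two
adjacent, physically contiguous elements would have to pass on such a
queue:

#include <stdbool.h>

/* A split is only acceptable if the first element ends on the virt
 * boundary and the second one starts on it; boundary_mask would be
 * 4096 - 1 here. */
static bool split_violates_boundary(unsigned long prev_end,
                                    unsigned long next_start,
                                    unsigned long boundary_mask)
{
        return (prev_end & boundary_mask) ||
               (next_start & boundary_mask);
}

With the numbers from the dump, sg[11] ends (and sg[12] starts) at
phys 0x15eb7e3000 + 61952 = 0x15eb7f2200, i.e. 0x200 bytes into a
page, so such a check flags exactly the split the driver then
complains about.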

> I've reported this to linux-nvme as well, but I did find this thread
> from 2017 with a similar warning:
> 
>     http://lists.infradead.org/pipermail/linux-nvme/2017-July/012055.html
> 
> In it Christoph Hellwig says, "Given that this is a Xen system I
> wonder if swiotlb-xen is involved here, which does some odd chunking
> of DMA translations?" so I thought I'd ask here as well in case
> anyone is aware of something like this. Unfortunately that thread
> ends without confirmed resolution.

I don't think swiotlb has screwed this up: from the dump of the sg
list we can see that up to and including element 11 all elements are
suitably 4k-aligned, and swiotlb wouldn't alter the element sizes of
an sg list it gets handed. Yet if the size of a non-final list
element isn't a multiple of the boundary restriction the driver has
put in place, then - no matter whether offset-into-page values got
changed - the requirement is necessarily violated.
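
Just to put numbers on it, using sg[11] from the dump above:

        61952 = 0xf200,  0xf200 & 0xfff = 0x200

i.e. its length is not a multiple of 4k, so as a non-final element it
ends mid-page no matter which bus addresses swiotlb-xen may have
substituted.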

(As an aside, seeing the DMA addresses in the dumped sg list, it is
quite clear that this isn't a very efficient way to express the
request to hardware. Really only 2 elements would be needed, one
covering elements 0-8 and the other covering elements 9-12. All that
matters to hardware is contiguity in DMA space.)
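
For completeness, the two runs and their sizes, taken from the
dma_address/dma_length values above:

        0x209e7b7000 .. 0x209e7c0000:  9 * 4096                 = 36864
        0x20ff520000 .. 0x20ff537000:  2 * 4096 + 61952 + 24064 = 94208

36864 + 94208 = 131072, matching the payload size in the warning.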

Jan




 

