Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer
On 27/09/18 23:48, Boris Ostrovsky wrote:
> On 9/27/18 5:37 PM, Jens Axboe wrote:
>> On 9/27/18 2:33 PM, Sander Eikelenboom wrote:
>>> On 27/09/18 21:06, Boris Ostrovsky wrote:
>>>> On 9/27/18 2:56 PM, Jens Axboe wrote:
>>>>> On 9/27/18 12:52 PM, Sander Eikelenboom wrote:
>>>>>> On 27/09/18 16:26, Jens Axboe wrote:
>>>>>>> On 9/27/18 1:12 AM, Juergen Gross wrote:
>>>>>>>> On 22/09/18 21:55, Boris Ostrovsky wrote:
>>>>>>>>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>>>>>>>> added support for purging persistent grants when they are not in use. As
>>>>>>>>> part of the purge, the grants were removed from the grant buffer, This
>>>>>>>>> eventually causes the buffer to become empty, with BUG_ON triggered in
>>>>>>>>> get_free_grant(). This can be observed even on an idle system, within
>>>>>>>>> 20-30 minutes.
>>>>>>>>>
>>>>>>>>> We should keep the grants in the buffer when purging, and only free the
>>>>>>>>> grant ref.
>>>>>>>>>
>>>>>>>>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>>>>>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
>>>>>>>> Reviewed-by: Juergen Gross <jgross@xxxxxxxx>
>>>>>>> Since Konrad is out, I'm going to queue this up for 4.19.
>>>>>>>
>>>>>> Hi Boris/Juergen.
>>>>>>
>>>>>> Last week i tested a linux-4.19-rc4 kernel with xen-next and this patch
>>>>>> from Boris pulled on top.
>>>>>> Unfortunately it made a VM hang (probably because it's rootFS is
>>>>>> shuffled from under it's feet
>>>> What do you mean by "rootFS is shuffled from under it's feet " ?
>>> Assumption that block-front getting borked and either a kernel crash or
>>> rootfs becoming mounted readonly. Didn't (try) to check though.
>>>
>>>>>> and it gave these in dom0 dmesg:
>>>>>>
>>>>>> [ 9251.696090] xen-blkback: requesting a grant already in use
>>>>>> [ 9251.705861] xen-blkback: trying to add a gref that's already in the tree
>>>>>> [ 9251.715781] xen-blkback: requesting a grant already in use
>>>>>> [ 9251.725756] xen-blkback: trying to add a gref that's already in the tree
>>>>>> [ 9251.735698] xen-blkback: requesting a grant already in use
>>>>>> [ 9251.745573] xen-blkback: trying to add a gref that's already in the tree
>>>>>>
>>>>>> The VM was a HVM with 4 vcpu's and 2 phy disks:
>>>>>> xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) persistent grants
>>>>>> xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) persistent grants
>>>>>>
>>>>>>
>>>>>> Currently i have been running 4.19-rc5 with xen-next on top and commit
>>>>>> a46b53672b2c reverted, for a couple of days. That seems to run stable
>>>>>> for me (since it's a small box so i'm not hit by what a46b53672b2c
>>>>>> tried to fix.
>>>>>>
>>>>>> If you can come up with a debug patch i can give that a spin tomorrow
>>>>>> evening or in the weekend, so we are hopefully still in time for the
>>>>>> 4.19 release.
>>>>> At this late in the game, might make more sense to simply revert the
>>>>> buggy commit. Especially since what is currently out there doesn't fix
>>>>> the issue for you.
>>> Don't know if Boris or Juergen have a hunch about the issue, if not
>>> perhaps a revert is the best.
>> Anyone? Unless I hear otherwise, I'll revert the series tomorrow.
>
> Juergen may have something to say by tomorrow, but from my perspective,
> given that we are coming up on rc6 --- yes.
>
> I looked at the patches again and didn't see anything obvious.
>
> -boris

It could also be that what I hit is a latent bug that is not caused by these
patches but merely got uncovered by them.

xl dmesg also shows quite a few of these:

(XEN) [2018-09-24 03:15:46.847] grant_table.c:1755:d14v0 Expanding d14 grant table from 19 to 20 frames
(XEN) [2018-09-24 03:15:46.849] grant_table.c:1755:d14v0 Expanding d14 grant table from 20 to 21 frames

(and it has done that for ages on my box, not leading to any direct problems to my knowledge)

I don't know if these could be related, and whether something around the
(persistent) grants for block devices could be leaking under some conditions?

--
Sander