Xen project Mailing List

Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>, <jgross@xxxxxxxx>

From: SeongJae Park <sjpark@xxxxxxxxxx>

Date: Tue, 17 Dec 2019 14:15:26 +0100

Cc: axboe@xxxxxxxxx, linux-block@xxxxxxxxxxxxxxx, konrad.wilk@xxxxxxxxxx, pdurrant@xxxxxxxxxx, sjpark@xxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, SeongJae Park <sj38.park@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Tue, 17 Dec 2019 13:16:29 +0000


List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, 17 Dec 2019 12:39:15 +0100 "Roger Pau Monné" <roger.pau@xxxxxxxxxx> wrote:

> On Mon, Dec 16, 2019 at 08:48:03PM +0100, SeongJae Park wrote:
> > On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:
> >
> > > On 16.12.19 17:15, SeongJae Park wrote:
> > > > On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park <sjpark@xxxxxxxxxx>
> > > > wrote:
> > > >
> > > >> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park <sjpark@xxxxxxxxxx>
> > > >> wrote:
> > > >>
> > > >>> From: SeongJae Park <sjpark@xxxxxxxxx>
> > > >>>
> > > > [...]
> > > >>> --- a/drivers/block/xen-blkback/xenbus.c
> > > >>> +++ b/drivers/block/xen-blkback/xenbus.c
> > > >>> @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev,
> > > >>>  }
> > > >>>
> > > >>>
> > > >>> +/* Once a memory pressure is detected, squeeze free page pools for a while. */
> > > >>> +static unsigned int buffer_squeeze_duration_ms = 10;
> > > >>> +module_param_named(buffer_squeeze_duration_ms,
> > > >>> +		buffer_squeeze_duration_ms, int, 0644);
> > > >>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> > > >>> +"Duration in ms to squeeze pages buffer when a memory pressure is detected");
> > > >>> +
> > > >>> +/*
> > > >>> + * Callback received when the memory pressure is detected.
> > > >>> + */
> > > >>> +static void reclaim_memory(struct xenbus_device *dev)
> > > >>> +{
> > > >>> +	struct backend_info *be = dev_get_drvdata(&dev->dev);
> > > >>> +
> > > >>> +	be->blkif->buffer_squeeze_end = jiffies +
> > > >>> +		msecs_to_jiffies(buffer_squeeze_duration_ms);
> > > >>
> > > >> This callback might race with 'xen_blkbk_probe()'. The race could result in
> > > >> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it links
> > > >> 'be' to the 'dev'. Please _don't merge_ this patch now!
> > > >>
> > > >> I will do more tests and share the results. Meanwhile, if you have any
> > > >> opinion, please let me know.
> >
> > I reduced the system memory and attached a bunch of devices in a short time
> > so that memory pressure occurs while device attachments are ongoing. Under
> > these circumstances, I was able to reproduce the race.
> >
> > > >
> > > > Not only '->blkif' but also 'be' itself could be NULL. As similar
> > > > concurrency issues could exist in other drivers in their own ways, I
> > > > suggest changing the reclaim callback ('->reclaim_memory') to be called
> > > > once per driver instead of once per device. Then each driver would be able
> > > > to deal with its concurrency issues by itself.
> > >
> > > Hmm, I don't like that. This would need to be changed back in case we
> > > add per-guest quota.
> >
> > Extending this callback in that way would still not be too hard; we could
> > use the callback's argument. I would keep the argument of the callback as
> > 'struct device *' and add a comment saying that a 'NULL' argument means
> > every device. For example, xenbus could pass a NULL-terminated array of
> > pointers to the devices that need to free their resources.
> >
> > After seeing this race, I now also think it could be better to delegate
> > detailed control of each device to its driver, as some drivers have
> > complicated and unique relationships with their devices.
> >
> > >
> > > Wouldn't a get_device() before calling the callback and a put_device()
> > > afterwards avoid that problem?
> >
> > I didn't use the reference count manipulation operations because other
> > similar parts didn't either. But if there is no implicit reference count
> > guarantee, those operations do seem necessary.
> >
> > That said, as get/put operations only adjust the reference count, they will
> > not make the callback wait until the linking of 'backend' and 'blkif' to the
> > device (in xen_blkbk_probe()) is finished. Thus, the race could still
> > happen. Or am I missing something?
>
> I would expect the device not to be added to the list of backend devices
> until the probe hook has finished with a non-error return code, i.e.
> bus_for_each_dev should _not_ iterate over devices for which the probe
> function hasn't run to completion without errors.
>
> In the same way, I would expect the device to be removed from the list of
> backend devices first, and only then the remove hook to run.
>
> blkback uses an ad-hoc reference counting mechanism, but if the above
> assumptions are true I think it would be enough to take an extra
> reference in xen_blkbk_probe and drop it in xen_blkbk_remove.

Well, if that assumption were true, wouldn't Juergen's approach have solved
the problem? As I said before, I tried that approach, but it did not fix this
race. So either the assumption is wrong or I missed something. I think Juergen
also believes the assumption does not hold, as he suggested using locking, but
I am not sure. Juergen, if I misunderstood you, please let me know.

Thanks,
SeongJae Park

>
> Additionally it might be interesting to switch the ad-hoc reference
> counting to use get_device/put_device (in a separate patch), but I'm
> not sure how feasible that is.
>
> Roger.
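The ordering Roger describes (a device becomes visible to the callback only after probe has fully succeeded, and is hidden again before teardown) can be illustrated outside the kernel. What follows is a minimal userspace C11 sketch, not blkback or xenbus code: it assumes a single atomic pointer per device stands in for dev_get_drvdata(), and 'struct fake_device', 'fake_probe()', 'fake_reclaim()' and 'fake_remove()' are hypothetical names. Because 'be' is published with a release store only after 'be->blkif' is set, a callback that observes 'be' also observes 'blkif'.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; NOT the real blkback types. */
struct xen_blkif {
	unsigned long buffer_squeeze_end;
};

struct backend_info {
	struct xen_blkif *blkif;
};

struct fake_device {
	/* Models dev_get_drvdata(): NULL until probe publishes 'be'. */
	_Atomic(struct backend_info *) drvdata;
};

static void fake_device_init(struct fake_device *dev)
{
	atomic_init(&dev->drvdata, NULL);
}

/* Probe: finish initialising 'be', including be->blkif, and only then
 * publish it, so a concurrent callback never sees a half-built object. */
static int fake_probe(struct fake_device *dev)
{
	struct backend_info *be;

	/* Probe is not expected to run twice on the same device. */
	assert(atomic_load_explicit(&dev->drvdata, memory_order_relaxed) == NULL);

	be = calloc(1, sizeof(*be));
	if (!be)
		return -1;
	be->blkif = calloc(1, sizeof(*be->blkif));
	if (!be->blkif) {
		free(be);
		return -1;
	}
	/* Release store pairs with the acquire load in fake_reclaim(). */
	atomic_store_explicit(&dev->drvdata, be, memory_order_release);
	return 0;
}

/* Reclaim callback: returns 1 if it opened a squeeze window, 0 if the
 * device is not (or no longer) fully probed. */
static int fake_reclaim(struct fake_device *dev, unsigned long now,
			unsigned long duration)
{
	struct backend_info *be =
		atomic_load_explicit(&dev->drvdata, memory_order_acquire);

	/* Mirrors the reported problem: both 'be' and be->blkif may be NULL. */
	if (!be || !be->blkif)
		return 0;
	be->blkif->buffer_squeeze_end = now + duration;
	return 1;
}

/* Remove: unpublish first, then tear down. This does NOT protect a
 * callback that already loaded 'be'; that still needs a reference or lock. */
static void fake_remove(struct fake_device *dev)
{
	struct backend_info *be =
		atomic_exchange_explicit(&dev->drvdata, NULL, memory_order_acq_rel);

	if (be) {
		free(be->blkif);
		free(be);
	}
}
```

This addresses ordering only. As SeongJae points out, get_device()/put_device() address lifetime rather than ordering; conversely, a callback that has already loaded 'be' when 'fake_remove()' runs would still need a reference or a lock to keep 'be' alive.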

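In the quoted 'reclaim_memory()' hunk, the squeeze window ends at 'jiffies + msecs_to_jiffies(buffer_squeeze_duration_ms)'. As a worked example, assuming HZ=250 (HZ is a kernel build option, so other values give other numbers), the default of 10 ms becomes (10 + 4 - 1) / 4 = 3 jiffies, since the kernel rounds this conversion up when HZ divides 1000. The sketch below models that conversion and a wraparound-safe comparison in the style of the kernel's time_before(); the 'model_*' helpers and 'squeeze_active()' are hypothetical userspace stand-ins, and the quoted hunk does not show how the rest of the patch reads 'buffer_squeeze_end'.

```c
#include <assert.h>

/* Assumed tick rate for this model only; the kernel's HZ is a config option. */
#define MODEL_HZ 250UL
#define MODEL_MSEC_PER_SEC 1000UL

/* The rounding-up formula below only applies when HZ divides 1000. */
static_assert(MODEL_MSEC_PER_SEC % MODEL_HZ == 0,
	      "HZ must divide 1000 in this model");

/* Milliseconds to ticks, rounded up so a non-zero duration never maps to
 * 0 ticks. Unlike the kernel helper, there is no overflow clamping. */
static unsigned long model_msecs_to_jiffies(unsigned long ms)
{
	const unsigned long ms_per_tick = MODEL_MSEC_PER_SEC / MODEL_HZ;

	return (ms + ms_per_tick - 1) / ms_per_tick;
}

/* "a is before b", correct across counter wraparound as long as the two
 * values are less than half the counter range apart. */
static int model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Non-zero while 'now' is still inside the squeeze window. */
static int squeeze_active(unsigned long now, unsigned long squeeze_end)
{
	return model_time_before(now, squeeze_end);
}
```

With these assumptions, a window opened at tick t is active for ticks t to t + 2 and closes at t + 3, including when the counter wraps in between.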