Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

To: SeongJae Park <sj38.park@xxxxxxxxx>
From: Jürgen Groß <jgross@xxxxxxxx>
Date: Tue, 17 Dec 2019 07:23:12 +0100
Cc: axboe@xxxxxxxxx, konrad.wilk@xxxxxxxxxx, pdurrant@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, linux-block@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, roger.pau@xxxxxxxxxx
Delivery-date: Tue, 17 Dec 2019 06:23:48 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 16.12.19 20:48, SeongJae Park wrote:

On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:

On 16.12.19 17:15, SeongJae Park wrote:

On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park <sjpark@xxxxxxxxxx> wrote:

On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park <sjpark@xxxxxxxxxx> wrote:

From: SeongJae Park <sjpark@xxxxxxxxx>

[...]

--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev,
   }

+/* Once a memory pressure is detected, squeeze free page pools for a while. */
+static unsigned int buffer_squeeze_duration_ms = 10;
+module_param_named(buffer_squeeze_duration_ms,
+               buffer_squeeze_duration_ms, int, 0644);
+MODULE_PARM_DESC(buffer_squeeze_duration_ms,
+"Duration in ms to squeeze pages buffer when a memory pressure is detected");
+
+/*
+ * Callback received when the memory pressure is detected.
+ */
+static void reclaim_memory(struct xenbus_device *dev)
+{
+       struct backend_info *be = dev_get_drvdata(&dev->dev);
+
+       be->blkif->buffer_squeeze_end = jiffies +
+               msecs_to_jiffies(buffer_squeeze_duration_ms);


This callback might race with 'xen_blkbk_probe()'.  The race could result in
__NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it links
'be' to the 'dev'.  Please _don't merge_ this patch now!

I will do more tests and share the results.  Meanwhile, if you have any
opinions, please let me know.


I reduced system memory and attached a bunch of devices in a short time so that
memory pressure occurs while device attachments are ongoing.  Under these
circumstances, I was able to see the race.


Not only '->blkif', but 'be' itself could also be NULL.  As other drivers
could have similar concurrency issues of their own, I suggest changing
the reclaim callback ('->reclaim_memory') to be called for each driver instead
of each device.  Then, each driver would be able to deal with its concurrency
issues by itself.


Hmm, I don't like that. This would need to be changed back in case we
add per-guest quota.


Extending this callback in that way would still not be too hard.  We could use
the argument to the callback.  I would keep the argument of the callback as
'struct device *', and will add a comment saying that a 'NULL' value of the
argument means every device.  As an example, xenbus would pass a
NULL-terminated array of the device pointers that need to free their resources.
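
A minimal userspace sketch of the callback shape described above, assuming the
driver keeps its own table of devices.  Every name in it ('model_device',
'model_devices', 'model_reclaim_memory') is hypothetical and is not taken from
the patch or from xenbus:

```c
#include <stddef.h>

/* Hypothetical stand-in for a device; not the kernel's struct device. */
struct model_device {
	int squeezed;	/* set to 1 once the device was asked to free pages */
};

/* All devices owned by this hypothetical driver. */
#define MODEL_NR_DEVICES 3
static struct model_device model_devices[MODEL_NR_DEVICES];

/*
 * Per-driver reclaim callback (assumed shape, not an existing API).
 *
 * devs == NULL: squeeze every device of the driver.
 * otherwise:    devs is a NULL-terminated array of the devices to squeeze.
 *
 * Returns the number of devices that were squeezed.
 */
static int model_reclaim_memory(struct model_device **devs)
{
	int n = 0;

	if (devs == NULL) {
		for (size_t i = 0; i < MODEL_NR_DEVICES; i++) {
			model_devices[i].squeezed = 1;
			n++;
		}
		return n;
	}

	for (; *devs != NULL; devs++) {
		(*devs)->squeezed = 1;
		n++;
	}
	return n;
}
```

With this shape the driver, not the bus, decides how to walk its devices, so
any driver-specific synchronization can live inside the callback.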

After seeing this race, I am now also thinking it could be better to delegate
detailed control of each device to its driver, as some drivers have
complicated and unique relationships with their devices.


Wouldn't a get_device() before calling the callback and a put_device()
afterwards avoid that problem?


I didn't use the reference count manipulation operations because other similar
parts also didn't.  But, if there is no implicit reference count guarantee, it
seems those operations are indeed necessary.

That said, as get/put operations only adjust the reference count, they will
not make the callback wait until the linking of the 'backend' and 'blkif' to
the device (xen_blkbk_probe()) is finished.  Thus, the race could still happen.
Or, am I missing something?
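
The point can be illustrated with a small single-threaded userspace model,
assuming a plain counter stands in for the device reference count.  The
'model_*' names are hypothetical and are not the kernel's get_device() and
put_device().  The NULL check only shows which state the callback can observe;
it is not a fix for the race:

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct model_backend {
	unsigned long squeeze_end;
};

struct model_device {
	int refcount;
	struct model_backend *drvdata;	/* NULL until probe links it */
};

static void model_get_device(struct model_device *dev)
{
	dev->refcount++;
}

static void model_put_device(struct model_device *dev)
{
	dev->refcount--;
}

/*
 * Reclaim callback wrapped in get/put.  The reference keeps 'dev' itself
 * alive, but says nothing about whether probe has linked the backend yet.
 *
 * Returns 0 if the squeeze deadline was set, -1 if no backend was linked.
 */
static int model_reclaim(struct model_device *dev, unsigned long now)
{
	int ret = -1;

	model_get_device(dev);
	if (dev->drvdata != NULL) {
		dev->drvdata->squeeze_end = now + 10;
		ret = 0;
	}
	model_put_device(dev);
	return ret;
}
```

Calling model_reclaim() on a device whose 'drvdata' is still NULL returns -1
even though the reference was taken successfully, which is the state the
callback can run into while probe is still in progress.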


No, I think we need a xenbus lock per device which will need to be
taken in xen_blkbk_probe(), xenbus_dev_remove() and while calling the
callback.
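
A rough userspace sketch of such a per-device lock, assuming a pthread mutex
stands in for the proposed xenbus lock.  All 'model_*' names are hypothetical;
this is a model of the idea, not kernel code:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for blkif, backend_info and the device. */
struct model_blkif {
	unsigned long buffer_squeeze_end;
};

struct model_backend {
	struct model_blkif *blkif;
};

struct model_device {
	pthread_mutex_t lock;		/* models the per-device lock */
	struct model_backend *drvdata;	/* NULL until probe has finished */
};

static void model_device_init(struct model_device *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->drvdata = NULL;
}

/* Probe: link blkif and backend to the device under the lock. */
static void model_probe(struct model_device *dev, struct model_backend *be,
			struct model_blkif *blkif)
{
	pthread_mutex_lock(&dev->lock);
	be->blkif = blkif;
	dev->drvdata = be;
	pthread_mutex_unlock(&dev->lock);
}

/* Remove: unlink the backend under the same lock. */
static void model_remove(struct model_device *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->drvdata = NULL;
	pthread_mutex_unlock(&dev->lock);
}

/*
 * Reclaim callback, called with the lock held by this wrapper, so it sees
 * either a fully linked device or none at all.
 *
 * Returns 0 if the squeeze deadline was set, -1 if the device is not linked.
 */
static int model_reclaim(struct model_device *dev, unsigned long now,
			 unsigned long duration)
{
	struct model_backend *be;
	int ret = -1;

	pthread_mutex_lock(&dev->lock);
	be = dev->drvdata;
	if (be != NULL && be->blkif != NULL) {
		be->blkif->buffer_squeeze_end = now + duration;
		ret = 0;
	}
	pthread_mutex_unlock(&dev->lock);
	return ret;
}
```

Because probe, remove and the callback all take the same lock, the callback
can no longer observe a device that is only partially linked.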


Juergen
