[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-changelog] [qemu-xen master] block: Pass unaligned discard requests to drivers



commit 3482b9bc411a9a12b2efde1018e1ddc906cd817e
Author:     Eric Blake <eblake@xxxxxxxxxx>
AuthorDate: Thu Nov 17 14:13:58 2016 -0600
Commit:     Kevin Wolf <kwolf@xxxxxxxxxx>
CommitDate: Tue Nov 22 15:59:23 2016 +0100

    block: Pass unaligned discard requests to drivers
    
    Discard is advisory, so rounding the requests to alignment
    boundaries is never semantically wrong from the data that
    the guest sees.  But at least the Dell Equallogic iSCSI SANs
    has an interesting property that its advertised discard
    alignment is 15M, yet documents that discarding a sequence
    of 1M slices will eventually result in the 15M page being
    marked as discarded, and it is possible to observe which
    pages have been discarded.
    
    Between commits 9f1963b and b8d0a980, we converted the block
    layer to a byte-based interface that ultimately ignores any
    unaligned head or tail based on the driver's advertised
    discard granularity, which means that qemu 2.7 refuses to
    pass any discard request smaller than 15M down to the Dell
    Equallogic hardware.  This is a slight regression in behavior
    compared to earlier qemu, where a guest executing discards
    in power-of-2 chunks used to be able to get every page
    discarded, but is now left with various pages still allocated
    because the guest requests did not align with the hardware's
    15M pages.
    
    Since the SCSI specification says nothing about a minimum
    discard granularity, and only documents the preferred
    alignment, it is best if the block layer gives the driver
    every bit of information about discard requests, rather than
    rounding it to alignment boundaries early.
    
    Rework the block layer discard algorithm to mirror the write
    zero algorithm: always peel off any unaligned head or tail
    and manage that in isolation, then do the bulk of the request
    on an aligned boundary.  The fallback when the driver returns
    -ENOTSUP for an unaligned request is to silently ignore that
    portion of the discard request; but for devices that can pass
    the partial request all the way down to hardware, this can
    result in the hardware coalescing requests and discarding
    aligned pages after all.
    
    Reported by: Peter Lieven <pl@xxxxxxx>
    CC: qemu-stable@xxxxxxxxxx
    Signed-off-by: Eric Blake <eblake@xxxxxxxxxx>
    Reviewed-by: Max Reitz <mreitz@xxxxxxxxxx>
    Signed-off-by: Kevin Wolf <kwolf@xxxxxxxxxx>
---
 block/io.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index 085ac34..4f00562 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2424,7 +2424,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, 
int64_t offset,
 {
     BdrvTrackedRequest req;
     int max_pdiscard, ret;
-    int head, align;
+    int head, tail, align;
 
     if (!bs->drv) {
         return -ENOMEDIUM;
@@ -2447,19 +2447,15 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, 
int64_t offset,
         return 0;
     }
 
-    /* Discard is advisory, so ignore any unaligned head or tail */
+    /* Discard is advisory, but some devices track and coalesce
+     * unaligned requests, so we must pass everything down rather than
+     * round here.  Still, most devices will just silently ignore
+     * unaligned requests (by returning -ENOTSUP), so we must fragment
+     * the request accordingly.  */
     align = MAX(bs->bl.pdiscard_alignment, bs->bl.request_alignment);
     assert(align % bs->bl.request_alignment == 0);
     head = offset % align;
-    if (head) {
-        head = MIN(count, align - head);
-        count -= head;
-        offset += head;
-    }
-    count = QEMU_ALIGN_DOWN(count, align);
-    if (!count) {
-        return 0;
-    }
+    tail = (offset + count) % align;
 
     bdrv_inc_in_flight(bs);
     tracked_request_begin(&req, bs, offset, count, BDRV_TRACKED_DISCARD);
@@ -2471,11 +2467,34 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, 
int64_t offset,
 
     max_pdiscard = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pdiscard, INT_MAX),
                                    align);
-    assert(max_pdiscard);
+    assert(max_pdiscard >= bs->bl.request_alignment);
 
     while (count > 0) {
         int ret;
-        int num = MIN(count, max_pdiscard);
+        int num = count;
+
+        if (head) {
+            /* Make small requests to get to alignment boundaries. */
+            num = MIN(count, align - head);
+            if (!QEMU_IS_ALIGNED(num, bs->bl.request_alignment)) {
+                num %= bs->bl.request_alignment;
+            }
+            head = (head + num) % align;
+            assert(num < max_pdiscard);
+        } else if (tail) {
+            if (num > align) {
+                /* Shorten the request to the last aligned cluster.  */
+                num -= tail;
+            } else if (!QEMU_IS_ALIGNED(tail, bs->bl.request_alignment) &&
+                       tail > bs->bl.request_alignment) {
+                tail %= bs->bl.request_alignment;
+                num -= tail;
+            }
+        }
+        /* limit request size */
+        if (num > max_pdiscard) {
+            num = max_pdiscard;
+        }
 
         if (bs->drv->bdrv_co_pdiscard) {
             ret = bs->drv->bdrv_co_pdiscard(bs, offset, num);
--
generated by git-patchbot for /home/xen/git/qemu-xen.git#master

_______________________________________________
Xen-changelog mailing list
Xen-changelog@xxxxxxxxxxxxx
https://lists.xenproject.org/xen-changelog

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.