[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[qemu-xen staging-4.13] file-posix: Handle undetectable alignment



commit f0d3fa265d074a155164d59af12db72986cfe55d
Author:     Nir Soffer <nirsof@xxxxxxxxx>
AuthorDate: Tue Aug 13 21:21:03 2019 +0300
Commit:     Michael Roth <mdroth@xxxxxxxxxxxxxxxxxx>
CommitDate: Tue Nov 12 11:59:58 2019 -0600

    file-posix: Handle undetectable alignment
    
    In some cases buf_align or request_alignment cannot be detected:
    
    1. With Gluster, buf_align cannot be detected since the actual I/O is
       done on Gluster server, and qemu buffer alignment does not matter.
       Since we don't have alignment requirement, buf_align=1 is the best
       value.
    
    2. With local XFS filesystem, buf_align cannot be detected if reading
       from unallocated area. In this we must align the buffer, but we don't
       know what is the correct size. Using the wrong alignment results in
       I/O error.
    
    3. With Gluster backed by XFS, request_alignment cannot be detected if
       reading from unallocated area. In this case we need to use the
       correct alignment, and failing to do so results in I/O errors.
    
    4. With NFS, the server does not use direct I/O, so both buf_align cannot
       be detected. In this case we don't need any alignment so we can use
       buf_align=1 and request_alignment=1.
    
    These cases seems to work when storage sector size is 512 bytes, because
    the current code starts checking align=512. If the check succeeds
    because alignment cannot be detected we use 512. But this does not work
    for storage with 4k sector size.
    
    To determine if we can detect the alignment, we probe first with
    align=1. If probing succeeds, maybe there are no alignment requirement
    (cases 1, 4) or we are probing unallocated area (cases 2, 3). Since we
    don't have any way to tell, we treat this as undetectable alignment. If
    probing with align=1 fails with EINVAL, but probing with one of the
    expected alignments succeeds, we know that we found a working alignment.
    
    Practically the alignment requirements are the same for buffer
    alignment, buffer length, and offset in file. So in case we cannot
    detect buf_align, we can use request alignment. If we cannot detect
    request alignment, we can fallback to a safe value. To use this logic,
    we probe first request alignment instead of buf_align.
    
    Here is a table showing the behaviour with current code (the value in
    parenthesis is the optimal value).
    
    Case    Sector    buf_align (opt)   request_alignment (opt)     result
    ======================================================================
    1       512       512   (1)          512   (512)                 OK
    1       4096      512   (1)          4096  (4096)                FAIL
    ----------------------------------------------------------------------
    2       512       512   (512)        512   (512)                 OK
    2       4096      512   (4096)       4096  (4096)                FAIL
    ----------------------------------------------------------------------
    3       512       512   (1)          512   (512)                 OK
    3       4096      512   (1)          512   (4096)                FAIL
    ----------------------------------------------------------------------
    4       512       512   (1)          512   (1)                   OK
    4       4096      512   (1)          512   (1)                   OK
    
    Same cases with this change:
    
    Case    Sector    buf_align (opt)   request_alignment (opt)     result
    ======================================================================
    1       512       512   (1)          512   (512)                 OK
    1       4096      4096  (1)          4096  (4096)                OK
    ----------------------------------------------------------------------
    2       512       512   (512)        512   (512)                 OK
    2       4096      4096  (4096)       4096  (4096)                OK
    ----------------------------------------------------------------------
    3       512       4096  (1)          4096  (512)                 OK
    3       4096      4096  (1)          4096  (4096)                OK
    ----------------------------------------------------------------------
    4       512       4096  (1)          4096  (1)                   OK
    4       4096      4096  (1)          4096  (1)                   OK
    
    I tested that provisioning VMs and copying disks on local XFS and
    Gluster with 4k bytes sector size work now, resolving bugs [1],[2].
    I tested also on XFS, NFS, Gluster with 512 bytes sector size.
    
    [1] https://bugzilla.redhat.com/1737256
    [2] https://bugzilla.redhat.com/1738657
    
    Signed-off-by: Nir Soffer <nsoffer@xxxxxxxxxx>
    Signed-off-by: Kevin Wolf <kwolf@xxxxxxxxxx>
    
    (cherry picked from commit a6b257a08e3d72219f03e461a52152672fec0612)
    
    Signed-off-by: Michael Roth <mdroth@xxxxxxxxxxxxxxxxxx>
---
 block/file-posix.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index c5df61b477..be32dd8c51 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -323,6 +323,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
     BDRVRawState *s = bs->opaque;
     char *buf;
     size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
+    size_t alignments[] = {1, 512, 1024, 2048, 4096};
 
     /* For SCSI generic devices the alignment is not really used.
        With buffered I/O, we don't have any restrictions. */
@@ -349,25 +350,38 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
     }
 #endif
 
-    /* If we could not get the sizes so far, we can only guess them */
-    if (!s->buf_align) {
+    /*
+     * If we could not get the sizes so far, we can only guess them. First try
+     * to detect request alignment, since it is more likely to succeed. Then
+     * try to detect buf_align, which cannot be detected in some cases (e.g.
+     * Gluster). If buf_align cannot be detected, we fallback to the value of
+     * request_alignment.
+     */
+
+    if (!bs->bl.request_alignment) {
+        int i;
         size_t align;
-        buf = qemu_memalign(max_align, 2 * max_align);
-        for (align = 512; align <= max_align; align <<= 1) {
-            if (raw_is_io_aligned(fd, buf + align, max_align)) {
-                s->buf_align = align;
+        buf = qemu_memalign(max_align, max_align);
+        for (i = 0; i < ARRAY_SIZE(alignments); i++) {
+            align = alignments[i];
+            if (raw_is_io_aligned(fd, buf, align)) {
+                /* Fallback to safe value. */
+                bs->bl.request_alignment = (align != 1) ? align : max_align;
                 break;
             }
         }
         qemu_vfree(buf);
     }
 
-    if (!bs->bl.request_alignment) {
+    if (!s->buf_align) {
+        int i;
         size_t align;
-        buf = qemu_memalign(s->buf_align, max_align);
-        for (align = 512; align <= max_align; align <<= 1) {
-            if (raw_is_io_aligned(fd, buf, align)) {
-                bs->bl.request_alignment = align;
+        buf = qemu_memalign(max_align, 2 * max_align);
+        for (i = 0; i < ARRAY_SIZE(alignments); i++) {
+            align = alignments[i];
+            if (raw_is_io_aligned(fd, buf + align, max_align)) {
+                /* Fallback to request_aligment. */
+                s->buf_align = (align != 1) ? align : bs->bl.request_alignment;
                 break;
             }
         }
--
generated by git-patchbot for /home/xen/git/qemu-xen.git#staging-4.13



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.