
Re: [PATCH v3 0/6] block: add blk_io_plug_call() API



On Tue, May 30, 2023 at 02:09:53PM -0400, Stefan Hajnoczi wrote:
> v3
> - Patch 5: Mention why dev_max_batch condition was dropped [Stefano]
> v2
> - Patch 1: "is not be freed" -> "is not freed" [Eric]
> - Patch 2: Remove unused nvme_process_completion_queue_plugged trace event
>   [Stefano]
> - Patch 3: Add missing #include and fix blkio_unplug_fn() prototype [Stefano]
> - Patch 4: Removed whitespace hunk [Eric]
> 
> The existing blk_io_plug() API is not block layer multi-queue friendly because
> the plug state is per-BlockDriverState.
> 
> Change blk_io_plug()'s implementation so it is thread-local. This is done by
> introducing the blk_io_plug_call() function that block drivers use to batch
> calls while plugged. It is relatively easy to convert block drivers from
> .bdrv_co_io_plug() to blk_io_plug_call().
> 
> Random read 4KB performance with virtio-blk on a host NVMe block device:
> 
> iodepth   iops   change vs today
> 1        45612   -4%
> 2        87967   +2%
> 4       129872   +0%
> 8       171096   -3%
> 16      194508   -4%
> 32      208947   -1%
> 64      217647   +0%
> 128     229629   +0%
> 
> The results are within the noise for these benchmarks. This is to be expected:
> the plugging behavior for a single thread hasn't changed in this patch series;
> only the state is now thread-local.
> 
> The following graph compares several approaches:
> https://vmsplice.net/~stefan/blk_io_plug-thread-local.png
> - v7.2.0: before most of the multi-queue block layer changes landed.
> - with-blk_io_plug: today's post-8.0.0 QEMU.
> - blk_io_plug-thread-local: this patch series.
> - no-blk_io_plug: what happens when we simply remove plugging?
> - call-after-dispatch: what if we integrate plugging into the event loop? I
>   decided against this approach in the end because it's more likely to
>   introduce performance regressions since I/O submission is deferred until the
>   end of the event loop iteration.
> 
> Aside from the no-blk_io_plug case, which bottlenecks much earlier than the
> others, we see that all plugging approaches are more or less equivalent in
> this benchmark. It is also clear that QEMU 8.0.0 has lower performance than
> 7.2.0.
> 
> The Ansible playbook, fio results, and a Jupyter notebook are available here:
> https://github.com/stefanha/qemu-perf/tree/remove-blk_io_plug
> 
> Stefan Hajnoczi (6):
>   block: add blk_io_plug_call() API
>   block/nvme: convert to blk_io_plug_call() API
>   block/blkio: convert to blk_io_plug_call() API
>   block/io_uring: convert to blk_io_plug_call() API
>   block/linux-aio: convert to blk_io_plug_call() API
>   block: remove bdrv_co_io_plug() API
> 
>  MAINTAINERS                       |   1 +
>  include/block/block-io.h          |   3 -
>  include/block/block_int-common.h  |  11 ---
>  include/block/raw-aio.h           |  14 ---
>  include/sysemu/block-backend-io.h |  13 +--
>  block/blkio.c                     |  43 ++++----
>  block/block-backend.c             |  22 -----
>  block/file-posix.c                |  38 -------
>  block/io.c                        |  37 -------
>  block/io_uring.c                  |  44 ++++-----
>  block/linux-aio.c                 |  41 +++-----
>  block/nvme.c                      |  44 +++------
>  block/plug.c                      | 159 ++++++++++++++++++++++++++++++
>  hw/block/dataplane/xen-block.c    |   8 +-
>  hw/block/virtio-blk.c             |   4 +-
>  hw/scsi/virtio-scsi.c             |   6 +-
>  block/meson.build                 |   1 +
>  block/trace-events                |   6 +-
>  18 files changed, 239 insertions(+), 256 deletions(-)
>  create mode 100644 block/plug.c
> 
> -- 
> 2.40.1
> 

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan
