Re: slow start of Pod HVM domU with qemu 9.1
On Wed, 29 Jan 2025, Stefano Stabellini wrote:
> On Wed, 29 Jan 2025, Edgar E. Iglesias wrote:
> > On Tue, Jan 28, 2025 at 03:58:14PM -0800, Stefano Stabellini wrote:
> > > On Tue, 28 Jan 2025, Edgar E. Iglesias wrote:
> > > > On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote:
> > > > > Hello,
> > > > >
> > > > > starting with qemu 9.1 a PoD HVM domU takes a long time to start.
> > > > > Depending on the domU kernel, it may trigger a warning, which
> > > > > prompted me to notice this change in behavior:
> > > > >
> > > > > [    0.000000] Linux version 4.12.14-120-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Thu Nov 7 16:39:09 UTC 2019 (fd9dc36)
> > > > > ...
> > > > > [    1.096432] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
> > > > > [    1.101636] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> > > > > [    1.104051] hpet0: 3 comparators, 64-bit 62.500000 MHz counter
> > > > > [   16.136086] random: crng init done
> > > > > [   31.712052] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 30s!
> > > > > [   31.716029] Showing busy workqueues and worker pools:
> > > > > [   31.721164] workqueue events: flags=0x0
> > > > > [   31.724054]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
> > > > > [   31.728000]     in-flight: 17:balloon_process
> > > > > [   31.728000]     pending: hpet_work
> > > > > [   31.728023] workqueue mm_percpu_wq: flags=0x8
> > > > > [   31.732987]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> > > > > [   31.736000]     pending: vmstat_update
> > > > > [   31.736024] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=30s workers=2 idle: 34
> > > > > [   50.400102] clocksource: Switched to clocksource xen
> > > > > [   50.441153] VFS: Disk quotas dquot_6.6.0
> > > > > ...
> > > > >
> > > > > With qemu 9.0 and older, this domU will start the /init task after
> > > > > 8 seconds.
> > > > >
> > > > > The change which caused this is qemu.git commit
> > > > > 9ecdd4bf08dfe4a37e16b8a8b228575aff641468
> > > > > Author:     Edgar E. Iglesias <edgar.iglesias@xxxxxxx>
> > > > > AuthorDate: Tue Apr 30 10:26:45 2024 +0200
> > > > > Commit:     Edgar E. Iglesias <edgar.iglesias@xxxxxxx>
> > > > > CommitDate: Sun Jun 9 20:16:14 2024 +0200
> > > > >
> > > > >     xen: mapcache: Add support for grant mappings
> > > > >
> > > > > As you can see, v4 instead of v5 was apparently applied.
> > > > > This was probably unintentional, but would probably not change the
> > > > > result.
> > > >
> > > > Hi Olaf,
> > > >
> > > > It looks like v8 was applied, or am I missing something?
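For context on what that commit changes, here is a paraphrased sketch (not the literal upstream code; the identifiers are the QEMU internals from hw/xen/xen-mapcache.c): the grant-mapping support adds a second MapCache instance, mapcache_grants, next to the regular mapcache, and both the lookup and the invalidation paths now consult the two caches, which is why the workaround quoted below touches three call sites.

/* Paraphrased sketch, not the literal QEMU code: after the grant-mapping
 * commit there are two caches, and invalidation walks both of them. */
static MapCache *mapcache;        /* foreign mappings of the guest's own RAM */
static MapCache *mapcache_grants; /* grant mappings of other domains' memory */

void xen_invalidate_map_cache(void)
{
    /* Flush pending block-layer I/O before tearing down mappings. */
    bdrv_drain_all();

    xen_invalidate_map_cache_single(mapcache);
    xen_invalidate_map_cache_single(mapcache_grants); /* added with grant support */
}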
> > > > > With this change the domU starts fast again:
> > > > >
> > > > > --- a/hw/xen/xen-mapcache.c
> > > > > +++ b/hw/xen/xen-mapcache.c
> > > > > @@ -522,6 +522,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
> > > > >      ram_addr_t addr;
> > > > >
> > > > >      addr = xen_ram_addr_from_mapcache_single(mapcache, ptr);
> > > > > +    if (1)
> > > > >      if (addr == RAM_ADDR_INVALID) {
> > > > >          addr = xen_ram_addr_from_mapcache_single(mapcache_grants, ptr);
> > > > >      }
> > > > > @@ -626,6 +627,7 @@ static void xen_invalidate_map_cache_entry_single(MapCache *mc, uint8_t *buffer)
> > > > >  static void xen_invalidate_map_cache_entry_all(uint8_t *buffer)
> > > > >  {
> > > > >      xen_invalidate_map_cache_entry_single(mapcache, buffer);
> > > > > +    if (1)
> > > > >      xen_invalidate_map_cache_entry_single(mapcache_grants, buffer);
> > > > >  }
> > > > >
> > > > > @@ -700,6 +702,7 @@ void xen_invalidate_map_cache(void)
> > > > >      bdrv_drain_all();
> > > > >
> > > > >      xen_invalidate_map_cache_single(mapcache);
> > > > > +    if (0)
> > > > >      xen_invalidate_map_cache_single(mapcache_grants);
> > > > >  }
> > > > >
> > > > > I did the testing with libvirt, the domU.cfg equivalent looks like this:
> > > > >
> > > > > maxmem = 4096
> > > > > memory = 2048
> > > > > maxvcpus = 4
> > > > > vcpus = 2
> > > > > pae = 1
> > > > > acpi = 1
> > > > > apic = 1
> > > > > viridian = 0
> > > > > rtc_timeoffset = 0
> > > > > localtime = 0
> > > > > on_poweroff = "destroy"
> > > > > on_reboot = "destroy"
> > > > > on_crash = "destroy"
> > > > > device_model_override = "/usr/lib64/qemu-9.1/bin/qemu-system-i386"
> > > > > sdl = 0
> > > > > vnc = 1
> > > > > vncunused = 1
> > > > > vnclisten = "127.0.0.1"
> > > > > vif = [ "mac=52:54:01:23:63:29,bridge=br0,script=vif-bridge" ]
> > > > > parallel = "none"
> > > > > serial = "pty"
> > > > > builder = "hvm"
> > > > > kernel = "/bug1236329/linux"
> > > > > ramdisk = "/bug1236329/initrd"
> > > > > cmdline = "console=ttyS0,115200n8 quiet ignore_loglevel"
> > > > > boot = "c"
> > > > > disk = [ "format=qcow2,vdev=hda,access=rw,backendtype=qdisk,target=/bug1236329/sles12sp5.qcow2" ]
> > > > > usb = 1
> > > > > usbdevice = "tablet"
> > > > >
> > > > > Any idea what can be done to restore boot times?
> > > >
> > > > A guess is that it's taking a long time to walk the grants mapcache
> > > > when invalidating (in QEMU), despite it being unused and empty. We
> > > > could try to find a way to keep track of usage and do nothing when
> > > > invalidating an empty/unused cache.
> > >
> > > If mapcache_grants is unused and empty, the call to
> > > xen_invalidate_map_cache_single(mapcache_grants) should be really fast?
> >
> > Yes, I agree, but looking at the invalidation code it seems we're
> > unconditionally walking all the buckets in the hash table...
> >
> > > I think it might be the opposite: mapcache_grants is utilized,
> > > so going through all the mappings in xen_invalidate_map_cache_single
> > > takes time.
> >
> > The reason I don't think it's being used is that we've only enabled
> > grants for PVH machines, and Olaf runs HVM machines, so QEMU would never
> > end up mapping grants for DMA.
>
> Oh, I see! In that case we could have a trivial check on
> mc->last_entry == NULL as a fast path, something like:
>
>     if ( mc->last_entry == NULL )
>         return;
>
> at the beginning of xen_invalidate_map_cache_single?
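A paraphrased sketch of that fast path (an illustration of the suggestion above, not upstream code; MapCache and its last_entry member are internals of hw/xen/xen-mapcache.c, and the bucket walk is elided):

static void xen_invalidate_map_cache_single(MapCache *mc)
{
    /* Fast path: this cache has never mapped anything (e.g. mapcache_grants
     * on a plain HVM domU), so there is nothing to unmap. */
    if (mc->last_entry == NULL) {
        return;
    }

    /* ... existing behaviour: take the lock, walk every bucket and unmap
     * the entries that are no longer in use ... */
}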
>
> > > However, I wonder if it is really needed. At least in the PoD case, the
> > > reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU
> > > memory has changed. But that doesn't affect the grant mappings, because
> > > those are mappings of other domains' memory.
> > >
> > > So I am thinking whether we should remove the call to
> > > xen_invalidate_map_cache_single(mapcache_grants)?
> >
> > Good point!
>
> Let's see how the discussion evolves on that point.

Jan and Juergen clarified that there is no need to call
xen_invalidate_map_cache_single for grants on IOREQ_TYPE_INVALIDATE
requests.

---

xen: no need to flush the mapcache for grants

On IOREQ_TYPE_INVALIDATE we need to invalidate the mapcache for regular
mappings. Recently we started reusing the mapcache to also keep track of
grant mappings. However, there is no need to remove grant mappings on
IOREQ_TYPE_INVALIDATE requests, and we shouldn't do that. So remove the
function call.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxx>

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index 00bfbcc6fb..698b5c53ed 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -700,7 +700,6 @@ void xen_invalidate_map_cache(void)
     bdrv_drain_all();
 
     xen_invalidate_map_cache_single(mapcache);
-    xen_invalidate_map_cache_single(mapcache_grants);
 }
 
 static uint8_t *xen_replace_cache_entry_unlocked(MapCache *mc,
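For reference, the request that triggers this path comes from Xen's ioreq interface; a paraphrased sketch of the dispatch (in current trees it lives in hw/xen/xen-hvm-common.c, simplified here and possibly differing across QEMU versions):

static void handle_ioreq(XenIOState *state, ioreq_t *req)
{
    switch (req->type) {
    /* ... PIO and MMIO cases elided ... */
    case IOREQ_TYPE_INVALIDATE:
        /* The domU's own physmap changed (e.g. PoD population), so cached
         * foreign mappings of guest RAM are stale.  Grant mappings reference
         * other domains' memory and are unaffected, hence the patch above. */
        xen_invalidate_map_cache();
        break;
    /* ... */
    }
}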