
Re: slow start of PoD HVM domU with qemu 9.1


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: "Edgar E. Iglesias" <edgar.iglesias@xxxxxxx>
  • Date: Wed, 29 Jan 2025 12:23:46 +0100
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Olaf Hering <olaf@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>, <jgross@xxxxxxxx>, Anthony PERARD <anthony@xxxxxxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, <andrew.cooper3@xxxxxxxxxx>, <roger.pau@xxxxxxxxxx>
  • Delivery-date: Wed, 29 Jan 2025 11:24:00 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Jan 29, 2025 at 09:52:19AM +0100, Jan Beulich wrote:
> On 29.01.2025 00:58, Stefano Stabellini wrote:
> > On Tue, 28 Jan 2025, Edgar E. Iglesias wrote:
> >> On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote:
> >>> With this change the domU starts fast again:
> >>>
> >>> --- a/hw/xen/xen-mapcache.c
> >>> +++ b/hw/xen/xen-mapcache.c
> >>> @@ -522,6 +522,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
> >>>      ram_addr_t addr;
> >>>  
> >>>      addr = xen_ram_addr_from_mapcache_single(mapcache, ptr);
> >>> +    if (1)
> >>>      if (addr == RAM_ADDR_INVALID) {
> >>>          addr = xen_ram_addr_from_mapcache_single(mapcache_grants, ptr);
> >>>      }
> >>> @@ -626,6 +627,7 @@ static void xen_invalidate_map_cache_entry_single(MapCache *mc, uint8_t *buffer)
> >>>  static void xen_invalidate_map_cache_entry_all(uint8_t *buffer)
> >>>  {
> >>>      xen_invalidate_map_cache_entry_single(mapcache, buffer);
> >>> +    if (1)
> >>>      xen_invalidate_map_cache_entry_single(mapcache_grants, buffer);
> >>>  }
> >>>  
> >>> @@ -700,6 +702,7 @@ void xen_invalidate_map_cache(void)
> >>>      bdrv_drain_all();
> >>>  
> >>>      xen_invalidate_map_cache_single(mapcache);
> >>> +    if (0)
> >>>      xen_invalidate_map_cache_single(mapcache_grants);
> >>>  }
> >>>  
> >>> I did the testing with libvirt; the domU.cfg equivalent looks like this:
> >>> maxmem = 4096
> >>> memory = 2048
> >>> maxvcpus = 4
> >>> vcpus = 2
> >>> pae = 1
> >>> acpi = 1
> >>> apic = 1
> >>> viridian = 0
> >>> rtc_timeoffset = 0
> >>> localtime = 0
> >>> on_poweroff = "destroy"
> >>> on_reboot = "destroy"
> >>> on_crash = "destroy"
> >>> device_model_override = "/usr/lib64/qemu-9.1/bin/qemu-system-i386"
> >>> sdl = 0
> >>> vnc = 1
> >>> vncunused = 1
> >>> vnclisten = "127.0.0.1"
> >>> vif = [ "mac=52:54:01:23:63:29,bridge=br0,script=vif-bridge" ]
> >>> parallel = "none"
> >>> serial = "pty"
> >>> builder = "hvm"
> >>> kernel = "/bug1236329/linux"
> >>> ramdisk = "/bug1236329/initrd"
> >>> cmdline = "console=ttyS0,115200n8 quiet ignore_loglevel"
> >>> boot = "c" 
> >>> disk = [ "format=qcow2,vdev=hda,access=rw,backendtype=qdisk,target=/bug1236329/sles12sp5.qcow2" ]
> >>> usb = 1
> >>> usbdevice = "tablet"
> >>>
> >>> Any idea what can be done to restore boot times?
> >>
> >>
> >> A guess is that it's taking a long time to walk the grants mapcache
> >> when invalidating (in QEMU), despite it being unused and empty. We
> >> could try to keep track of usage and do nothing when invalidating an
> >> empty/unused cache.
> > 
> > If mapcache_grants is unused and empty, the call to
> > xen_invalidate_map_cache_single(mapcache_grants) should be really fast?
> > 
> > I think it might be the opposite: mapcache_grants is utilized,
> > so going through all the mappings in xen_invalidate_map_cache_single
> > takes time.
> > 
> > However, I wonder if it is really needed. At least in the PoD case, the
> > reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU
> > memory has changed. But that doesn't affect the grant mappings, because
> > those are mappings of other domains' memory.
> > 
> > So I am wondering whether we should remove the call to
> > xen_invalidate_map_cache_single(mapcache_grants)?
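> >
> > Just to illustrate (untested, line numbers approximate), the
> > proposal would amount to:
> >
> > --- a/hw/xen/xen-mapcache.c
> > +++ b/hw/xen/xen-mapcache.c
> > @@ -700,5 +700,4 @@ void xen_invalidate_map_cache(void)
> >      bdrv_drain_all();
> >  
> >      xen_invalidate_map_cache_single(mapcache);
> > -    xen_invalidate_map_cache_single(mapcache_grants);
> >  }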
> > 
> > Adding x86 maintainers: do we need to flush grant table mappings for the
> > PV backends running in QEMU when Xen issues an IOREQ_TYPE_INVALIDATE
> > request to QEMU?
> 
> Judging from two of the three uses of ioreq_request_mapcache_invalidate()
> in x86's p2m.c, I'd say no. The third use there is unconditional, but
> maybe wrongly so.
> 
> However, the answer also depends on what QEMU does when encountering a
> granted page. Would it enter it into its mapcache? Can it even access it?
> (If it can't, how does emulated I/O to such pages work? If it can, isn't
> this a violation of the grant's permissions, as it's targeted solely at
> the actual HVM domain named in the grant?)
>

QEMU will only map granted pages if the guest explicitly asks QEMU to
DMA into them. The guest first needs to grant the pages to the domain
running QEMU, then pass QEMU a cookie/address carrying the grant id.
QEMU will then map the granted memory, enter it into a dedicated
mapcache (mapcache_grants), and emulate device DMA to/from the grant.

So QEMU will only map grants intended for QEMU's DMA devices, not any
grant to any domain.

Details:
https://github.com/torvalds/linux/blob/master/drivers/xen/grant-dma-ops.c
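For a rough picture of how the cookie/address is encoded, here is a
self-contained sketch modelled on grant_to_dma()/dma_to_grant() in
grant-dma-ops.c (constant names follow the Linux code; treat the
details as illustrative rather than authoritative):

    #include <stdint.h>

    /* Grant-based DMA addresses are tagged with a high bit and carry
     * the grant reference in the remaining address bits. */
    #define XEN_GRANT_DMA_ADDR_OFF (1ULL << 63)
    #define XEN_PAGE_SHIFT         12

    typedef uint32_t grant_ref_t;
    typedef uint64_t dma_addr_t;

    /* Guest side: encode a grant reference as the DMA "cookie"
     * handed to the device model. */
    static dma_addr_t grant_to_dma(grant_ref_t grant)
    {
        return XEN_GRANT_DMA_ADDR_OFF |
               ((dma_addr_t)grant << XEN_PAGE_SHIFT);
    }

    /* Device-model side: recover the grant reference, which QEMU can
     * then map and enter into mapcache_grants. */
    static grant_ref_t dma_to_grant(dma_addr_t dma)
    {
        return (grant_ref_t)((dma & ~XEN_GRANT_DMA_ADDR_OFF) >>
                             XEN_PAGE_SHIFT);
    }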

Cheers,
Edgar
