
Re: [PATCH 2/2] xen/virtio: Avoid use of the dom0 backend in dom0


  • To: Juergen Gross <jgross@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 7 Jul 2023 16:42:40 +0200
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Petr Pavlu <petr.pavlu@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, vikram.garhwal@xxxxxxx
  • Delivery-date: Fri, 07 Jul 2023 14:43:28 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Jul 07, 2023 at 04:10:14PM +0200, Juergen Gross wrote:
> On 07.07.23 11:50, Roger Pau Monné wrote:
> > On Fri, Jul 07, 2023 at 06:38:48AM +0200, Juergen Gross wrote:
> > > On 06.07.23 23:49, Stefano Stabellini wrote:
> > > > On Thu, 6 Jul 2023, Roger Pau Monné wrote:
> > > > > On Wed, Jul 05, 2023 at 03:41:10PM -0700, Stefano Stabellini wrote:
> > > > > > On Wed, 5 Jul 2023, Roger Pau Monné wrote:
> > > > > > > On Tue, Jul 04, 2023 at 08:14:59PM +0300, Oleksandr Tyshchenko wrote:
> > > > > > > > Part 2 (clarification):
> > > > > > > > 
> > > > > > > > I think using a special config space register in the root complex
> > > > > > > > would not be terrible in terms of guest changes because it is easy
> > > > > > > > to introduce a new root complex driver in Linux and other OSes. The
> > > > > > > > root complex would still be ECAM compatible so the regular ECAM
> > > > > > > > driver would still work. A new driver would only be necessary if you
> > > > > > > > want to be able to access the special config space register.
> > > > > > > 
> > > > > > > I'm slightly worried about this approach: we end up modifying root
> > > > > > > complex emulation in order to avoid modifying a PCI device emulation
> > > > > > > in QEMU, and I'm not sure that's a good trade-off.
> > > > > > > 
> > > > > > > Note also that different architectures will likely have different
> > > > > > > root complexes, and so you might need to modify several of them, plus
> > > > > > > then arrange the PCI layout correctly in order to have the proper
> > > > > > > hierarchy so that devices belonging to different driver domains are
> > > > > > > assigned to different bridges.
> > > > > > 
> > > > > > I do think that adding something to a PCI config register somewhere is
> > > > > > the best option, because it is not dependent on ACPI and it is not
> > > > > > dependent on xenstore, both of which are very undesirable.
> > > > > > 
> > > > > > I am not sure where specifically is the best place. These are the
> > > > > > ideas we came up with:
> > > > > > 1. PCI root complex
> > > > > > 2. a register on the device itself
> > > > > > 3. a new capability of the device
> > > > > > 4. add one extra dummy PCI device for the sole purpose of exposing the
> > > > > >    grants capability
> > > > > > 
> > > > > > 
> > > > > > Looking at the spec, there is a way to add a vendor-specific capability
> > > > > > (cap_vndr = 0x9). Could we use that? It doesn't look like it is used
> > > > > > today; Linux doesn't parse it.
> > > > > 
> > > > > I did wonder the same from a quick look at the spec.  There's however
> > > > > text in the specification that says:
> > > > > 
> > > > > "The driver SHOULD NOT use the Vendor data capability except for
> > > > > debugging and reporting purposes."
> > > > > 
> > > > > So we would at least need to change that, because the capability would
> > > > > then be used for purposes other than debugging and reporting.
> > > > > 
> > > > > Seems like a minor adjustment, so it might be worth asking upstream for
> > > > > their opinion and getting a conversation started.
> > > > 
> > > > Wait, wouldn't this use-case fall under "reporting"? It is exactly what
> > > > we are doing, right?
> > > 
> > > I'd understand "reporting" as e.g. logging, transferring statistics, ...
> > > 
> > > We'd like to use it for configuration purposes.
> > 
> > I've also read it that way.
> > 
> > > Another idea would be to enhance the virtio IOMMU device to suit our needs:
> > > we could add the domid as another virtio IOMMU device capability and (for
> > > now) use bypass mode for all "productive" devices.
> > 
> > If we have to start adding capabilities, won't it be easier to just add it
> > to each device instead of adding it to the virtio IOMMU?  Or is the parsing
> > of capabilities device-specific, and hence we would have to implement such
> > parsing for each device?  I would expect some capabilities are shared
> > between all devices, and a Xen capability could be one of those.
> 
> Have a look at [1], which describes the common device config layout.
> The problem here is that we'd need to add the domid after the queue-specific
> data, resulting in a mess if further queue fields were added later.
> 
> We could try that, of course.

Right, we must make it part of the standard if we modify
virtio_pci_common_cfg, or else newly added fields would overlap the
Xen-specific one.
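
For reference, Linux mirrors that layout in include/uapi/linux/virtio_pci.h
roughly as below (a sketch from memory, newer fields omitted); the structure
ends with the per-queue registers, which is why a domid field appended after
them would clash with any queue fields added by future spec revisions:

struct virtio_pci_common_cfg {
	/* About the whole device. */
	__le32 device_feature_select;
	__le32 device_feature;
	__le32 guest_feature_select;
	__le32 guest_feature;
	__le16 msix_config;
	__le16 num_queues;
	__u8 device_status;
	__u8 config_generation;

	/* About a specific virtqueue. */
	__le16 queue_select;
	__le16 queue_size;
	__le16 queue_msix_vector;
	__le16 queue_enable;
	__le16 queue_notify_off;
	__le32 queue_desc_lo;
	__le32 queue_desc_hi;
	__le32 queue_avail_lo;
	__le32 queue_avail_hi;
	__le32 queue_used_lo;
	__le32 queue_used_hi;
};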

Would it be possible to signal Xen-grants support in the
`device_feature` field, and then expose it from a vendor capability?
IOW, would it be possible to add a Xen-specific hook in the parsing of
virtio_pci_common_cfg that would then fetch additional data from a
capability?

That would likely be less intrusive than adding a new Xen-specific
field to virtio_pci_common_cfg, while still allowing us to do Xen-specific
configuration for all VirtIO devices.
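
To make that concrete, a Xen vendor capability could look something like the
sketch below. This is purely hypothetical: the feature bit number, structure
name and domid field are placeholders for discussion, not anything that
exists in the spec or in Linux today.

/*
 * Hypothetical sketch only: VIRTIO_F_XEN_GRANTS, virtio_pci_xen_cap and
 * backend_domid are made-up placeholders, not part of the VirtIO spec.
 */
#include <stdint.h>

#define VIRTIO_F_XEN_GRANTS	41	/* made-up feature bit in device_feature */
#define PCI_CAP_ID_VNDR		0x09	/* existing PCI vendor-specific cap ID */

struct virtio_pci_xen_cap {
	uint8_t  cap_vndr;	/* PCI_CAP_ID_VNDR */
	uint8_t  cap_next;	/* next capability pointer */
	uint8_t  cap_len;	/* length of this structure */
	uint8_t  cfg_type;	/* would need a type value reserved for Xen */
	uint16_t backend_domid;	/* little-endian: domid the frontend grants to */
	uint16_t pad;		/* pad to a multiple of 4 bytes */
};

The frontend would only go looking for such a capability after seeing the
(hypothetical) feature bit offered, so the common config layout itself stays
untouched.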

> > 
> > > Later we could even add grant-V3 support to Xen and to the virtio IOMMU
> > > device (see my Xen Summit design session from last year). This could be
> > > usable for disaggregated KVM setups, too, so I believe there is a chance
> > > to get this accepted.
> > > 
> > > > > > > > **********
> > > > > > > > What do you think about it? Are there any pitfalls, etc? This also
> > > > > > > > requires system changes, but at least without virtio spec changes.
> > > > > > > 
> > > > > > > Why are we so reluctant to add spec changes?  I understand this might
> > > > > > > take time and effort, but it's the only way IMO to build a sustainable
> > > > > > > VirtIO Xen implementation.  Did we already attempt to negotiate
> > > > > > > Xen-related spec changes with Oasis, and were those refused?
> > > > > > 
> > > > > > That's because spec changes can be very slow. This is a bug that we
> > > > > > need a relatively quick solution for, and waiting 12-24 months for a
> > > > > > spec update is not realistic.
> > > > > > 
> > > > > > I think a spec change would be best as a long-term solution. We also
> > > > > > need a short-term solution. The short-term solution doesn't have to be
> > > > > > ideal, but it has to work now.
> > > > > 
> > > > > My fear with such an approach is that once a bodge is in place, people
> > > > > move on to other stuff and this never gets properly fixed.
> > > > > 
> > > > > I know this might not be a well-received opinion, but it would be better
> > > > > if such a bodge were kept in each interested party's patch queue for the
> > > > > time being, until a proper solution is implemented.  That way there's an
> > > > > interest from the parties in properly fixing it upstream.
> > > > 
> > > > Unfortunately we are in the situation where we have an outstanding
> > > > upstream bug, so we have to take action one way or the other.
> > > 
> > > The required virtio IOMMU device modification would be rather small, so
> > > adding it, maybe under a CONFIG option defaulting to off, might be
> > > acceptable.
> > 
> > Would you then do the grant allocation as part of virtio IOMMU?
> 
> Long term, maybe. Do you remember my Grant-V3 design session last year? Being
> able to reuse the same layout for virtio IOMMU was one of the basic ideas for
> that layout (this would need some heavy work on the virtio IOMMU frontend and
> backend, of course).

While this might well be the best option, do we have anyone with the
time and expertise to work on this?  I might be wrong, but it seems
like a huge task.

Thanks, Roger.



 

