
Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 27 Mar 2023 16:37:15 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Jason Andryuk <jandryuk@xxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Mon, 27 Mar 2023 14:37:54 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Mar 27, 2023 at 04:20:45PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 27, 2023 at 03:29:55PM +0200, Roger Pau Monné wrote:
> > On Mon, Mar 27, 2023 at 01:34:23PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Mon, Mar 27, 2023 at 12:51:09PM +0200, Roger Pau Monné wrote:
> > > > On Mon, Mar 27, 2023 at 12:26:05PM +0200, Marek Marczykowski-Górecki wrote:
> > > > > On Mon, Mar 27, 2023 at 12:12:29PM +0200, Roger Pau Monné wrote:
> > > > > > On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > > QEMU needs to know whether clearing the mask bit of a vector really
> > > > > > > clears it, or whether the bit was already clear. Currently Xen
> > > > > > > forwards only writes that clear that bit to the device model, not
> > > > > > > writes that set it, so QEMU cannot tell the difference. Because of
> > > > > > > that, QEMU works around this by checking via /dev/mem, but that
> > > > > > > isn't the proper approach.
> > > > > > > 
> > > > > > > Give all necessary information to QEMU by passing all ctrl writes,
> > > > > > > including masking a vector. This also forwards writes that did not
> > > > > > > change the value, but as tested on both Linux (6.1.12) and Windows
> > > > > > > (10 Pro), they don't do excessive writes of unchanged values
> > > > > > > (Windows seems to clear the mask bit twice in some cases, but not
> > > > > > > more).
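For reference, each MSI-X table entry is 16 bytes (Message Address low/high, Message Data, Vector Control), and bit 0 of Vector Control is the per-vector Mask bit. Below is a minimal sketch, assuming the device model caches the last Vector Control value per vector and sees every write to it (masking included, as this patch proposes), of how it could tell a genuine unmask from a redundant one. The struct and helper names are hypothetical and not taken from QEMU or Xen.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI MSI-X table layout: 16-byte entries, Vector Control at offset 12. */
#define MSIX_ENTRY_SIZE          16u
#define MSIX_ENTRY_VECTOR_CTRL   12u
#define MSIX_VECTOR_CTRL_MASKBIT 0x1u

/* Hypothetical per-vector state cached by the device model. */
struct vec_state {
    uint32_t ctrl;   /* last Vector Control value written by the guest */
};

enum mask_event { MASK_NOCHANGE, MASK_SET, MASK_CLEARED };

/* Classify a guest write to Vector Control against the cached value.
 * This only works if *every* write (setting the bit too) is forwarded;
 * if only clearing writes arrive, a redundant unmask looks like a real one. */
static enum mask_event vector_ctrl_write(struct vec_state *v, uint32_t val)
{
    bool was_masked = v->ctrl & MSIX_VECTOR_CTRL_MASKBIT;
    bool now_masked = val & MSIX_VECTOR_CTRL_MASKBIT;

    v->ctrl = val;
    if (was_masked == now_masked)
        return MASK_NOCHANGE;
    return now_masked ? MASK_SET : MASK_CLEARED;
}

/* Which table entry does an offset into the MSI-X table hit? */
static unsigned int entry_index(uint64_t table_off)
{
    return (unsigned int)(table_off / MSIX_ENTRY_SIZE);
}

/* Does an offset into the MSI-X table address a Vector Control dword? */
static bool is_vector_ctrl(uint64_t table_off)
{
    return (table_off % MSIX_ENTRY_SIZE) == MSIX_ENTRY_VECTOR_CTRL;
}
```

Starting from a masked vector, a first write of 0 reports MASK_CLEARED and a second identical write reports MASK_NOCHANGE, which is exactly the distinction QEMU currently has to recover via /dev/mem.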
> > > > > > 
> > > > > > Since we pass through all the accesses to the device model, is the
> > > > > > handling in Xen still required?  It might be worth also exposing any
> > > > > > interfaces the device model needs, so that all the functionality done
> > > > > > by the msixtbl_mmio_ops hooks could be done by QEMU, since we end up
> > > > > > passing the accesses through anyway.
> > > > > 
> > > > > This was discussed on v1 already. Such QEMU would need to be able to do
> > > > > the actual write. If it's running in a stubdomain, it would hit the
> > > > > exact issue again (page mapped R/O to it). In fact, that might be an
> > > > > issue for dom0 too (I haven't checked).
> > > > 
> > > > Oh, sorry, likely missed that discussion, as I don't recall this.
> > > > 
> > > > Maybe we need a hypercall for QEMU to notify Xen of the
> > > > masking/unmasking?  Any change to the other fields is already handled
> > > > by QEMU.
> > > > 
> > > > > I guess that could use my subpage RO feature I just posted then, but it
> > > > > would still mean intercepting the write twice (not a performance issue
> > > > > really here, but rather convoluted handling in total).
> > > > 
> > > > Yes, that does seem way too convoluted.
> > > > 
> > > > > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > > > > > > ---
> > > > > > > v2:
> > > > > > >  - pass through quad writes to the emulator too (Jan)
> > > > > > >  - (ab)use len==0 for write len=4 completion (Jan), but add a
> > > > > > >    descriptive #define for this magic value
> > > > > > > 
> > > > > > > This behavior change needs to be surfaced to the device model somehow,
> > > > > > > so it knows whether it can rely on it. I'm open to suggestions.
> > > > > > 
> > > > > > Maybe exposed in XEN_DMOP_get_ioreq_server_info?
> > > 
> > > Make flags an IN/OUT parameter (and not reuse the same bits)? Or
> > > introduce a new field?
> > 
> > I think it would be fine to make it IN/OUT, but see below.
> > 
> > > > > > 
> > > > > > But I wonder whether it shouldn't be the other way around: the device
> > > > > > model tells Xen it doesn't need to handle any MSI-X accesses because
> > > > > > QEMU will take care of them, likely using a new flag in
> > > > > > XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as
> > > > > > part of the gflags, but then we would need to assert that the flag is
> > > > > > passed for all MSI-X interrupts bound from that device to the same
> > > > > > domain.
> > > > > 
> > > > > Is it a safe thing to do? I mean, doesn't Xen need to guard access to
> > > > > the MSI-X configuration to ensure safety, especially if there is no
> > > > > interrupt remapping? It probably doesn't matter for the QEMU-in-dom0
> > > > > case, but with both deprivileged QEMU and stubdom, it might matter.
> > > > 
> > > > Right - QEMU shouldn't write directly to the MSI-X table using
> > > > /dev/mem, but instead use a hypercall to notify Xen of the
> > > > {un,}masking of the MSI-X table entry.  I think that would allow us to
> > > > safely get rid of the extra logic in Xen to deal with MSI-X table
> > > > accesses.
> > > 
> > > But the purpose of this series is to give guest (or QEMU) more write
> > > access to the MSI-X page, not less.
> > 
> > Right, but there are two independent issues here: one is the
> > propagation of the MSIX entry mask state to the device model, the
> > other is allowing guest accesses to MMIO regions adjacent to the MSIX
> > table.
> > 
> > > If it weren't for this Intel AX
> > > wifi card, we could indeed translate everything to hypercalls in QEMU
> > > and not worry about special handlers in Xen at all. But unfortunately,
> > > we do need to handle writes to the same page outside of the MSI-X
> > > structures, and QEMU can't be trusted to filter them properly (and
> > > otherwise be given full write access to the page).
> > 
> > Indeed, but IMO it would be helpful if we could avoid this split
> > handling of MSIX entries, where Xen handles entry mask/unmask, and
> > QEMU handles entry setup.  It makes the handling logic very
> > complicated, and more likely to be buggy (as you have probably
> > discovered).
> > 
> > Having QEMU always handle accesses to the MSI-X table would make
> > things simpler, and we could get rid of a huge amount of logic and
> > entry tracking in msixtbl_mmio_ops.
> > 
> > Then, we would only need to detect whether an access falls into the
> > same page as the MSI-X (or PBA) tables, but outside of those, and
> > forward it to the underlying hardware.  That's a fairly simple piece of
> > logic, completely detached from all the MSI-X entry tracking that
> > Xen currently does.
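The range check described above can be sketched roughly as follows. This is a minimal illustration, not Xen code: the layout struct and function names are hypothetical, addresses are guest physical, 4 KiB pages are assumed, and accesses are assumed to be naturally aligned and at most 8 bytes (the MSI-X spec keeps the table and PBA QWORD aligned, so such an access can't straddle a table boundary).

```c
#include <stdbool.h>
#include <stdint.h>

#define MSIX_PAGE_SHIFT 12
#define MSIX_PAGE_MASK  (~((UINT64_C(1) << MSIX_PAGE_SHIFT) - 1))

/* Hypothetical description of one device's MSI-X layout. */
struct msix_layout {
    uint64_t table_base, table_size;   /* nr_entries * 16 bytes */
    uint64_t pba_base, pba_size;
};

enum access_kind {
    ACCESS_UNRELATED,   /* not on a page holding MSI-X structures */
    ACCESS_MSIX,        /* inside table or PBA: forward to the device model */
    ACCESS_ADJACENT,    /* same page, outside both: propagate to real MMIO */
};

/* Is [addr, addr + len) fully inside [base, base + size)? */
static bool in_range(uint64_t addr, unsigned int len,
                     uint64_t base, uint64_t size)
{
    return addr >= base && addr + len <= base + size;
}

/* Does addr sit on any page spanned by [base, base + size)? */
static bool on_same_pages(uint64_t addr, uint64_t base, uint64_t size)
{
    uint64_t first = base & MSIX_PAGE_MASK;
    uint64_t last = (base + size - 1) & MSIX_PAGE_MASK;
    uint64_t page = addr & MSIX_PAGE_MASK;

    return page >= first && page <= last;
}

static enum access_kind classify_access(const struct msix_layout *l,
                                        uint64_t addr, unsigned int len)
{
    if (in_range(addr, len, l->table_base, l->table_size) ||
        in_range(addr, len, l->pba_base, l->pba_size))
        return ACCESS_MSIX;
    if (on_same_pages(addr, l->table_base, l->table_size) ||
        on_same_pages(addr, l->pba_base, l->pba_size))
        return ACCESS_ADJACENT;
    return ACCESS_UNRELATED;
}
```

The point of the sketch is that none of this needs per-entry state: only the table and PBA bounds are consulted, so the entry tracking can live entirely in the device model.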
> > 
> > > So, while I agree translating {un,}masking individual vectors to
> > > hypercalls could simplify MSI-X handling in general, I don't think it
> > > helps in this particular case. That said, such simplification would
> > > involve:
> > > 1. Exposing the capability in Xen to QEMU
> > > (XEN_DMOP_get_ioreq_server_info sounds reasonable).
> > > 2. QEMU notifying Xen it will handle masking too, somehow.
> > 
> > I think it's possible we could get away with adding a new flag bit to
> > xen_domctl_bind_pt_irq, like XEN_DOMCTL_VMSI_X86_MASK_HANDLING, that
> > would tell Xen that QEMU will handle the mask bit for this entry.
> 
> Technically, for Xen to not care about those writes, it would need to
> observe this flag on all vectors, including those not mapped yet. In
> practice though, I think it might be okay to say the device model should
> set the XEN_DOMCTL_VMSI_X86_MASK_HANDLING flag consistently (either on
> none of them or on all of them), and Xen can rely on it (if one vector has
> XEN_DOMCTL_VMSI_X86_MASK_HANDLING, then assume all of them will have
> it).

I agree.  I would just return -EINVAL if the flag is not consistent
across vectors on the same device.
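As a rough sketch of that consistency check, assuming a per-device record that remembers what the first bound vector asked for: XEN_DOMCTL_VMSI_X86_MASK_HANDLING is only proposed in this thread, so the bit position below is a placeholder, and the struct and function names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Proposed flag; the bit value is a placeholder, not an allocated bit. */
#define XEN_DOMCTL_VMSI_X86_MASK_HANDLING (1u << 31)  /* hypothetical */

/* Hypothetical per-device record of the device model's choice. */
struct pt_dev_state {
    int mask_handling;   /* -1: no vector bound yet, 0: Xen, 1: device model */
};

/* Called when binding an MSI-X vector of a device; the first vector
 * decides for the whole device, later ones must agree or get -EINVAL. */
static int check_mask_handling(struct pt_dev_state *dev, uint32_t gflags)
{
    int dm = !!(gflags & XEN_DOMCTL_VMSI_X86_MASK_HANDLING);

    if (dev->mask_handling < 0) {
        dev->mask_handling = dm;
        return 0;
    }
    return dev->mask_handling == dm ? 0 : -EINVAL;
}
```

Letting the first bind decide keeps Xen from having to know about vectors that are not mapped yet, while still rejecting a device model that mixes the two modes.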

> > QEMU using this flag should be prepared to handle the mask bit, but if
> > Xen doesn't know the flag it will keep processing the mask bit.
> > 
> > > 3. QEMU using xc_domain_update_msi_irq and XEN_DOMCTL_VMSI_X86_UNMASKED
> > > to update Xen about the mask state too.
> > > 4. Xen no longer interpreting writes to the mask bit, but still
> > > intercepting them to pass through those outside of the MSI-X structures
> > > (the other patch in the series). But the handler would still need to
> > > stay, to keep working with older QEMU versions.
> > 
> > Xen would need to intercept writes to the page(s) containing the MSI-X
> > table in any case, but the logic is much simpler if it just needs to
> > decide whether the accesses fall inside the table region, and thus
> > need to be forwarded to the device model, or fall outside of it and
> > need to be propagated to the real address.
> > 
> > While it's true that we won't be able to remove the code in Xen that
> > partially handles MSI-X entries for guests, it would be unused with
> > newer versions of QEMU, hopefully making the handling far more
> > consistent, as the logic will be entirely in QEMU rather than split
> > between Xen and QEMU.
> 
> In fact, it was easier for me to register a separate ioreq server for
> writes outside of the MSI-X table. But I'm afraid the current one would
> need to stay registered (just not accepting writes).

(I assume in the paragraph above you meant hvm_io_handler rather than
ioreq server, as ioreqs are only for emulation running outside of
Xen.)

The handler is currently registered when a device with MSI-X is
assigned to an HVM domain.  I think it's fine to leave it as-is as long
as the handler doesn't accept any accesses when QEMU handles the mask
bit.  We can always look later into not registering it.
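A minimal sketch of what "registered but not accepting accesses" could look like, assuming the intercept has a range-check callback that decides whether it claims an access. The struct, field and function names here are hypothetical and do not reproduce Xen's actual msixtbl code or handler signatures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-domain view used by the MSI-X table intercept. */
struct msixtbl_dom {
    bool dm_handles_mask;      /* device model set the new flag on all vectors */
    uint64_t table_base, table_size;
};

/* Range check for the intercept: claim table accesses only while Xen
 * still owns mask-bit emulation; otherwise decline so they reach QEMU. */
static bool msixtbl_accept(const struct msixtbl_dom *d, uint64_t addr)
{
    if (d->dm_handles_mask)
        return false;
    return addr >= d->table_base && addr < d->table_base + d->table_size;
}
```

With this shape, older QEMU versions keep the existing behavior, and newer ones simply never see Xen claim a table access.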

Thanks, Roger.
