
Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 27 Mar 2023 15:29:55 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Jason Andryuk <jandryuk@xxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Mon, 27 Mar 2023 13:30:17 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Mar 27, 2023 at 01:34:23PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 27, 2023 at 12:51:09PM +0200, Roger Pau Monné wrote:
> > On Mon, Mar 27, 2023 at 12:26:05PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Mon, Mar 27, 2023 at 12:12:29PM +0200, Roger Pau Monné wrote:
> > > > On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > QEMU needs to know whether clearing maskbit of a vector is really
> > > > > clearing, or was already cleared before. Currently Xen sends only
> > > > > clearing that bit to the device model, but not setting it, so QEMU
> > > > > cannot detect it. Because of that, QEMU works around this by
> > > > > checking via /dev/mem, but that isn't the proper approach.
> > > > > 
> > > > > Give all necessary information to QEMU by passing all ctrl writes,
> > > > > including masking a vector. This does include forwarding also writes
> > > > > that did not change the value, but as tested on both Linux (6.1.12)
> > > > > and Windows (10 pro), they don't do excessive writes of unchanged
> > > > > values (Windows seems to clear maskbit in some cases twice, but not
> > > > > more).
> > > > 
> > > > Since we passthrough all the accesses to the device model, is the
> > > > handling in Xen still required?  It might be worth also exposing any
> > > > interfaces needed to the device model so all the functionality done by
> > > > the msixtbl_mmio_ops hooks could be done by QEMU, since we end up
> > > > passing the accesses anyway.
> > > 
> > > This was discussed on v1 already. Such QEMU would need to be able to do
> > > the actual write. If it's running in stubdomain, it would hit the exact
> > > issue again (page mapped R/O to it). In fact, that might be an issue for
> > > dom0 too (I haven't checked).
> > 
> > Oh, sorry, likely missed that discussion, as I don't recall this.
> > 
> > Maybe we need a hypercall for QEMU to notify the masking/unmasking to
> > Xen?  As any change on the other fields is already handled by QEMU.
> > 
> > > I guess that could use my subpage RO feature I just posted then, but it
> > > would still mean intercepting the write twice (not a performance issue
> > > really here, but rather convoluted handling in total).
> > 
> > Yes, that does seem way too convoluted.
> > 
> > > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > > > > ---
> > > > > v2:
> > > > >  - passthrough quad writes to emulator too (Jan)
> > > > >  - (ab)use len==0 for write len=4 completion (Jan), but add
> > > > >    descriptive #define for this magic value
> > > > > 
> > > > > This behavior change needs to be surfaced to the device model somehow,
> > > > > so it knows whether it can rely on it. I'm open to suggestions.
> > > > 
> > > > Maybe exposed in XEN_DMOP_get_ioreq_server_info?
> 
> Make flags IN/OUT parameter (and not reuse the same bits)? Or introduce
> new field?

I think it would be fine to make it IN/OUT, but see below.

> > > > 
> > > > But I wonder whether it shouldn't be the other way around, the device
> > > > model tells Xen it doesn't need to handle any MSI-X accesses because
> > > > QEMU will take care of it, likely using a new flag in
> > > > XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as
> > > > part of the gflags, but then we would need to assert that the flag is
> > > > passed for all MSI-X interrupts bound from that device to the same
> > > > domain.
> > > 
> > > Is it a safe thing to do? I mean, doesn't Xen need to guard access to
> > > MSI-X configuration to assure its safety, especially if no interrupt
> > > remapping is there? It probably doesn't matter for qemu in dom0 case,
> > > but both with deprivileged qemu and stubdom, it might matter.
> > 
> > Right - QEMU shouldn't write directly to the MSI-X table using
> > /dev/mem, but instead use a hypercall to notify Xen of the
> > {un,}masking of the MSI-X table entry.  I think that would allow us to
> > safely get rid of the extra logic in Xen to deal with MSI-X table
> > accesses.
> 
> But the purpose of this series is to give guest (or QEMU) more write
> access to the MSI-X page, not less.

Right, but there are two independent issues here: one is the
propagation of the MSIX entry mask state to the device model, the
other is allowing guest accesses to MMIO regions adjacent to the MSIX
table.

> If it weren't for this Intel AX
> wifi, indeed we could translate everything to hypercalls in QEMU and not
> worry about special handlers in Xen at all. But unfortunately, we do
> need to handle writes to the same page outside of the MSI-X structures
> and QEMU can't be trusted with properly filtering them (and otherwise
> given full write access to the page).

Indeed, but IMO it would be helpful if we could avoid this split
handling of MSIX entries, where Xen handles entry mask/unmask, and
QEMU handles entry setup.  It makes the handling logic very
complicated, and more likely to be buggy (as you have probably
discovered).

Having QEMU always handle accesses to the MSI-X table would make
things simpler, and we could get rid of a huge amount of logic and
entry tracking in msixtbl_mmio_ops.

Then, we would only need to detect when an access falls into the same
page as the MSI-X (or PBA) tables, but outside of those, and forward
it to the underlying hardware, but that's a fairly simple piece of
logic, and completely detached from all the MSI-X entry tracking that
Xen currently does.
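
As a rough sketch of that check (not existing Xen code: the type and
function names below are made up for illustration, only the MSI-X table
is considered, and the PBA would need the same treatment), the decision
reduces to comparing the access range against the table range and the
pages that contain it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096ULL /* assumes 4K pages */
#define MSIX_ENTRY_SIZE   16ULL   /* one MSI-X table entry is 16 bytes */

/* Hypothetical result type: where an intercepted access should go. */
enum msix_page_target {
    ACCESS_NOT_MSIX_PAGE, /* not on a page holding the table at all */
    ACCESS_FORWARD_TO_DM, /* touches the table: device model handles it */
    ACCESS_PASS_TO_HW,    /* same page(s), outside the table: real MMIO */
};

/* Hypothetical description of where the MSI-X table lives. */
struct msix_table_range {
    uint64_t base;            /* address of entry 0 */
    unsigned int nr_entries;  /* number of table entries */
};

static enum msix_page_target
classify_access(const struct msix_table_range *t, uint64_t addr,
                unsigned int len)
{
    uint64_t table_end = t->base + t->nr_entries * MSIX_ENTRY_SIZE;
    /* Page-align the table range outwards to get the intercepted pages. */
    uint64_t pages_start = t->base & ~(SKETCH_PAGE_SIZE - 1);
    uint64_t pages_end = (table_end + SKETCH_PAGE_SIZE - 1) &
                         ~(SKETCH_PAGE_SIZE - 1);

    assert(len > 0);

    if ( addr + len <= pages_start || addr >= pages_end )
        return ACCESS_NOT_MSIX_PAGE;

    /* Any overlap with the table itself goes to the device model. */
    if ( addr < table_end && addr + len > t->base )
        return ACCESS_FORWARD_TO_DM;

    return ACCESS_PASS_TO_HW;
}
```

Note there's no per-entry state involved at all, which is the point:
the check only needs the location and size of the table.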

> So, while I agree translating {un,}masking individual vectors to
> hypercalls could simplify MSI-X handling in general, I don't think it
> helps in this particular case. That said, such simplification would
> involve:
> 1. Exposing the capability in Xen to the qemu
> (XEN_DMOP_get_ioreq_server_info sounds reasonable).
> 2. QEMU notifying Xen it will handle masking too, somehow.

I think it's possible we could get away with adding a new flag bit to
xen_domctl_bind_pt_irq, like: XEN_DOMCTL_VMSI_X86_MASK_HANDLING that
would tell Xen that QEMU will handle the mask bit for this entry.

QEMU using this flag should be prepared to handle the mask bit, but if
Xen doesn't know the flag it will keep processing the mask bit.
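
To illustrate the intended fallback, a minimal model of the negotiation
could look like the following.  This is only a sketch: the bit value
chosen for XEN_DOMCTL_VMSI_X86_MASK_HANDLING is hypothetical, the value
shown for the UNMASKED flag is for illustration, the helper and the
"known flags" sets are made up, and it assumes a Xen that doesn't know
the bit simply ignores it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values are for illustration only. */
#define XEN_DOMCTL_VMSI_X86_UNMASKED       0x010000u
/* Proposed flag; this bit position is a hypothetical choice. */
#define XEN_DOMCTL_VMSI_X86_MASK_HANDLING  0x020000u

/* Hypothetical sets of gflags bits understood by an old and a new Xen. */
#define VMSI_FLAGS_OLD_XEN  (XEN_DOMCTL_VMSI_X86_UNMASKED)
#define VMSI_FLAGS_NEW_XEN  (XEN_DOMCTL_VMSI_X86_UNMASKED | \
                             XEN_DOMCTL_VMSI_X86_MASK_HANDLING)

/*
 * Returns true if Xen keeps interpreting mask bit writes for the entry.
 * Xen only stops doing so when QEMU sets the flag and Xen knows it.
 */
static bool xen_handles_mask_bit(uint32_t gflags, uint32_t known_flags)
{
    return !(gflags & known_flags & XEN_DOMCTL_VMSI_X86_MASK_HANDLING);
}
```

With this model a new QEMU on an old Xen still ends up with Xen
processing the mask bit, so QEMU has to cope with both outcomes.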

> 3. QEMU using xc_domain_update_msi_irq and XEN_DOMCTL_VMSI_X86_UNMASKED
> to update Xen about the mask state too.
> 4. Xen no longer interpreting writes to mask bit, but still intercepting
> them to pass through those outside of MSI-X structures (the other patch
> in the series). But the handler would still need to stay, to keep
> working with older QEMU versions.

Xen would need to intercept writes to the page(s) containing the MSI-X
table in any case, but the logic is much simpler if it just needs to
decide whether an access falls inside of the table region, and thus
needs to be forwarded to the device model, or falls outside of it and
needs to be propagated to the real address.

While it's true that we won't be able to remove the code that partially
handles MSIX entries for guests in Xen, it would be unused for newer
versions of QEMU, hopefully making the handling far more consistent as
the logic will be entirely in QEMU rather than split between Xen and
QEMU.

Thanks, Roger.



 

