|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
On Mon, Mar 27, 2023 at 04:20:45PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 27, 2023 at 03:29:55PM +0200, Roger Pau Monné wrote:
> > On Mon, Mar 27, 2023 at 01:34:23PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Mon, Mar 27, 2023 at 12:51:09PM +0200, Roger Pau Monné wrote:
> > > > On Mon, Mar 27, 2023 at 12:26:05PM +0200, Marek Marczykowski-Górecki
> > > > wrote:
> > > > > On Mon, Mar 27, 2023 at 12:12:29PM +0200, Roger Pau Monné wrote:
> > > > > > On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek
> > > > > > Marczykowski-Górecki wrote:
> > > > > > > QEMU needs to know whether clearing maskbit of a vector is really
> > > > > > > clearing, or was already cleared before. Currently Xen sends only
> > > > > > > clearing that bit to the device model, but not setting it, so QEMU
> > > > > > > cannot detect it. Because of that, QEMU is working this around by
> > > > > > > checking via /dev/mem, but that isn't the proper approach.
> > > > > > >
> > > > > > > Give all necessary information to QEMU by passing all ctrl writes,
> > > > > > > including masking a vector. This does include forwarding also
> > > > > > > writes
> > > > > > > that did not change the value, but as tested on both Linux
> > > > > > > (6.1.12) and
> > > > > > > Windows (10 pro), they don't do excessive writes of unchanged
> > > > > > > values
> > > > > > > (Windows seems to clear maskbit in some cases twice, but not
> > > > > > > more).
> > > > > >
> > > > > > Since we passthrough all the accesses to the device model, is the
> > > > > > handling in Xen still required? It might be worth to also expose
> > > > > > any
> > > > > > interfaces needed to the device model so all the functionality done
> > > > > > by
> > > > > > the msixtbl_mmio_ops hooks could be done by QEMU, since we end up
> > > > > > passing the accesses anyway.
> > > > >
> > > > > This was discussed on v1 already. Such QEMU would need to be able to
> > > > > do
> > > > > the actual write. If it's running in stubdomain, it would hit the
> > > > > exact
> > > > > issue again (page mapped R/O to it). In fact, that might be an issue
> > > > > for
> > > > > dom0 too (I haven't checked).
> > > >
> > > > Oh, sorry, likely missed that discussion, as I don't recall this.
> > > >
> > > > Maybe we need an hypercall for QEMU to notify the masking/unmasking to
> > > > Xen? As any change on the other fields is already handled by QEMU.
> > > >
> > > > > I guess that could use my subpage RO feature I just posted then, but
> > > > > it
> > > > > would still mean intercepting the write twice (not a performance issue
> > > > > really here, but rather convoluted handling in total).
> > > >
> > > > Yes, that does seem way too convoluted.
> > > >
> > > > > > > Signed-off-by: Marek Marczykowski-Górecki
> > > > > > > <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > > > > > > ---
> > > > > > > v2:
> > > > > > > - passthrough quad writes to emulator too (Jan)
> > > > > > > - (ab)use len==0 for write len=4 completion (Jan), but add
> > > > > > > descriptive
> > > > > > > #define for this magic value
> > > > > > >
> > > > > > > This behavior change needs to be surfaced to the device model
> > > > > > > somehow,
> > > > > > > so it knows whether it can rely on it. I'm open for suggestions.
> > > > > >
> > > > > > Maybe exposed in XEN_DMOP_get_ioreq_server_info?
> > >
> > > Make flags IN/OUT parameter (and not reuse the same bits)? Or introduce
> > > new field?
> >
> > I think it would be fine to make it IN/OUT, but see below.
> >
> > > > > >
> > > > > > But I wonder whether it shouldn't be the other way arround, the
> > > > > > device
> > > > > > model tells Xen it doesn't need to handle any MSI-X accesses because
> > > > > > QEMU will take care of it, likely using a new flag in
> > > > > > XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as
> > > > > > part of the gflags, but then we would need to assert that the flag
> > > > > > is
> > > > > > passed for all MSI-X interrupts bound from that device to the same
> > > > > > domain.
> > > > >
> > > > > Is is safe thing to do? I mean, doesn't Xen need to guard access to
> > > > > MSI-X configuration to assure its safety, especially if no interrupt
> > > > > remapping is there? It probably doesn't matter for qemu in dom0 case,
> > > > > but both with deprivileged qemu and stubdom, it might matter.
> > > >
> > > > Right - QEMU shouldn't write directly to the MSI-X table using
> > > > /dev/mem, but instead use an hypercall to notify Xen of the
> > > > {un,}masking of the MSI-X table entry. I think that would allow us to
> > > > safely get rid of the extra logic in Xen to deal with MSI-X table
> > > > accesses.
> > >
> > > But the purpose of this series is to give guest (or QEMU) more write
> > > access to the MSI-X page, not less.
> >
> > Right, but there are two independent issues here: one is the
> > propagation of the MSIX entry mask state to the device model, the
> > other is allowing guest accesses to MMIO regions adjacent to the MSIX
> > table.
> >
> > > If it wouldn't be this Intel AX
> > > wifi, indeed we could translate everything to hypercalls in QEMU and not
> > > worry about special handlers in Xen at all. But unfortunately, we do
> > > need to handle writes to the same page outside of the MSI-X structures
> > > and QEMU can't be trusted with properly filtering them (and otherwise
> > > given full write access to the page).
> >
> > Indeed, but IMO it would be helpful if we could avoid this split
> > handling of MSIX entries, where Xen handles entry mask/unmask, and
> > QEMU handles entry setup. It makes the handling logic very
> > complicated, and more likely to be buggy (as you have probably
> > discovered).
> >
> > Having QEMU always handle accesses to the MSI-X table would make
> > things simpler, and we could get rid of a huge amount of logic and
> > entry tracking in msixtbl_mmio_ops.
> >
> > Then, we would only need to detect where an access falls into the same
> > page as the MSI-X (or PBA() tables, but outside of those, and forward
> > it to the underlying hardware, but that's a fairly simple piece of
> > logic, and completely detached from all the MSI-X entry tracking that
> > Xen currently does.
> >
> > > So, while I agree translating {un,}masking individual vectors to
> > > hypercalls could simplify MSI-X handling in general, I don't think it
> > > helps in this particular case. That said, such simplification would
> > > involve:
> > > 1. Exposing the capability in Xen to the qemu
> > > (XEN_DMOP_get_ioreq_server_info sounds reasonable).
> > > 2. QEMU notifying Xen it will handle masking too, somehow.
> >
> > I think it's possible we could get away with adding a new flag bit to
> > xen_domctl_bind_pt_irq, like: XEN_DOMCTL_VMSI_X86_MASK_HANDLING that
> > would tell Xen that QEMU will handle the mask bit for this entry.
>
> Technically, for Xen to not care about those writes, it would need to
> observe this flag on all vectors, including those not mapped yet. In
> practice though, I think it might be okay to say device model should set
> XEN_DOMCTL_VMSI_X86_MASK_HANDLING flag consistently (either on none of
> them, or all of them), and Xen can rely on it (if one vector has
> XEN_DOMCTL_VMSI_X86_MASK_HANDLING, then assume all of them will have
> it).
I agree. I would just return -EINVAL if the flag is not consistent
across vectors on the same device.
> > QEMU using this flag should be prepared to handle the mask bit, but if
> > Xen doesn't know the flag it will keep processing the mask bit.
> >
> > > 3. QEMU using xc_domain_update_msi_irq and XEN_DOMCTL_VMSI_X86_UNMASKED
> > > to update Xen about the mask state too.
> > > 4. Xen no longer interpreting writes to mask bit, but still intercepting
> > > them to passthorugh those outside of MSI-X structures (the other patch
> > > in the series). But the handler would still need to stay, to keep
> > > working with older QEMU versions.
> >
> > Xen would need to intercept writes to the page(s) containing the MSI-X
> > table in any case, but the logic is much simpler if it just needs to
> > decide whether the accesses fall inside of the table region, and thus
> > needs to be forwarded to the device model, or fails outside of it and
> > needs to be propagated to the real address.
> >
> > While true that we won't be able to remove the code that partially
> > handles MSIX entries for guests in Xen, it would be unused for newer
> > versions of QEMU, hopefully making the handling far more consistent as
> > the logic will be entirely in QEMU rather than split between Xen and
> > QEMU.
>
> In fact, it was easier for me to register a separate ioreq server for
> writes outside of the MSI-X table. But I'm afraid the current one would
> need to stay registered (just not accepting writes).
(I assume in the paragraph above you should use hvm_io_handler rather
than ioreq server, as ioreqs are only for emulation running outside of
Xen)
The handle is currently registered when a device with MSIX is assigned
to an HVM domain. I think it's fine to leave as-is as long as the
handler doesn't accept any accesses if QEMU does handle the mask bit.
We can always see later about not registering it.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |