[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
On Mon, Mar 27, 2023 at 04:20:45PM +0200, Marek Marczykowski-Górecki wrote: > On Mon, Mar 27, 2023 at 03:29:55PM +0200, Roger Pau Monné wrote: > > On Mon, Mar 27, 2023 at 01:34:23PM +0200, Marek Marczykowski-Górecki wrote: > > > On Mon, Mar 27, 2023 at 12:51:09PM +0200, Roger Pau Monné wrote: > > > > On Mon, Mar 27, 2023 at 12:26:05PM +0200, Marek Marczykowski-Górecki > > > > wrote: > > > > > On Mon, Mar 27, 2023 at 12:12:29PM +0200, Roger Pau Monné wrote: > > > > > > On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek > > > > > > Marczykowski-Górecki wrote: > > > > > > > QEMU needs to know whether clearing maskbit of a vector is really > > > > > > > clearing, or was already cleared before. Currently Xen sends only > > > > > > > clearing that bit to the device model, but not setting it, so QEMU > > > > > > > cannot detect it. Because of that, QEMU is working this around by > > > > > > > checking via /dev/mem, but that isn't the proper approach. > > > > > > > > > > > > > > Give all necessary information to QEMU by passing all ctrl writes, > > > > > > > including masking a vector. This does include forwarding also > > > > > > > writes > > > > > > > that did not change the value, but as tested on both Linux > > > > > > > (6.1.12) and > > > > > > > Windows (10 pro), they don't do excessive writes of unchanged > > > > > > > values > > > > > > > (Windows seems to clear maskbit in some cases twice, but not > > > > > > > more). > > > > > > > > > > > > Since we passthrough all the accesses to the device model, is the > > > > > > handling in Xen still required? It might be worth to also expose > > > > > > any > > > > > > interfaces needed to the device model so all the functionality done > > > > > > by > > > > > > the msixtbl_mmio_ops hooks could be done by QEMU, since we end up > > > > > > passing the accesses anyway. > > > > > > > > > > This was discussed on v1 already. Such QEMU would need to be able to > > > > > do > > > > > the actual write. If it's running in stubdomain, it would hit the > > > > > exact > > > > > issue again (page mapped R/O to it). In fact, that might be an issue > > > > > for > > > > > dom0 too (I haven't checked). > > > > > > > > Oh, sorry, likely missed that discussion, as I don't recall this. > > > > > > > > Maybe we need an hypercall for QEMU to notify the masking/unmasking to > > > > Xen? As any change on the other fields is already handled by QEMU. > > > > > > > > > I guess that could use my subpage RO feature I just posted then, but > > > > > it > > > > > would still mean intercepting the write twice (not a performance issue > > > > > really here, but rather convoluted handling in total). > > > > > > > > Yes, that does seem way too convoluted. > > > > > > > > > > > Signed-off-by: Marek Marczykowski-Górecki > > > > > > > <marmarek@xxxxxxxxxxxxxxxxxxxxxx> > > > > > > > --- > > > > > > > v2: > > > > > > > - passthrough quad writes to emulator too (Jan) > > > > > > > - (ab)use len==0 for write len=4 completion (Jan), but add > > > > > > > descriptive > > > > > > > #define for this magic value > > > > > > > > > > > > > > This behavior change needs to be surfaced to the device model > > > > > > > somehow, > > > > > > > so it knows whether it can rely on it. I'm open for suggestions. > > > > > > > > > > > > Maybe exposed in XEN_DMOP_get_ioreq_server_info? > > > > > > Make flags IN/OUT parameter (and not reuse the same bits)? Or introduce > > > new field? > > > > I think it would be fine to make it IN/OUT, but see below. > > > > > > > > > > > > > > But I wonder whether it shouldn't be the other way arround, the > > > > > > device > > > > > > model tells Xen it doesn't need to handle any MSI-X accesses because > > > > > > QEMU will take care of it, likely using a new flag in > > > > > > XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as > > > > > > part of the gflags, but then we would need to assert that the flag > > > > > > is > > > > > > passed for all MSI-X interrupts bound from that device to the same > > > > > > domain. > > > > > > > > > > Is is safe thing to do? I mean, doesn't Xen need to guard access to > > > > > MSI-X configuration to assure its safety, especially if no interrupt > > > > > remapping is there? It probably doesn't matter for qemu in dom0 case, > > > > > but both with deprivileged qemu and stubdom, it might matter. > > > > > > > > Right - QEMU shouldn't write directly to the MSI-X table using > > > > /dev/mem, but instead use an hypercall to notify Xen of the > > > > {un,}masking of the MSI-X table entry. I think that would allow us to > > > > safely get rid of the extra logic in Xen to deal with MSI-X table > > > > accesses. > > > > > > But the purpose of this series is to give guest (or QEMU) more write > > > access to the MSI-X page, not less. > > > > Right, but there are two independent issues here: one is the > > propagation of the MSIX entry mask state to the device model, the > > other is allowing guest accesses to MMIO regions adjacent to the MSIX > > table. > > > > > If it wouldn't be this Intel AX > > > wifi, indeed we could translate everything to hypercalls in QEMU and not > > > worry about special handlers in Xen at all. But unfortunately, we do > > > need to handle writes to the same page outside of the MSI-X structures > > > and QEMU can't be trusted with properly filtering them (and otherwise > > > given full write access to the page). > > > > Indeed, but IMO it would be helpful if we could avoid this split > > handling of MSIX entries, where Xen handles entry mask/unmask, and > > QEMU handles entry setup. It makes the handling logic very > > complicated, and more likely to be buggy (as you have probably > > discovered). > > > > Having QEMU always handle accesses to the MSI-X table would make > > things simpler, and we could get rid of a huge amount of logic and > > entry tracking in msixtbl_mmio_ops. > > > > Then, we would only need to detect where an access falls into the same > > page as the MSI-X (or PBA() tables, but outside of those, and forward > > it to the underlying hardware, but that's a fairly simple piece of > > logic, and completely detached from all the MSI-X entry tracking that > > Xen currently does. > > > > > So, while I agree translating {un,}masking individual vectors to > > > hypercalls could simplify MSI-X handling in general, I don't think it > > > helps in this particular case. That said, such simplification would > > > involve: > > > 1. Exposing the capability in Xen to the qemu > > > (XEN_DMOP_get_ioreq_server_info sounds reasonable). > > > 2. QEMU notifying Xen it will handle masking too, somehow. > > > > I think it's possible we could get away with adding a new flag bit to > > xen_domctl_bind_pt_irq, like: XEN_DOMCTL_VMSI_X86_MASK_HANDLING that > > would tell Xen that QEMU will handle the mask bit for this entry. > > Technically, for Xen to not care about those writes, it would need to > observe this flag on all vectors, including those not mapped yet. In > practice though, I think it might be okay to say device model should set > XEN_DOMCTL_VMSI_X86_MASK_HANDLING flag consistently (either on none of > them, or all of them), and Xen can rely on it (if one vector has > XEN_DOMCTL_VMSI_X86_MASK_HANDLING, then assume all of them will have > it). I agree. I would just return -EINVAL if the flag is not consistent across vectors on the same device. > > QEMU using this flag should be prepared to handle the mask bit, but if > > Xen doesn't know the flag it will keep processing the mask bit. > > > > > 3. QEMU using xc_domain_update_msi_irq and XEN_DOMCTL_VMSI_X86_UNMASKED > > > to update Xen about the mask state too. > > > 4. Xen no longer interpreting writes to mask bit, but still intercepting > > > them to passthorugh those outside of MSI-X structures (the other patch > > > in the series). But the handler would still need to stay, to keep > > > working with older QEMU versions. > > > > Xen would need to intercept writes to the page(s) containing the MSI-X > > table in any case, but the logic is much simpler if it just needs to > > decide whether the accesses fall inside of the table region, and thus > > needs to be forwarded to the device model, or fails outside of it and > > needs to be propagated to the real address. > > > > While true that we won't be able to remove the code that partially > > handles MSIX entries for guests in Xen, it would be unused for newer > > versions of QEMU, hopefully making the handling far more consistent as > > the logic will be entirely in QEMU rather than split between Xen and > > QEMU. > > In fact, it was easier for me to register a separate ioreq server for > writes outside of the MSI-X table. But I'm afraid the current one would > need to stay registered (just not accepting writes). (I assume in the paragraph above you should use hvm_io_handler rather than ioreq server, as ioreqs are only for emulation running outside of Xen) The handle is currently registered when a device with MSIX is assigned to an HVM domain. I think it's fine to leave as-is as long as the handler doesn't accept any accesses if QEMU does handle the mask bit. We can always see later about not registering it. Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |