Xen project Mailing List

Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model

To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Mon, 27 Mar 2023 16:37:15 +0200

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=t4U/iOvc4GK2jeuldVJLiDqeVp/B957B7zqLJNBBoqg=; b=jy7bLrHC4j4JQ/FEKDzu8ygSFAp9CqMiwkWHBTIvfNg4o1wP1g9d3oTCmfy459WlxzupI8yfKVikpyln942ojxjHAulEpvyhprMrCmvTKFG9ESoyhTkOOL/3hrxEj+BOfS36SnjjTnyaBBep1DT0hOiHU7pISTT4n0DYYVJoKHs0U2/InAzjztZQ1qKtDE5D0rHT6shgg5u58iKgRTFrlOdog4FFOj8/ns/r8YJbpvQGFsBpm/GZJEZ53qQz4+0tedtVVIVc/mUUcifrvsG2MWE4kQOU07G/jpFXqTMFf/GrsK40sisOwN0muWcxNiuD/9S9IwUdeRCyYfsW/UT6jg==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=VXX+2MpLgR8jXEyzlexmWpLCznFAcizEKdDBbyta0c6RaU//xTLrgJhTNMroFHYOak8fbFqEaru64BNzyKTVGsXN4G4Xpc/AkrcuQdeuISkr//cy64nTWALGCi+tnlC8k17mcIOfnK6n8xAavdtz8L9hEmoOsbi1Iae94+2IS6R5FjYO5FXanYWM9pN4KFEQors+8hvM08TrELeWfFroSXyoA9x8jQHH9NtZTGNO3YKHrINvajlSIBjEmaZh01zbvCV5mp46/Sn0yEG4/Jy1umgkfzaihibThW4SAqzWZjiJvUK9Of8Hb8Xn0k6reet2CmTE4kvI50ckpy0JQmjD9w==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Jason Andryuk <jandryuk@xxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>

Delivery-date: Mon, 27 Mar 2023 14:37:54 +0000

Ironport-data: A9a23:7mKFhamqTWHmn1LThXqReEro5gxcJ0RdPkR7XQ2eYbSJt1+Wr1Gzt xJNDGqGaKmPNGbxeo9zadiz/UhXsZOHzNZnHARvqilnEiMWpZLJC+rCIxarNUt+DCFhoGFPt JxCN4aafKjYaleG+39B55C49SEUOZmgH+a6U6icfHgqH2eIcQ954Tp7gek1n4V0ttawBgKJq LvartbWfVSowFaYCEpNg064gE4p7aSaVA8w5ARkPqgQ5gCGzhH5MbpETU2PByqgKmVrNrbSq 9brlNmR4m7f9hExPdKp+p6TnpoiG+O60aCm0xK6aoD66vRwjnVaPpUTbZLwXXx/mTSR9+2d/ f0W3XCGpaXFCYWX8AgVe0Ew/yiTpsSq8pefSZS0mZT7I0Er7xIAahihZa07FdRwxwp5PY1B3 eBEBDVQRQuGvdidxrame9VrnJhkAta+aevzulk4pd3YJdAPZMiZBonvvppf1jp2gd1SF/HDY cZfcSBocBnLfxxIPBEQFY46m+CrwHL4dlW0qnrM/fZxvzeVkVI3jOWF3Nn9I7RmQe1PmUmVv CTe9nnRCRAGLt2PjzGC9xpAg8eWxXmqAtlOTu3QGvhCrUPKwn0iDjAtZEbk+9m7sRCARehPJ BlBksYphe1onKCxdfH4Vge9qWSJvTYdXcRRCOww7AyRyqvS7B2dD2JCRTlEAPQ2uclzSTE02 1uhm9LyGScpoLCTUWia9LqfsXW1Iyd9BW0LYyMeXBYGy9bmqYA3yBnIS75LEqS4k9n0EjHY2 C2RoW41gLB7pdEP/7W2+xbAmT3Em3TSZgs85wGSUmf+6Ap8PNShf9bwtQWd6utcJoGESFXHp GIDh8WV8OEJC9eKiTCJR+IOWrqu4p5pLQHhvLKmJLF5nxzFxpJpVdkIiN2iDC+F6vo5RAI=

Ironport-hdrordr: A9a23:J/K5lqzfGKE3eDWqqvBFKrPw671zdoMgy1knxilNoRw8SL3/qy nOppQmPHrP4wr5N0tApTntAtjkfZq+z+8N3WByB8bbYOCOggLBQ+9fBOPZskbd8kbFh4pgPM lbAs9DIey1IGJWyeDdy2CDf+rIxuPszImYwd3z9TNGayZES49d1C9FKiC9VndbeWB9dPkEPa vZ6cpDqyChangMB/7XOlAOQ/LfodnGj7LKCCR2ZSIa1A==

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Mar 27, 2023 at 04:20:45PM +0200, Marek Marczykowski-Górecki wrote: > On Mon, Mar 27, 2023 at 03:29:55PM +0200, Roger Pau Monné wrote: > > On Mon, Mar 27, 2023 at 01:34:23PM +0200, Marek Marczykowski-Górecki wrote: > > > On Mon, Mar 27, 2023 at 12:51:09PM +0200, Roger Pau Monné wrote: > > > > On Mon, Mar 27, 2023 at 12:26:05PM +0200, Marek Marczykowski-Górecki > > > > wrote: > > > > > On Mon, Mar 27, 2023 at 12:12:29PM +0200, Roger Pau Monné wrote: > > > > > > On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek > > > > > > Marczykowski-Górecki wrote: > > > > > > > QEMU needs to know whether clearing maskbit of a vector is really > > > > > > > clearing, or was already cleared before. Currently Xen sends only > > > > > > > clearing that bit to the device model, but not setting it, so QEMU > > > > > > > cannot detect it. Because of that, QEMU is working this around by > > > > > > > checking via /dev/mem, but that isn't the proper approach. > > > > > > > > > > > > > > Give all necessary information to QEMU by passing all ctrl writes, > > > > > > > including masking a vector. This does include forwarding also > > > > > > > writes > > > > > > > that did not change the value, but as tested on both Linux > > > > > > > (6.1.12) and > > > > > > > Windows (10 pro), they don't do excessive writes of unchanged > > > > > > > values > > > > > > > (Windows seems to clear maskbit in some cases twice, but not > > > > > > > more). > > > > > > > > > > > > Since we passthrough all the accesses to the device model, is the > > > > > > handling in Xen still required? It might be worth to also expose > > > > > > any > > > > > > interfaces needed to the device model so all the functionality done > > > > > > by > > > > > > the msixtbl_mmio_ops hooks could be done by QEMU, since we end up > > > > > > passing the accesses anyway. > > > > > > > > > > This was discussed on v1 already. Such QEMU would need to be able to > > > > > do > > > > > the actual write. If it's running in stubdomain, it would hit the > > > > > exact > > > > > issue again (page mapped R/O to it). In fact, that might be an issue > > > > > for > > > > > dom0 too (I haven't checked). > > > > > > > > Oh, sorry, likely missed that discussion, as I don't recall this. > > > > > > > > Maybe we need an hypercall for QEMU to notify the masking/unmasking to > > > > Xen? As any change on the other fields is already handled by QEMU. > > > > > > > > > I guess that could use my subpage RO feature I just posted then, but > > > > > it > > > > > would still mean intercepting the write twice (not a performance issue > > > > > really here, but rather convoluted handling in total). > > > > > > > > Yes, that does seem way too convoluted. > > > > > > > > > > > Signed-off-by: Marek Marczykowski-Górecki > > > > > > > <marmarek@xxxxxxxxxxxxxxxxxxxxxx> > > > > > > > --- > > > > > > > v2: > > > > > > > - passthrough quad writes to emulator too (Jan) > > > > > > > - (ab)use len==0 for write len=4 completion (Jan), but add > > > > > > > descriptive > > > > > > > #define for this magic value > > > > > > > > > > > > > > This behavior change needs to be surfaced to the device model > > > > > > > somehow, > > > > > > > so it knows whether it can rely on it. I'm open for suggestions. > > > > > > > > > > > > Maybe exposed in XEN_DMOP_get_ioreq_server_info? > > > > > > Make flags IN/OUT parameter (and not reuse the same bits)? Or introduce > > > new field? > > > > I think it would be fine to make it IN/OUT, but see below. > > > > > > > > > > > > > > But I wonder whether it shouldn't be the other way arround, the > > > > > > device > > > > > > model tells Xen it doesn't need to handle any MSI-X accesses because > > > > > > QEMU will take care of it, likely using a new flag in > > > > > > XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as > > > > > > part of the gflags, but then we would need to assert that the flag > > > > > > is > > > > > > passed for all MSI-X interrupts bound from that device to the same > > > > > > domain. > > > > > > > > > > Is is safe thing to do? I mean, doesn't Xen need to guard access to > > > > > MSI-X configuration to assure its safety, especially if no interrupt > > > > > remapping is there? It probably doesn't matter for qemu in dom0 case, > > > > > but both with deprivileged qemu and stubdom, it might matter. > > > > > > > > Right - QEMU shouldn't write directly to the MSI-X table using > > > > /dev/mem, but instead use an hypercall to notify Xen of the > > > > {un,}masking of the MSI-X table entry. I think that would allow us to > > > > safely get rid of the extra logic in Xen to deal with MSI-X table > > > > accesses. > > > > > > But the purpose of this series is to give guest (or QEMU) more write > > > access to the MSI-X page, not less. > > > > Right, but there are two independent issues here: one is the > > propagation of the MSIX entry mask state to the device model, the > > other is allowing guest accesses to MMIO regions adjacent to the MSIX > > table. > > > > > If it wouldn't be this Intel AX > > > wifi, indeed we could translate everything to hypercalls in QEMU and not > > > worry about special handlers in Xen at all. But unfortunately, we do > > > need to handle writes to the same page outside of the MSI-X structures > > > and QEMU can't be trusted with properly filtering them (and otherwise > > > given full write access to the page). > > > > Indeed, but IMO it would be helpful if we could avoid this split > > handling of MSIX entries, where Xen handles entry mask/unmask, and > > QEMU handles entry setup. It makes the handling logic very > > complicated, and more likely to be buggy (as you have probably > > discovered). > > > > Having QEMU always handle accesses to the MSI-X table would make > > things simpler, and we could get rid of a huge amount of logic and > > entry tracking in msixtbl_mmio_ops. > > > > Then, we would only need to detect where an access falls into the same > > page as the MSI-X (or PBA() tables, but outside of those, and forward > > it to the underlying hardware, but that's a fairly simple piece of > > logic, and completely detached from all the MSI-X entry tracking that > > Xen currently does. > > > > > So, while I agree translating {un,}masking individual vectors to > > > hypercalls could simplify MSI-X handling in general, I don't think it > > > helps in this particular case. That said, such simplification would > > > involve: > > > 1. Exposing the capability in Xen to the qemu > > > (XEN_DMOP_get_ioreq_server_info sounds reasonable). > > > 2. QEMU notifying Xen it will handle masking too, somehow. > > > > I think it's possible we could get away with adding a new flag bit to > > xen_domctl_bind_pt_irq, like: XEN_DOMCTL_VMSI_X86_MASK_HANDLING that > > would tell Xen that QEMU will handle the mask bit for this entry. > > Technically, for Xen to not care about those writes, it would need to > observe this flag on all vectors, including those not mapped yet. In > practice though, I think it might be okay to say device model should set > XEN_DOMCTL_VMSI_X86_MASK_HANDLING flag consistently (either on none of > them, or all of them), and Xen can rely on it (if one vector has > XEN_DOMCTL_VMSI_X86_MASK_HANDLING, then assume all of them will have > it). I agree. I would just return -EINVAL if the flag is not consistent across vectors on the same device. > > QEMU using this flag should be prepared to handle the mask bit, but if > > Xen doesn't know the flag it will keep processing the mask bit. > > > > > 3. QEMU using xc_domain_update_msi_irq and XEN_DOMCTL_VMSI_X86_UNMASKED > > > to update Xen about the mask state too. > > > 4. Xen no longer interpreting writes to mask bit, but still intercepting > > > them to passthorugh those outside of MSI-X structures (the other patch > > > in the series). But the handler would still need to stay, to keep > > > working with older QEMU versions. > > > > Xen would need to intercept writes to the page(s) containing the MSI-X > > table in any case, but the logic is much simpler if it just needs to > > decide whether the accesses fall inside of the table region, and thus > > needs to be forwarded to the device model, or fails outside of it and > > needs to be propagated to the real address. > > > > While true that we won't be able to remove the code that partially > > handles MSIX entries for guests in Xen, it would be unused for newer > > versions of QEMU, hopefully making the handling far more consistent as > > the logic will be entirely in QEMU rather than split between Xen and > > QEMU. > > In fact, it was easier for me to register a separate ioreq server for > writes outside of the MSI-X table. But I'm afraid the current one would > need to stay registered (just not accepting writes). (I assume in the paragraph above you should use hvm_io_handler rather than ioreq server, as ioreqs are only for emulation running outside of Xen) The handle is currently registered when a device with MSIX is assigned to an HVM domain. I think it's fine to leave as-is as long as the handler doesn't accept any accesses if QEMU does handle the mask bit. We can always see later about not registering it. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.