Xen project Mailing List

Re: [PATCH v3 1/4] x86/msi: passthrough all MSI-X vector ctrl writes to device model

To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Wed, 3 May 2023 11:01:01 +0200

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>

Delivery-date: Wed, 03 May 2023 09:01:27 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Apr 06, 2023 at 05:57:23AM +0200, Marek Marczykowski-Górecki wrote:
> QEMU needs to know whether clearing maskbit of a vector is really
> clearing, or was already cleared before. Currently Xen sends only
> clearing that bit to the device model, but not setting it, so QEMU
> cannot detect it. Because of that, QEMU is working this around by
> checking via /dev/mem, but that isn't the proper approach. It's just a
> workaround which in fact is racy.
>
> Give all necessary information to QEMU by passing all ctrl writes,
> including masking a vector.
>
> While this commit doesn't move the whole maskbit handling to QEMU (as
> discussed on xen-devel as one of the possibilities), it is a necessary
> first step anyway. Including telling QEMU it will get all the required
> information to do so. The actual implementation would need to include:
>  - a hypercall for QEMU to control just maskbit (without (re)binding the
>    interrupt again
>  - a method for QEMU to tell Xen it will actually do the work
> Those are not part of this series.
>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> ---
> v3:
>  - advertise changed behavior in XEN_DMOP_get_ioreq_server_info - make
>    "flags" parameter IN/OUT
>  - move len check back to msixtbl_write() - will be needed there anyway
>    in a later patch
> v2:
>  - passthrough quad writes to emulator too (Jan)
>  - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
>    #define for this magic value
>
> Should flags on output include only "out" values (current version), or
> also include those passed in by the caller unchanged?
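
For illustration only, here is a minimal sketch of why the device model
needs to see both mask and unmask writes. It is not QEMU or Xen code: the
struct, the function and the SKETCH_ macro are hypothetical names, and the
only fact taken from the PCI specification is that bit 0 of the Vector
Control dword is the mask bit. With every ctrl write forwarded, the dm can
track the previous state itself and tell a real masked-to-unmasked
transition from a redundant clear:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the MSI-X Vector Control dword is the per-vector mask bit. */
#define SKETCH_VECTOR_CTRL_MASKBIT 0x1u

/* Hypothetical per-vector state kept by a device model. */
struct sketch_vector_state {
    bool masked;
};

/*
 * Record a forwarded Vector Control write.
 * Returns true only when the write really unmasks the vector, i.e. the
 * vector was masked before and the new value has the mask bit clear.
 */
static bool sketch_ctrl_write(struct sketch_vector_state *s, uint32_t val)
{
    bool was_masked = s->masked;

    s->masked = (val & SKETCH_VECTOR_CTRL_MASKBIT) != 0;

    return was_masked && !s->masked;
}
```

If only the clearing writes were forwarded, was_masked could never be
updated to true again after the first unmask, which is the gap the commit
message describes.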
> ---
>  xen/arch/x86/hvm/vmsi.c        | 18 ++++++++++++++----
>  xen/common/ioreq.c             |  9 +++++++--
>  xen/include/public/hvm/dm_op.h | 12 ++++++++----
>  3 files changed, 29 insertions(+), 10 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 3cd4923060c8..231253a2cbd4 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -272,6 +272,15 @@ out:
>      return r;
>  }
>
> +/*
> + * This function returns X86EMUL_UNHANDLEABLE even if write is properly
> + * handled, to propagate it to the device model (so it can keep its internal
> + * state in sync).
> + * len==0 means really len==4, but as a write completion that will return
> + * X86EMUL_OKAY on successful processing. Use WRITE_LEN4_COMPLETION to make it
> + * less confusing.

Isn't it fine to just forward every (valid) write to the dm, and so
not introduce WRITE_LEN4_COMPLETION? (see my comment about
_msixtbl_write()).

> + */
> +#define WRITE_LEN4_COMPLETION 0
>  static int msixtbl_write(struct vcpu *v, unsigned long address,
>                           unsigned int len, unsigned long val)
>  {
> @@ -283,8 +292,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
>      unsigned long flags;
>      struct irq_desc *desc;
>
> -    if ( (len != 4 && len != 8) || (address & (len - 1)) )
> -        return r;
> +    if ( (len != 4 && len != 8 && len != WRITE_LEN4_COMPLETION) ||
> +         (len && (address & (len - 1))) )
> +        return X86EMUL_UNHANDLEABLE;

I think you want to just return X86EMUL_OKAY here, and ignore the
access since it's not properly sized or aligned?
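
To spell out what "properly sized or aligned" means here, a standalone
sketch of the pre-patch condition follows. The helper name
msix_access_is_valid() is hypothetical and does not exist in Xen; the
logic is just the negation of the removed
(len != 4 && len != 8) || (address & (len - 1)) check:

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of Xen: returns true when an MSI-X table
 * access is a naturally aligned dword or qword, i.e. when the pre-patch
 * check in msixtbl_write() would NOT have bailed out early.
 */
static bool msix_access_is_valid(unsigned long address, unsigned int len)
{
    /* Only 4-byte and 8-byte accesses are accepted. */
    if ( len != 4 && len != 8 )
        return false;

    /* len is a power of two here, so (len - 1) masks the low address bits. */
    return (address & (len - 1)) == 0;
}
```

An access that fails this test is the one the comment above suggests
dropping with X86EMUL_OKAY rather than forwarding.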
>
>      rcu_read_lock(&msixtbl_rcu_lock);
>
> @@ -345,7 +355,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
>
>  unlock:
>      spin_unlock_irqrestore(&desc->lock, flags);
> -    if ( len == 4 )
> +    if ( len == WRITE_LEN4_COMPLETION )
>          r = X86EMUL_OKAY;
>
>  out:
> @@ -635,7 +645,7 @@ void msix_write_completion(struct vcpu *v)
>          return;
>
>      v->arch.hvm.hvm_io.msix_unmask_address = 0;
> -    if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY )
> +    if ( msixtbl_write(v, ctrl_address, WRITE_LEN4_COMPLETION, 0) != X86EMUL_OKAY )
>          gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n");

Would it be possible to always return X86EMUL_UNHANDLEABLE from
_msixtbl_write() and keep the return values of msixtbl_write() as-is?

>  }
>
> diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
> index ecb8f545e1c4..bd6f074c1e85 100644
> --- a/xen/common/ioreq.c
> +++ b/xen/common/ioreq.c
> @@ -743,7 +743,8 @@ static int ioreq_server_destroy(struct domain *d, ioservid_t id)
>  static int ioreq_server_get_info(struct domain *d, ioservid_t id,
>                                   unsigned long *ioreq_gfn,
>                                   unsigned long *bufioreq_gfn,
> -                                 evtchn_port_t *bufioreq_port)
> +                                 evtchn_port_t *bufioreq_port,
> +                                 uint16_t *flags)
>  {
>      struct ioreq_server *s;
>      int rc;
> @@ -779,6 +780,9 @@ static int ioreq_server_get_info(struct domain *d, ioservid_t id,
>          *bufioreq_port = s->bufioreq_evtchn;
>      }
>
> +    /* Advertise supported features/behaviors. */
> +    *flags = XEN_DMOP_all_msix_writes;
> +
>      rc = 0;
>
>  out:
> @@ -1374,7 +1378,8 @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op)
>                                     NULL : (unsigned long *)&data->ioreq_gfn,
>                                     (data->flags & XEN_DMOP_no_gfns) ?
>                                     NULL : (unsigned long *)&data->bufioreq_gfn,
> -                                   &data->bufioreq_port);
> +                                   &data->bufioreq_port, &data->flags);
> +
>          break;
>      }
>
> diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
> index acdf91693d0b..490b151c5dd7 100644
> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -70,7 +70,9 @@ typedef struct xen_dm_op_create_ioreq_server xen_dm_op_create_ioreq_server_t;
>   * not contain XEN_DMOP_no_gfns then these pages will be made available and
>   * the frame numbers passed back in gfns <ioreq_gfn> and <bufioreq_gfn>
>   * respectively. (If the IOREQ Server is not handling buffered emulation
> - * only <ioreq_gfn> will be valid).
> + * only <ioreq_gfn> will be valid). When Xen returns XEN_DMOP_all_msix_writes
> + * flag set, it will notify the IOREQ server about all writes to MSI-X table
> + * (if it's handled by this IOREQ server), not only those clearing a mask bit.
>   *
>   * NOTE: To access the synchronous ioreq structures and buffered ioreq
>   *       ring, it is preferable to use the XENMEM_acquire_resource memory
> @@ -81,11 +83,13 @@ typedef struct xen_dm_op_create_ioreq_server xen_dm_op_create_ioreq_server_t;
>  struct xen_dm_op_get_ioreq_server_info {
>      /* IN - server id */
>      ioservid_t id;
> -    /* IN - flags */
> +    /* IN/OUT - flags */
>      uint16_t flags;
>
> -#define _XEN_DMOP_no_gfns 0
> -#define XEN_DMOP_no_gfns (1u << _XEN_DMOP_no_gfns)
> +#define _XEN_DMOP_no_gfns         0  /* IN */
> +#define _XEN_DMOP_all_msix_writes 1  /* OUT */
> +#define XEN_DMOP_no_gfns         (1u << _XEN_DMOP_no_gfns)
> +#define XEN_DMOP_all_msix_writes (1u << _XEN_DMOP_all_msix_writes)

FWIW, we usually interleave _XEN_DMOP_no_gfns and XEN_DMOP_no_gfns,
ie:

#define _XEN_DMOP_no_gfns         0  /* IN */
#define XEN_DMOP_no_gfns          (1u << _XEN_DMOP_no_gfns)
#define _XEN_DMOP_all_msix_writes 1  /* OUT */
#define XEN_DMOP_all_msix_writes  (1u << _XEN_DMOP_all_msix_writes)

I wonder whether XEN_DMOP_all_msix_writes should be a feature
requested by the dm, as to not change the existing behaviour of how
MSIX writes are handled (which might work for QEMU, but could cause
issues with other out of tree users of ioreqs)?  That would turn
XEN_DMOP_all_msix_writes into an IN flag also.

Thanks, Roger.
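
As a rough sketch of the opt-in variant suggested above, and not a
proposal for the actual implementation: the flag definitions below are
copied from the patch, while SKETCH_SUPPORTED_FEATURES and
sketch_negotiate_flags() are hypothetical names. The sketch also assumes
one possible answer to the open question in the patch, namely that the
output carries only feature bits and does not echo request-only bits such
as XEN_DMOP_no_gfns:

```c
#include <stdint.h>

#define _XEN_DMOP_no_gfns         0  /* IN */
#define XEN_DMOP_no_gfns          (1u << _XEN_DMOP_no_gfns)
#define _XEN_DMOP_all_msix_writes 1  /* IN/OUT in this sketch */
#define XEN_DMOP_all_msix_writes  (1u << _XEN_DMOP_all_msix_writes)

/* Hypothetical: feature bits this hypervisor build is able to provide. */
#define SKETCH_SUPPORTED_FEATURES XEN_DMOP_all_msix_writes

/*
 * Hypothetical negotiation: a feature is reported back only if the dm
 * asked for it AND Xen supports it, so a dm that never sets the bit
 * keeps the existing behaviour.
 */
static uint16_t sketch_negotiate_flags(uint16_t in_flags)
{
    return in_flags & SKETCH_SUPPORTED_FEATURES;
}
```

A dm would set XEN_DMOP_all_msix_writes on input and then check that the
bit is still set on output before relying on seeing every ctrl write.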
