Xen project Mailing List

Re: [PATCH] vpci/msix: handle accesses adjacent to the MSI-X table

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Tue, 14 Mar 2023 14:54:02 +0100

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=MyLtwW1bznW9ueRtDO73KTBnRCp9bOfHDj+wizqFG0s=; b=mJ1QQGzSsm9H5AfhdMZjnt/JDnHhxjTP72vQp8aaO0CTNHim4Fh5au2M7nbIDhEYPDsdIDw6Z+6vkuP2GtWemBWNvmUWrhhEf0VjgfmTX6Y2s9KhkxgFE4KpbZWW+f3+UjDgxEG4zEyMWjD8hIJK5doXZ27CfcPt407Habyws3myECHGxPSq66kLdOmnY8iYspz/jqRhDyc4vAW68Xq+9OAGYZ7PP8AQiUNI4XzAyUU3TCZm/aRrYKD3Cy9+H7J2Oz6ngwvLlUgoTWYKu7vjLkzMotDrcp+ABkmXt1wmObIj1jNEv15lSN/2ZzIpQFv3kJ5nXFua+UBxtdBhIudY/Q==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=B05cpJgBLR57dZNNQDYPsAcNKtByaVxDZTQPUofKCjgvgLGrTej8ImeeosPCbTuHo2z5k4pCUMp4Ox1TbIAM8jL3lv7aOLlJXPm5d6d0WxeiZ8+8ESe8xvC6GwILmDD/BV7i33w1uAvh2dqhm5xN7CEyjk55mUdNzF2vQlQpYMWjd2q0pr/aDkp6Lc2QMtLw5yewSM6oMVrNS6tmIfWtXB+9Ui4hcmLuO6wIGSgQHToVA28lKcGayno9/c/JGydsA+L6HVW2xkURE+mU10eWaw0KlmCBCtzRVDVdk3ST0Q4AkQo9YAJHl4kXENYCifW4xXNIsi2bloUZVddhhSDcwg==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Tue, 14 Mar 2023 13:54:40 +0000

Ironport-data: A9a23:wq4o8att8ym+JPF97D+7hu85q+fnVERfMUV32f8akzHdYApBsoF/q tZmKWvSPfqDNjTyc48lOtu+oB9Q75TSz9cwTwFqrH9gHy0X+JbJXdiXEBz9bniYRiHhoOCLz O1FM4Wdc5pkJpP4jk3wWlQ0hSAkjclkfpKlVKiffHg3HVQ+IMsYoUoLs/YjhYJ1isSODQqIu Nfjy+XSI1bg0DNvWo4uw/vrRChH4bKj6Fv0gnRkPaoQ5ASHyyFPZH4iDfrZw0XQE9E88tGSH 44v/JnhlkvF8hEkDM+Sk7qTWiXmlZaLYGBiIlIPM0STqkAqSh4ai87XB9JFAatjsB2bnsgZ9 Tl4ncfYpTHFnEH7sL91vxFwS0mSNEDdkVPNCSDXXce7lyUqf5ZwqhnH4Y5f0YAwo45K7W9yG fMwLDcQMRSNqtOM4buYU7NNmOIZI83kFdZK0p1g5Wmx4fcOZ7nmGvyPz/kImTA6i4ZJAOrUY NcfZXx3dhPcbhZTO1ARTpUjgOOvgXq5eDpdwL6XjfNvvy6Pk0osgP60boq9lt+iHK25mm6Co W3L5SLhCwwyP92D0zuVtHmrg4cjmAuiANpLSefhraUCbFu76XArCw8uc1GA/PS+lX7hcNhVd 0kSw397xUQ13AnxJjXnZDW6qnOZuh8XW/JLDvY3rgqKz8L88wufQ2QJUDNFQNgnr9MtAywn0 EeTmNHkDiApt6eaIU9x7Z+RpDK2fCITfWkLYHZdSRNfuoez5oYukhjIU9BvVravicH4Ei3xx DbMqzUig7IUjogA0KDTEU37vg9Ab6PhFmYdjjg7lEr/hu+lTOZJv7CV1GU=

Ironport-hdrordr: A9a23:Ipl2c6jzThv7/4ZXWuZDodji43BQXtQji2hC6mlwRA09TyX4ra yTdZEgviMc5wx/ZJhNo7690cu7IU80hKQV3WB5B97LNmTbUQCTXeJfBOXZsljdMhy72ulB1b pxN4hSYeeAaWSSVPyKgjWFLw==

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Mar 14, 2023 at 12:56:33PM +0100, Jan Beulich wrote: > On 14.03.2023 11:13, Roger Pau Monne wrote: > > --- a/xen/drivers/vpci/msix.c > > +++ b/xen/drivers/vpci/msix.c > > @@ -27,6 +27,13 @@ > > ((addr) >= vmsix_table_addr(vpci, nr) && \ > > (addr) < vmsix_table_addr(vpci, nr) + vmsix_table_size(vpci, nr)) > > > > +#define VMSIX_ADDR_ADJACENT(addr, vpci, nr) \ > > + ((PFN_DOWN(addr) == PFN_DOWN(vmsix_table_addr(vpci, nr)) && \ > > + (addr) < vmsix_table_addr(vpci, nr)) || \ > > + (PFN_DOWN(addr) == PFN_DOWN(vmsix_table_addr(vpci, nr) + \ > > + vmsix_table_size(vpci, nr) - 1) && \ > > + (addr) >= vmsix_table_addr(vpci, nr) + vmsix_table_size(vpci, nr))) > > While I realize this may impact performance a little, did you consider > using !VMSIX_ADDR_IN_RANGE() instead of open-coding it kind of? Then > again there's only a single use of the macro, and that's in code where > VMSIX_ADDR_IN_RANGE() was already checked (by the earlier invocation > of msix_find()), so the re-checking of the MSI-X table bounds isn't > strictly necessary anyway. I didn't want to rely on the order of execution of the MMIO handlers, so I would rather make sure that the newly added one would work correctly if it turns to be checked for before the MSIX table handling one. I could indeed use !VMSIX_ADDR_IN_RANGE() if that is clearer. > > @@ -438,6 +369,145 @@ static const struct hvm_mmio_ops vpci_msix_table_ops > > = { > > .write = msix_write, > > }; > > > > +const static struct vpci_msix *adjacent_find(const struct domain *d, > > + unsigned long addr) > > +{ > > + const struct vpci_msix *msix; > > + > > + list_for_each_entry ( msix, &d->arch.hvm.msix_tables, next ) > > + /* > > + * So far vPCI only traps accesses to the MSIX table, but not the > > PBA > > + * explicitly, and hence we only need to check for the hole > > created by > > + * the MSIX table. > > + * > > + * If the PBA table is also trapped, the check here should be > > expanded > > + * to take it into account. > > + */ > > + if ( VMSIX_ADDR_ADJACENT(addr, msix->pdev->vpci, VPCI_MSIX_TABLE) ) > > + return msix; > > Is the comment really correct when considering that you don't change > vpci_make_msix_hole()? Urg, I was really confused then, as I didn't remember (and didn't check) that we also punch a hole for the PBA. That's not really needed for dom0, as we allow both reads and writes on that case anyway. > (I was actually puzzled by struct vpci_msix'es > table[] field remaining a 2-element array, despite the PBA now being > dealt with differently. But I realize you need to keep that for the > VMSIX_ADDR_IN_RANGE() in adjacent_write().) If we need to handle the PBA I would need to take it into account for the array handling, since the PBA can be in a different set of page(s) than the MSIX table. > > +static int cf_check adjacent_read( > > + struct vcpu *v, unsigned long addr, unsigned int len, unsigned long > > *data) > > +{ > > + const struct domain *d = v->domain; > > + const struct vpci_msix *msix = adjacent_find(d, addr); > > + const void __iomem *mem; > > + paddr_t msix_tbl; > > + struct vpci *vpci; > > + > > + *data = ~0ul; > > + > > + if ( !msix ) > > + return X86EMUL_RETRY; > > + > > + vpci = msix->pdev->vpci; > > + msix_tbl = vmsix_table_addr(vpci, VPCI_MSIX_TABLE); > > + > > + if ( addr + len > round_pgup(msix_tbl + > > + vmsix_table_size(vpci, VPCI_MSIX_TABLE)) ) > > + return X86EMUL_OKAY; > > + > > + mem = get_table(vpci, > > + PFN_DOWN(addr) == PFN_DOWN(msix_tbl) ? > > VPCI_MSIX_TBL_HEAD > > + : > > VPCI_MSIX_TBL_TAIL); > > + if ( !mem ) > > + return X86EMUL_OKAY; > > The respective PBA logic had a gprintk() on this path. Ack, will add one. > > + switch ( len ) > > + { > > + case 1: > > + *data = readb(mem + PAGE_OFFSET(addr)); > > + break; > > + > > + case 2: > > + *data = readw(mem + PAGE_OFFSET(addr)); > > + break; > > + > > + case 4: > > + *data = readl(mem + PAGE_OFFSET(addr)); > > + break; > > + > > + case 8: > > + *data = readq(mem + PAGE_OFFSET(addr)); > > + break; > > So far we have allowed only aligned 4- and 8-byte accesses to the PBA. > Shouldn't we continue to do so? I've read the spec about this and wasn't able to find a reference about having to access the PBA using 4 and 8 byte accesses. There's one for the MSI-X table, but it's my understanding it only applies to the table. > I'm also concerned of misaligned accesses: While we can't keep the > guest from doing such on pages we don't intercept, depending on the kind > of anomalies such may cause the effects there may be contained to that > guest. When doing the accesses from the hypervisor, bad effects could > affect the entire system. (FTAOD I don't mean to constrain guests, but I > do think we need to consider splitting misaligned accesses.) I was also wondering about misaligned accesses. Should be allow dom0 any kind of access, while limiting domUs to aligned only? > > > + default: > > + ASSERT_UNREACHABLE(); > > Is this correct? In msix_{read,write}() these assertions are valid > because of the earlier access_allowed() checks, but here you have > nothing like that. Yes, the emulator currently would only pass sizes > that fit what is being handled, but relying on no "unusual" insns > appearing down the road feels risky. Then again > hvmemul_phys_mmio_access() splits accesses accordingly, so perhaps > all is fine here. It's only on debug builds, so on production ones the access would just be silently dropped without any ill effect on the hypervisor state. > > +static int cf_check adjacent_write( > > + struct vcpu *v, unsigned long addr, unsigned int len, unsigned long > > data) > > +{ > > + const struct domain *d = v->domain; > > + const struct vpci_msix *msix = adjacent_find(d, addr); > > + void __iomem *mem; > > + paddr_t msix_tbl; > > + struct vpci *vpci; > > + > > + if ( !msix ) > > + return X86EMUL_RETRY; > > + > > + vpci = msix->pdev->vpci; > > + msix_tbl = vmsix_table_addr(vpci, VPCI_MSIX_TABLE); > > + > > + if ( addr + len > round_pgup(msix_tbl + > > + vmsix_table_size(vpci, VPCI_MSIX_TABLE)) ) > > + return X86EMUL_OKAY; > > + > > + if ( (VMSIX_ADDR_IN_RANGE(addr, vpci, VPCI_MSIX_PBA) || > > + VMSIX_ADDR_IN_RANGE(addr + len - 1, vpci, VPCI_MSIX_PBA)) && > > + !is_hardware_domain(d) ) > > + /* Ignore writes to PBA for DomUs, it's undefined behavior. */ > > + return X86EMUL_OKAY; > > Just as a remark: Checking only start and end is sufficient merely because > the PBA is a multiple of 8 bytes in size, and "len" currently cannot be > larger than 8. This feels somewhat fragile, but is - like above - presumably > okay. Indeed, I was relying on len <= min PBA size. I can add a comment to clarify this. > > @@ -530,7 +600,10 @@ static int cf_check init_msix(struct pci_dev *pdev) > > } > > > > if ( list_empty(&d->arch.hvm.msix_tables) ) > > + { > > register_mmio_handler(d, &vpci_msix_table_ops); > > + register_mmio_handler(d, &vpci_msix_adj_ops); > > + } > > Did you consider re-using the same ops by widening what their accept() > covers, and by having read/write recognize inside vs outside accesses, > dealing with them differently (much like the PBA was dealt with before)? I had the feeling it would be clearer to have the MSIX handler only deal with the MSIX table accesses (and so have it's scope properly limited in the accept hook), and deal with the fallout from having to poke a 4K hole in the physmap using a different handler. I will play a bit with both options and see which one I prefer. > Besides my gut feeling of this ending up being less code, there's also > the aspect of NR_IO_HANDLERS being the upper bound to how many handlers > may be registered. > > Or else did you consider registering this further handler only when > there actually is a device where the MSI-X table has "slack" at the > front and/or end? Hm, no, I didn't consider this conditional registering. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.