
Re: [PATCH 2/2] x86/msi: Allow writes to registers on the same page as MSI-X table



On Fri, Nov 18, 2022 at 08:20:14AM +0100, Jan Beulich wrote:
> On 17.11.2022 18:31, Marek Marczykowski-Górecki wrote:
> > On Thu, Nov 17, 2022 at 05:34:36PM +0100, Jan Beulich wrote:
> >> On 14.11.2022 20:21, Marek Marczykowski-Górecki wrote:
> >>> --- a/xen/arch/x86/msi.c
> >>> +++ b/xen/arch/x86/msi.c
> >>> @@ -961,6 +961,21 @@ static int msix_capability_init(struct pci_dev *dev,
> >>>                  domain_crash(d);
> >>>              /* XXX How to deal with existing mappings? */
> >>>          }
> >>> +
> >>> +        /*
> >>> +         * If the MSI-X table doesn't span full page(s), map the last page for
> >>> +         * passthrough accesses.
> >>> +         */
> >>> +        if ( (msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & (PAGE_SIZE - 1) )
> >>> +        {
> >>> +            uint64_t entry_paddr = table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE;
> >>> +            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
> >>> +
> >>> +            if ( idx >= 0 )
> >>> +                msix->last_table_page = fix_to_virt(idx);
> >>> +            else
> >>> +                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: %d\n", idx);
> >>> +        }
> >>
> >> Could we avoid the extra work if there's less than one page's
> >> worth of entries for a device? But then again maybe it's not worth any
> >> extra code, as the same mapping will be re-used anyway due to the
> >> refcounting that's being used.
> > 
> > I was considering that, but decided against exactly because of
> > msix_get_fixmap() reusing existing mappings.
> > 
> >> Makes me think of another aspect though: Don't we also need to
> >> handle stuff living on the same page as the start of the table, if
> >> that doesn't start at a page boundary?
> > 
> > I have considered that, but decided against it, given that every single
> > device I tried has its MSI-X table at a page boundary. But if you prefer,
> > I can add such handling too (it will require adding another variable to
> > the arch_msix structure, to store the fixmap location).
> 
> To limit growth of the struct, please at least consider storing the fixmap
> indexes instead of full pointers.

Ok.

> >>> @@ -1090,6 +1105,12 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
> >>>              WARN();
> >>>          msix->table.first = 0;
> >>>          msix->table.last = 0;
> >>> +        if ( msix->last_table_page )
> >>> +        {
> >>> +            msix_put_fixmap(msix,
> >>> +                            virt_to_fix((unsigned long)msix->last_table_page));
> >>> +            msix->last_table_page = 0;
> >>
> >> To set a pointer please use NULL.
> > 
> > Ok.
> > 
> >> Overall it looks like you're dealing with the issue for HVM only.
> >> You will want to express this in the title, perhaps by using x86/hvm:
> >> as the prefix. But then the question of course is whether this couldn't
> >> be dealt with in/from mmio_ro_emulated_write(), which handles both HVM
> >> and PV. 
> > 
> > The issue is correlating the BAR mapping location with the guest's view.
> > Writable BAR areas are mapped (by qemu) via xc_domain_memory_mapping(), but
> > that fails for read-only pages (and indeed, qemu doesn't attempt it for
> > the pages with the MSI-X table). Lacking that, I need to use
> > msixtbl_entry->gtable, which is an HVM-only thing.
> > 
> > In fact there is another corner case I don't handle here: the guest
> > accessing those registers while MSI-X is disabled. In that case there is
> > no related msixtbl_entry, so I can't correlate the access, but the
> > page(s) are still read-only, so a direct mapping would fail. In practice,
> > such an access will trap into qemu, which will complain "Should not
> > read/write BAR through QEMU". I have seen this happen several times
> > while developing the series (due to bugs in my patches), but I haven't
> > found any case where it would happen with the final patch version.
> > In fact, I have considered handling this whole thing via qemu (as it
> > knows better where BARs live from the guest PoV), but stubdomains still
> > don't have write access to those pages, so the access would need to be
> > trapped (a second time) by Xen anyway.
> > 
> > For the PV case, I think this extra translation wouldn't be necessary, as
> > BARs are mapped at their actual location, right?
> 
> I think so, yes.
> 
> > But then it would be a rather different implementation (a separate
> > feature), rather than a common one for PV and HVM.
> 
> It would be different, yes, and if - as you explain above - there are
> technical reasons why it cannot be shared, then so be it. Mentioning
> this in the description may be worthwhile, or else the same question
> may be asked again (even by me, in case I forget part of the discussion
> by the time I look at a particular future version).

Ok, I'll extend the commit message.

> >> Which in turn raises the question: Do you need to handle reads
> >> in the new code in the first place?
> > 
> > The page not being mapped is also the reason why I need to handle
> > reads.
> 
> Just for my own clarity: You mean "not mapped to qemu" here?

No, to the HVM domain (in p2m). Xen (outside of the MSI-X-specific code for
HVM) doesn't know where those reads should be served from.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
