
Re: [PATCH 2/2] x86/msi: Allow writes to registers on the same page as MSI-X table



On Fri, Nov 18, 2022 at 08:20:14AM +0100, Jan Beulich wrote:
> On 17.11.2022 18:31, Marek Marczykowski-Górecki wrote:
> > On Thu, Nov 17, 2022 at 05:34:36PM +0100, Jan Beulich wrote:
> >> On 14.11.2022 20:21, Marek Marczykowski-Górecki wrote:
> >>> --- a/xen/arch/x86/msi.c
> >>> +++ b/xen/arch/x86/msi.c
> >>> @@ -961,6 +961,21 @@ static int msix_capability_init(struct pci_dev *dev,
> >>>                  domain_crash(d);
> >>>              /* XXX How to deal with existing mappings? */
> >>>          }
> >>> +
> >>> +        /*
> >>> +         * If the MSI-X table doesn't span full page(s), map the last page for
> >>> +         * passthrough accesses.
> >>> +         */
> >>> +        if ( (msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & (PAGE_SIZE - 1) )
> >>> +        {
> >>> +            uint64_t entry_paddr = table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE;
> >>> +            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
> >>> +
> >>> +            if ( idx >= 0 )
> >>> +                msix->last_table_page = fix_to_virt(idx);
> >>> +            else
> >>> +                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: %d\n", idx);
> >>> +        }
> >>
> >> Could we avoid the extra work if there's less than one page's
> >> worth of entries for a device? But then again maybe it's not worth any
> >> extra code, as the same mapping will be re-used anyway due to the
> >> refcounting that's being used.
> > 
> > I was considering that, but decided against exactly because of
> > msix_get_fixmap() reusing existing mappings.
> > 
> >> Makes me think of another aspect though: Don't we also need to
> >> handle stuff living on the same page as the start of the table, if
> >> that doesn't start at a page boundary?
> > 
> > I have considered that, but decided against it, given that every single
> > device I tried has its MSI-X table at a page boundary. But if you prefer,
> > I can add such handling too (it will require adding another variable to
> > the arch_msix structure, to store the fixmap location).
> 
> To limit growth of the struct, please at least consider storing the fixmap
> indexes instead of full pointers.

Ok.

> >>> @@ -1090,6 +1105,12 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
> >>>              WARN();
> >>>          msix->table.first = 0;
> >>>          msix->table.last = 0;
> >>> +        if ( msix->last_table_page )
> >>> +        {
> >>> +            msix_put_fixmap(msix,
> >>> +                            virt_to_fix((unsigned long)msix->last_table_page));
> >>> +            msix->last_table_page = 0;
> >>
> >> To set a pointer please use NULL.
> > 
> > Ok.
> > 
> >> Overall it looks like you're dealing with the issue for HVM only.
> >> You will want to express this in the title, perhaps by using x86/hvm:
> >> as the prefix. But then the question of course is whether this couldn't
> >> be dealt with in/from mmio_ro_emulated_write(), which handles both HVM
> >> and PV. 
> > 
> > The issue is correlating the BAR mapping location with the guest's view.
> > Writable BAR areas are mapped (by qemu) via xc_domain_memory_mapping(), but
> > that fails for read-only pages (and indeed, qemu doesn't attempt it for
> > the pages with the MSI-X table). Lacking that, I need to use
> > msixtbl_entry->gtable, which is an HVM-only thing.
> > 
> > In fact there is another corner case I don't handle here: the guest
> > accessing those registers while MSI-X is disabled. In that case there is
> > no related msixtbl_entry, so I can't correlate the access, but the
> > page(s) are still read-only, so a direct mapping would fail. In practice,
> > such an access will trap into qemu, which will complain "Should not
> > read/write BAR through QEMU". I have seen this happen several times
> > while developing the series (due to bugs in my patches), but I haven't
> > found any case where it would happen with the final patch version.
> > In fact, I have considered handling this whole thing via qemu (as it
> > knows better where BARs live from the guest PoV), but stubdomains still
> > don't have write access to those pages, so the access would need to be
> > trapped (a second time) by Xen anyway.
> > 
> > For the PV case, I think this extra translation wouldn't be necessary, as
> > BARs are mapped at their actual location, right?
> 
> I think so, yes.
> 
> > But then it would be a rather different implementation (a separate
> > feature), rather than a common one for PV and HVM.
> 
> It would be different, yes, and if - as you explain above - there are
> technical reasons why it cannot be shared, then so be it. Mentioning
> this in the description may be worthwhile, or else the same question
> may be asked again (even by me, in case I forget part of the discussion
> by the time I look at a particular future version).

Ok, I'll extend the commit message.

> >> Which in turn raises the question: Do you need to handle reads
> >> in the new code in the first place?
> > 
> > The page not being mapped is also the reason why I need to handle
> > reads.
> 
> Just for my own clarity: You mean "not mapped to qemu" here?

No, to the HVM domain (in p2m). Xen (outside of the MSI-X-specific code for
HVM) doesn't know where those reads should be served from.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
