[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/2] x86/msi: Allow writes to registers on the same page as MSI-X table


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 18 Nov 2022 08:20:14 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=1mb2x508X+9y63QKlaQN5o+Rq3EAdZFCsk4rNaOj5IE=; b=dmz0tZ2hzgIqi2a/GlY5zAwgJuAlFuBPv1D/Ov2+R2hAXUSrTtepSMCwsDInq0s8U3yL9xRp9M4PBdNIRRjmcctsa2u9+CVfTqHTPbCW+qcI4JlFNHxf6jI0kcF7hK8gfNIofsBeFaCeTc37J1U2lkBLHBUf+c5B2jsrT6pmG24TH8xlGhuVWeISEqi5NYj0PC1+WFJaqu562zgaZFG7mzcLhVmahXmNt+dIEGP289TSZsuRu6D+/SgshIyShzNzy//74Jn22a/z7hdmbPZ7We36FOCMTXN6oWQYmKbk/loTRCp7H7FNhYz2u07C97HRTARfHceUYZzqNlhkRKAhGA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=S0gwI8u69DMsmJEz6MQetvhd2WaOUfYcnSPZAsKnGtvbjBrytgiS4zaRO9cCLbyjpDrc7p7KlQQggzGQrckOs/w7g3LnauJTQkNzQh/yUDGJ6PuwRsE9hZWyQS/TEQzvrwMF1GdyQdmg6hJeahrScU/Zg+nyAwXw5w9rioa1LwgVfLSG1X2Bmtq18oBW+Km5IJ6+eAiBqRwIJAWBNYPEnGwDHbheXm9wP15zKJIzDE92TPlYix/HzYxpLm9FzrKFd3KRI99q/RTIgfumQ/G5nIBFKA2juv3fa0aD+yA7dRNk6EC97ildWgI8lROmol+syBbkfGa2Zo4SXM92vMTl9Q==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 18 Nov 2022 07:20:38 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 17.11.2022 18:31, Marek Marczykowski-Górecki wrote:
> On Thu, Nov 17, 2022 at 05:34:36PM +0100, Jan Beulich wrote:
>> On 14.11.2022 20:21, Marek Marczykowski-Górecki wrote:
>>> --- a/xen/arch/x86/msi.c
>>> +++ b/xen/arch/x86/msi.c
>>> @@ -961,6 +961,21 @@ static int msix_capability_init(struct pci_dev *dev,
>>>                  domain_crash(d);
>>>              /* XXX How to deal with existing mappings? */
>>>          }
>>> +
>>> +        /*
>>> +         * If the MSI-X table doesn't span full page(s), map the last page 
>>> for
>>> +         * passthrough accesses.
>>> +         */
>>> +        if ( (msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & (PAGE_SIZE - 1) )
>>> +        {
>>> +            uint64_t entry_paddr = table_paddr + msix->nr_entries * 
>>> PCI_MSIX_ENTRY_SIZE;
>>> +            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
>>> +
>>> +            if ( idx >= 0 )
>>> +                msix->last_table_page = fix_to_virt(idx);
>>> +            else
>>> +                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: 
>>> %d\n", idx);
>>> +        }
>>
>> Could we avoid the extra work if there's only less than one page's
>> worth of entries for a device? But then again maybe not worth any
>> extra code, as the same mapping will be re-used anyway due to the
>> refcounting that's being used.
> 
> I was considering that, but decided against exactly because of
> msix_get_fixmap() reusing existing mappings.
> 
>> Makes me think of another aspect though: Don't we also need to
>> handle stuff living on the same page as the start of the table, if
>> that doesn't start at a page boundary?
> 
> I have considered that, but decided against given every single device I
> tried have MSI-X table at the page boundary. But if you prefer, I can
> add such handling too (will require adding another variable to the
> arch_msix structure - to store the fixmap location).

To limit growth of the struct, please at least consider storing the fixmap
indexes instead of full pointers.

>>> @@ -1090,6 +1105,12 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
>>>              WARN();
>>>          msix->table.first = 0;
>>>          msix->table.last = 0;
>>> +        if ( msix->last_table_page )
>>> +        {
>>> +            msix_put_fixmap(msix,
>>> +                            virt_to_fix((unsigned 
>>> long)msix->last_table_page));
>>> +            msix->last_table_page = 0;
>>
>> To set a pointer please use NULL.
> 
> Ok.
> 
>> Overall it looks like you're dealing with the issue for HVM only.
>> You will want to express this in the title, perhaps by using x86/hvm:
>> as the prefix. But then the question of course is whether this couldn't
>> be dealt with in/from mmio_ro_emulated_write(), which handles both HVM
>> and PV. 
> 
> The issue is correlating BAR mapping location with guest's view.
> Writable BAR areas are mapped (by qemu) via xc_domain_memory_mapping(), but
> that fails for read-only pages (and indeed, qemu doesn't attempt to do
> that for the pages with the MSI-X table). Lacking that, I need to use
> msixtbl_entry->gtable, which is HVM-only thing.
> 
> In fact there is another corner case I don't handle here: guest
> accessing those registers when MSI-X is disabled. In that case, there is
> no related msixtbl_entry, so I can't correlate the access, but the
> page(s) is still read-only, so direct mapping would fail. In practice,
> such access will trap into qemu, which will complain "Should not
> read/write BAR through QEMU". I have seen this happening several times
> when developing the series (due to bugs in my patches), but I haven't
> found any case where it would happen with the final patch version.
> In fact, I have considered handling this whole thing via qemu (as it
> knows better where BAR live from the guest PoV), but stubdomain still
> don't have write access to that pages, so that would need to be trapped
> (for the second time) by Xen anyway.
> 
> For the PV case, I think this extra translation wouldn't be necessary as
> BAR are mapped at their actual location, right?

I think so, yes.

> But then, it makes it
> rather different implementation (separate feature), than just having a
> common one for PV and HVM.

It would be different, yes, and if - as you explain above - there are
technical reasons why it cannot be shared, then so be it. Mentioning
this in the description may be worthwhile, or else the same question
may be asked again (even by me, in case I forget part of the discussion
by the time I look at a particular future version).

>> Which in turn raises the question: Do you need to handle reads
>> in the new code in the first place?
> 
> The page not being mapped is also the reason why I do need to handle
> reads too.

Just for my own clarity: You mean "not mapped to qemu" here?

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.