[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 10/11] xen/arm: Do not map PCI ECAM space to Domain-0's p2m


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Thu, 16 Sep 2021 07:16:32 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=epam.com; dmarc=pass action=none header.from=epam.com; dkim=pass header.d=epam.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=4kxncR3dmPW+A/TwUGeqPoNHqXq87uFi9iCENuBJwxc=; b=fTToUN3PFN6bbv8hWe6NK8PlMeLq+VputtoqNIAijFVekpm7k/JPx/CXqG3MkpMF+EhluX0szAbTzTZCJufKhepCcae0otWCQ42IeeDEFc+QfUedyiSC/mUNuFf8RY0i4WSWvbOUHSpK9FRaW9gyr3mZDzrPSUj6zp2brcwVK7vGmMZAvigXtrfghC8dnhNTh3jZjpD29qmQ9wScfne9e0/Oz2lA22Xq6+MZoWhXEuFzQoQSUCD2A7LivcNo8Iz7rdz3tUybQq3vd7IwF3rL7YugB3hhnX+9Tvqlun+oCCjPB66IYbOBqIaey1E1HluKQxxhpfBCCTVfL3CWYmeXVQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=g1WEHDviZEbOXqq5p7vFFSOr2tfdc6Ef2BP6kC6/BOeXGWrk4L9/xnuHc+H1dpAgmilfz3W3GbzYkNbs+Xxx9ASKO8fdnbTW9V6cCYiWzxGabp2oX1YjvfihGDvskE4IRsi3ozicWLRd9ANip1f110KTrOQvP7YVV3GWSe81obwKfAB9MHHq3vl8LQ/NmVnftjZxO34A0aQs9no7Nx9TRhcP2h0x2stjLknG79DpnCm4Ff/cQ9MMWBLP/B+IpIod+On9HOWbqf2b8sKf77vBfYud3dWTatVYwhi7dcDSS2XLx2dSpNOx+mzFiWwR+TFLThN9ll2I+lvpSVyfIzbyYQ==
  • Authentication-results: kernel.org; dkim=none (message not signed) header.d=none;kernel.org; dmarc=none action=none header.from=epam.com;
  • Cc: Rahul Singh <rahul.singh@xxxxxxx>, Julien Grall <julien.grall.oss@xxxxxxxxx>, Oleksandr Andrushchenko <andr2000@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>
  • Delivery-date: Thu, 16 Sep 2021 07:16:50 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHXoJ50hf+zXQhc2ka7YfmmERKX+aucB06AgAE4wYCAAAtRgIAADDiAgAARZwCAAFsJAIAAFA4AgAO3eICAAc7agIAA860AgABTv4CAAPQVgIAAAsoAgAC3nAA=
  • Thread-topic: [PATCH 10/11] xen/arm: Do not map PCI ECAM space to Domain-0's p2m

Hi, Stefano!

On 15.09.21 23:19, Stefano Stabellini wrote:
> On Wed, 15 Sep 2021, Stefano Stabellini wrote:
>> On Wed, 15 Sep 2021, Oleksandr Andrushchenko wrote:
>>> On 15.09.21 03:36, Stefano Stabellini wrote:
>>>> On Tue, 14 Sep 2021, Oleksandr Andrushchenko wrote:
>>>>> With the patch above I have the following log in Domain-0:
>>>>>
>>>>> generic-armv8-xt-dom0 login: root
>>>>> root@generic-armv8-xt-dom0:~# (XEN) *** Serial input to Xen (type 
>>>>> 'CTRL-a' three times to switch input)
>>>>> (XEN) ==== PCI devices ====
>>>>> (XEN) ==== segment 0000 ====
>>>>> (XEN) 0000:03:00.0 - d0 - node -1
>>>>> (XEN) 0000:02:02.0 - d0 - node -1
>>>>> (XEN) 0000:02:01.0 - d0 - node -1
>>>>> (XEN) 0000:02:00.0 - d0 - node -1
>>>>> (XEN) 0000:01:00.0 - d0 - node -1
>>>>> (XEN) 0000:00:00.0 - d0 - node -1
>>>>> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
>>>>>
>>>>> root@generic-armv8-xt-dom0:~# modprobe e1000e
>>>>> [   46.104729] e1000e: Intel(R) PRO/1000 Network Driver
>>>>> [   46.105479] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>>>>> [   46.107297] e1000e 0000:03:00.0: enabling device (0000 -> 0002)
>>>>> (XEN) map [e0000, e001f] -> 0xe0000 for d0
>>>>> (XEN) map [e0020, e003f] -> 0xe0020 for d0
>>>>> [   46.178513] e1000e 0000:03:00.0: Interrupt Throttling Rate (ints/sec) 
>>>>> set to dynamic conservative mode
>>>>> [   46.189668] pci_msi_setup_msi_irqs
>>>>> [   46.191016] nwl_compose_msi_msg msg at fe440000
>>>>> (XEN) traps.c:2014:d0v0 HSR=0x00000093810047 pc=0xffff8000104b4b00 
>>>>> gva=0xffff800010fa5000 gpa=0x000000e0040000
>>>>> [   46.200455] Unhandled fault at 0xffff800010fa5000
>>>>>
>>>>> [snip]
>>>>>
>>>>> [   46.233079] Call trace:
>>>>> [   46.233559]  __pci_write_msi_msg+0x70/0x180
>>>>> [   46.234149]  pci_msi_domain_write_msg+0x28/0x30
>>>>> [   46.234869]  msi_domain_activate+0x5c/0x88
>>>>>
>>>>>    From the above you can see that BARs are mapped for Domain-0 now
>>>>>
>>>>> only when an assigned PCI device gets enabled in Domain-0.
>>>>>
>>>>> Another thing to note is that we crash on MSI-X access as BARs are mapped
>>>>>
>>>>> to the domain while enabling memory decoding in the COMMAND register,
>>>>>
>>>>> but MSI-X are handled differently, e.g. we have MSI-X holes in the 
>>>>> mappings.
>>>>>
>>>>> This is because before this change the whole PCI aperture was mapped into
>>>>>
>>>>> Domain-0 and it is not. Thus, MSI-X holes are left unmapped now and there
>>>>>
>>>>> was no need to do so, e.g. they were always mapped into Domain-0 and
>>>>>
>>>>> emulated for guests.
>>>>>
>>>>> Please note that one cannot use xl pci-attach in this case to attach the 
>>>>> PCI device
>>>>>
>>>>> in question to Domain-0 as (please see the log) that device is already 
>>>>> attached.
>>>>>
>>>>> Neither it can be detached and re-attached. So, without mapping MSI-X 
>>>>> holes for
>>>>>
>>>>> Domain-0 the device becomes unusable in Domain-0. At the same time the 
>>>>> device
>>>>>
>>>>> can be successfully passed to DomU.
>>>>>
>>>>>
>>>>> Julien, Stefano! Please let me know how can we proceed with this.
>>>> What was the plan for MSI-X in Dom0?
>>> It just worked because we mapped everything
>>>> Given that Dom0 interacts with a virtual-ITS and virtual-GIC rather than
>>>> a physical-ITS and physical-GIC, I imagine that it wasn't correct for
>>>> Dom0 to write to the real MSI-X table directly?
>>>>
>>>> Shouldn't Dom0 get emulated MSI-X tables like any DomU?
>>>>
>>>> Otherwise, if Dom0 is expected to have the real MSI-X tables mapped, then
>>>> I would suggest to map them at the same time as the BARs. But I am
>>>> thinking that Dom0 should get emulated MSI-X tables, not physical MSI-X
>>>> tables.
>>> Yes, it seems more than reasonable to enable emulation for Domain-0
>>>
>>> as well. Other than that, Stefano, do you think we are good to go with
>>>
>>> the changes I did in order to unmap everything for Domain-0?
>>
>> It might be better to resend the series with the patch in it, because it
>> is difficult to review the patch like this.

This is true. Taking into account Xen release plan I am just trying to

minimize the turn around here. Sorry about this

>>   Nonetheless I tried, but I
>> might have missed something.
Thank you for your time!!
>>
>> Previously the whole PCIe bridge aperture was mapped to Dom0, and
>> it was done by map_range_to_domain, is that correct?

Yes, but not only the aperture: please see below.


>>
>> Now this patch, to avoid mapping the entire aperture to Dom0, is
>> skipping any operations for PCIe devices in map_range_to_domain by
>> setting need_mapping = false.
>>
>> The idea is that instead, we'll only map things when needed and not the
>> whole aperture. However, looking at the changes to
>> pci_host_bridge_mappings (formerly known as
>> pci_host_bridge_need_p2m_mapping), it is still going through the full
>> list of address ranges of the PCIe bridge and map each range one by one
>> using map_range_to_domain. Also, pci_host_bridge_mappings is still
>> called unconditionally at boot for Dom0.
>>
>> So I am missing the part where the aperture is actually *not* mapped to
>> Dom0.
With map_range_to_domain we also mapped all the entries

of the "reg" and "ranges" properties. Let's have a look at [1]:

- ranges         : As described in IEEE Std 1275-1994, but must provide
                    at least a definition of non-prefetchable memory. One
                    or both of prefetchable Memory and IO Space may also
                    be provided.

- reg            : The Configuration Space base address and size, as accessed
                    from the parent bus.  The base address corresponds to
                    the first bus in the "bus-range" property.  If no
                    "bus-range" is specified, this will be bus 0 (the default).

The most interesting comes when "reg" also has other then configuration

space addresses, for example [2]:

- reg: Should contain rc_dbi, config registers location and length.
- reg-names: Must include the following entries:
   "rc_dbi": controller configuration registers;
   "config": PCIe configuration space registers.

So, we don't need to map "ranges" and *config* from the "reg", but all

the rest from "reg" still needs to be mapped to Domain-0, so the PCIe

bridge can remain functional in Domain-0.

>>   What's the difference between the loop in
>> pci_host_bridge_mappings:
>>
>>    for ( i = 0; i < dt_number_of_address(dev); i++ )
>>      map_range_to_domain...
>>
>> and the previous code in map_range_to_domain? I think I am missing
>> something but as mentioned it is difficult to review the patch like this
>> out of order.
>>
>> Also, and this is minor, even if currently unused, it might be good to
>> keep a length parameter to pci_host_bridge_need_p2m_mapping.
> It looks like the filtering is done based on:
>
> return cfg->phys_addr != addr

As I explained above it is *now* the only range that we *do not want* to

be mapped to Domain-0. Other "reg" entries still need to be mapped.

>
> in pci_ecam_need_p2m_mapping that is expected to filter out the address
> ranges we don't want to map because it comes from
> xen/arch/arm/pci/pci-host-common.c:gen_pci_init:
>
>      /* Parse our PCI ecam register address*/
>      err = dt_device_get_address(dev, ecam_reg_idx, &addr, &size);
>      if ( err )
>          goto err_exit;
>
> In pci_host_bridge_mappings, instead of parsing device tree again, can't
> we just fetch the address and length we need to map straight from
> bridge->sysdata->phys_addr/size ?

We can't as the address describes the configuration space which we

*do not* want to be mapped, but what we want is other than configuration

entries in the "reg" property.

>
> At the point when pci_host_bridge_mappings is called in your new patch,
> we have already initialized all the PCIe-related data structures. It
> seems a bit useless to have to go via device tree again.

Bottom line: we do need to go over the "reg" property and map the regions

of which PCIe bridge is dependent and only skip "config" part of it.

Thank you,

Oleksandr


[1] 
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt

[2] 
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.