[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 10/11] xen/arm: Do not map PCI ECAM space to Domain-0's p2m

  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Thu, 16 Sep 2021 07:16:32 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=epam.com; dmarc=pass action=none header.from=epam.com; dkim=pass header.d=epam.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=4kxncR3dmPW+A/TwUGeqPoNHqXq87uFi9iCENuBJwxc=; b=fTToUN3PFN6bbv8hWe6NK8PlMeLq+VputtoqNIAijFVekpm7k/JPx/CXqG3MkpMF+EhluX0szAbTzTZCJufKhepCcae0otWCQ42IeeDEFc+QfUedyiSC/mUNuFf8RY0i4WSWvbOUHSpK9FRaW9gyr3mZDzrPSUj6zp2brcwVK7vGmMZAvigXtrfghC8dnhNTh3jZjpD29qmQ9wScfne9e0/Oz2lA22Xq6+MZoWhXEuFzQoQSUCD2A7LivcNo8Iz7rdz3tUybQq3vd7IwF3rL7YugB3hhnX+9Tvqlun+oCCjPB66IYbOBqIaey1E1HluKQxxhpfBCCTVfL3CWYmeXVQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=g1WEHDviZEbOXqq5p7vFFSOr2tfdc6Ef2BP6kC6/BOeXGWrk4L9/xnuHc+H1dpAgmilfz3W3GbzYkNbs+Xxx9ASKO8fdnbTW9V6cCYiWzxGabp2oX1YjvfihGDvskE4IRsi3ozicWLRd9ANip1f110KTrOQvP7YVV3GWSe81obwKfAB9MHHq3vl8LQ/NmVnftjZxO34A0aQs9no7Nx9TRhcP2h0x2stjLknG79DpnCm4Ff/cQ9MMWBLP/B+IpIod+On9HOWbqf2b8sKf77vBfYud3dWTatVYwhi7dcDSS2XLx2dSpNOx+mzFiWwR+TFLThN9ll2I+lvpSVyfIzbyYQ==
  • Authentication-results: kernel.org; dkim=none (message not signed) header.d=none;kernel.org; dmarc=none action=none header.from=epam.com;
  • Cc: Rahul Singh <rahul.singh@xxxxxxx>, Julien Grall <julien.grall.oss@xxxxxxxxx>, Oleksandr Andrushchenko <andr2000@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>
  • Delivery-date: Thu, 16 Sep 2021 07:16:50 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH 10/11] xen/arm: Do not map PCI ECAM space to Domain-0's p2m

Hi, Stefano!

On 15.09.21 23:19, Stefano Stabellini wrote:
> On Wed, 15 Sep 2021, Stefano Stabellini wrote:
>> On Wed, 15 Sep 2021, Oleksandr Andrushchenko wrote:
>>> On 15.09.21 03:36, Stefano Stabellini wrote:
>>>> On Tue, 14 Sep 2021, Oleksandr Andrushchenko wrote:
>>>>> With the patch above I have the following log in Domain-0:
>>>>> generic-armv8-xt-dom0 login: root
>>>>> root@generic-armv8-xt-dom0:~# (XEN) *** Serial input to Xen (type 
>>>>> 'CTRL-a' three times to switch input)
>>>>> (XEN) ==== PCI devices ====
>>>>> (XEN) ==== segment 0000 ====
>>>>> (XEN) 0000:03:00.0 - d0 - node -1
>>>>> (XEN) 0000:02:02.0 - d0 - node -1
>>>>> (XEN) 0000:02:01.0 - d0 - node -1
>>>>> (XEN) 0000:02:00.0 - d0 - node -1
>>>>> (XEN) 0000:01:00.0 - d0 - node -1
>>>>> (XEN) 0000:00:00.0 - d0 - node -1
>>>>> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
>>>>> root@generic-armv8-xt-dom0:~# modprobe e1000e
>>>>> [   46.104729] e1000e: Intel(R) PRO/1000 Network Driver
>>>>> [   46.105479] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>>>>> [   46.107297] e1000e 0000:03:00.0: enabling device (0000 -> 0002)
>>>>> (XEN) map [e0000, e001f] -> 0xe0000 for d0
>>>>> (XEN) map [e0020, e003f] -> 0xe0020 for d0
>>>>> [   46.178513] e1000e 0000:03:00.0: Interrupt Throttling Rate (ints/sec) 
>>>>> set to dynamic conservative mode
>>>>> [   46.189668] pci_msi_setup_msi_irqs
>>>>> [   46.191016] nwl_compose_msi_msg msg at fe440000
>>>>> (XEN) traps.c:2014:d0v0 HSR=0x00000093810047 pc=0xffff8000104b4b00 
>>>>> gva=0xffff800010fa5000 gpa=0x000000e0040000
>>>>> [   46.200455] Unhandled fault at 0xffff800010fa5000
>>>>> [snip]
>>>>> [   46.233079] Call trace:
>>>>> [   46.233559]  __pci_write_msi_msg+0x70/0x180
>>>>> [   46.234149]  pci_msi_domain_write_msg+0x28/0x30
>>>>> [   46.234869]  msi_domain_activate+0x5c/0x88
>>>>>    From the above you can see that BARs are mapped for Domain-0 now
>>>>> only when an assigned PCI device gets enabled in Domain-0.
>>>>> Another thing to note is that we crash on MSI-X access as BARs are mapped
>>>>> to the domain while enabling memory decoding in the COMMAND register,
>>>>> but MSI-X are handled differently, e.g. we have MSI-X holes in the 
>>>>> mappings.
>>>>> This is because before this change the whole PCI aperture was mapped into
>>>>> Domain-0 and it is not. Thus, MSI-X holes are left unmapped now and there
>>>>> was no need to do so, e.g. they were always mapped into Domain-0 and
>>>>> emulated for guests.
>>>>> Please note that one cannot use xl pci-attach in this case to attach the 
>>>>> PCI device
>>>>> in question to Domain-0 as (please see the log) that device is already 
>>>>> attached.
>>>>> Neither it can be detached and re-attached. So, without mapping MSI-X 
>>>>> holes for
>>>>> Domain-0 the device becomes unusable in Domain-0. At the same time the 
>>>>> device
>>>>> can be successfully passed to DomU.
>>>>> Julien, Stefano! Please let me know how can we proceed with this.
>>>> What was the plan for MSI-X in Dom0?
>>> It just worked because we mapped everything
>>>> Given that Dom0 interacts with a virtual-ITS and virtual-GIC rather than
>>>> a physical-ITS and physical-GIC, I imagine that it wasn't correct for
>>>> Dom0 to write to the real MSI-X table directly?
>>>> Shouldn't Dom0 get emulated MSI-X tables like any DomU?
>>>> Otherwise, if Dom0 is expected to have the real MSI-X tables mapped, then
>>>> I would suggest to map them at the same time as the BARs. But I am
>>>> thinking that Dom0 should get emulated MSI-X tables, not physical MSI-X
>>>> tables.
>>> Yes, it seems more than reasonable to enable emulation for Domain-0
>>> as well. Other than that, Stefano, do you think we are good to go with
>>> the changes I did in order to unmap everything for Domain-0?
>> It might be better to resend the series with the patch in it, because it
>> is difficult to review the patch like this.

This is true. Taking into account Xen release plan I am just trying to

minimize the turn around here. Sorry about this

>>   Nonetheless I tried, but I
>> might have missed something.
Thank you for your time!!
>> Previously the whole PCIe bridge aperture was mapped to Dom0, and
>> it was done by map_range_to_domain, is that correct?

Yes, but not only the aperture: please see below.

>> Now this patch, to avoid mapping the entire aperture to Dom0, is
>> skipping any operations for PCIe devices in map_range_to_domain by
>> setting need_mapping = false.
>> The idea is that instead, we'll only map things when needed and not the
>> whole aperture. However, looking at the changes to
>> pci_host_bridge_mappings (formerly known as
>> pci_host_bridge_need_p2m_mapping), it is still going through the full
>> list of address ranges of the PCIe bridge and map each range one by one
>> using map_range_to_domain. Also, pci_host_bridge_mappings is still
>> called unconditionally at boot for Dom0.
>> So I am missing the part where the aperture is actually *not* mapped to
>> Dom0.
With map_range_to_domain we also mapped all the entries

of the "reg" and "ranges" properties. Let's have a look at [1]:

- ranges         : As described in IEEE Std 1275-1994, but must provide
                    at least a definition of non-prefetchable memory. One
                    or both of prefetchable Memory and IO Space may also
                    be provided.

- reg            : The Configuration Space base address and size, as accessed
                    from the parent bus.  The base address corresponds to
                    the first bus in the "bus-range" property.  If no
                    "bus-range" is specified, this will be bus 0 (the default).

The most interesting comes when "reg" also has other then configuration

space addresses, for example [2]:

- reg: Should contain rc_dbi, config registers location and length.
- reg-names: Must include the following entries:
   "rc_dbi": controller configuration registers;
   "config": PCIe configuration space registers.

So, we don't need to map "ranges" and *config* from the "reg", but all

the rest from "reg" still needs to be mapped to Domain-0, so the PCIe

bridge can remain functional in Domain-0.

>>   What's the difference between the loop in
>> pci_host_bridge_mappings:
>>    for ( i = 0; i < dt_number_of_address(dev); i++ )
>>      map_range_to_domain...
>> and the previous code in map_range_to_domain? I think I am missing
>> something but as mentioned it is difficult to review the patch like this
>> out of order.
>> Also, and this is minor, even if currently unused, it might be good to
>> keep a length parameter to pci_host_bridge_need_p2m_mapping.
> It looks like the filtering is done based on:
> return cfg->phys_addr != addr

As I explained above it is *now* the only range that we *do not want* to

be mapped to Domain-0. Other "reg" entries still need to be mapped.

> in pci_ecam_need_p2m_mapping that is expected to filter out the address
> ranges we don't want to map because it comes from
> xen/arch/arm/pci/pci-host-common.c:gen_pci_init:
>      /* Parse our PCI ecam register address*/
>      err = dt_device_get_address(dev, ecam_reg_idx, &addr, &size);
>      if ( err )
>          goto err_exit;
> In pci_host_bridge_mappings, instead of parsing device tree again, can't
> we just fetch the address and length we need to map straight from
> bridge->sysdata->phys_addr/size ?

We can't as the address describes the configuration space which we

*do not* want to be mapped, but what we want is other than configuration

entries in the "reg" property.

> At the point when pci_host_bridge_mappings is called in your new patch,
> we have already initialized all the PCIe-related data structures. It
> seems a bit useless to have to go via device tree again.

Bottom line: we do need to go over the "reg" property and map the regions

of which PCIe bridge is dependent and only skip "config" part of it.

Thank you,






Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.