[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/arm: domain_build: Ignore device nodes with invalid addresses



On Wed, 3 Feb 2021, Julien Grall wrote:
> On 03/02/2021 00:18, Stefano Stabellini wrote:
> > On Tue, 2 Feb 2021, Julien Grall wrote:
> > > On 02/02/2021 18:12, Julien Grall wrote:
> > > > On 02/02/2021 17:47, Elliott Mitchell wrote:
> > > > > The handle_device() function has been returning failure upon
> > > > > encountering a device address which was invalid.  A device tree which
> > > > > had such an entry has now been seen in the wild.  As it causes no
> > > > > failures to simply ignore the entries, ignore them. >
> > > > > Signed-off-by: Elliott Mitchell <ehem+xenn@xxxxxxx>
> > > > > 
> > > > > ---
> > > > > I'm starting to suspect there are an awful lot of places in the
> > > > > various
> > > > > domain_build.c files which should simply ignore errors.  This is now
> > > > > the
> > > > > second place I've encountered in 2 months where ignoring errors was
> > > > > the
> > > > > correct action.
> > > > 
> > > > Right, as a counterpoint, we run Xen on Arm HW for several years now and
> > > > this is the first time I heard about issue parsing the DT. So while I
> > > > appreciate that you are eager to run Xen on the RPI...
> > > > 
> > > > >   I know failing in case of error is an engineer's
> > > > > favorite approach, but there seem an awful lot of harmless failures
> > > > > causing panics.
> > > > > 
> > > > > This started as the thread "[RFC PATCH] xen/arm: domain_build: Ignore
> > > > > empty memory bank".  Now it seems clear the correct approach is to
> > > > > simply
> > > > > ignore these entries.
> > > > 
> > > > ... we first need to fully understand the issues. Here a few questions:
> > > >      1) Can you provide more information why you believe the address is
> > > > invalid?
> > > >      2) How does Linux use the node?
> > > >      3) Is it happening with all the RPI DT? If not, what are the
> > > > differences?
> > > 
> > > So I had another look at the device-tree you provided earlier on. The node
> > > is
> > > the following (copied directly from the DTS):
> > > 
> > > &pcie0 {
> > >          pci@1,0 {
> > >                  #address-cells = <3>;
> > >                  #size-cells = <2>;
> > >                  ranges;
> > > 
> > >                  reg = <0 0 0 0 0>;
> > > 
> > >                  usb@1,0 {
> > >                          reg = <0x10000 0 0 0 0>;
> > >                          resets = <&reset
> > > RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > >                  };
> > >          };
> > > };
> > > 
> > > pcie0: pcie@7d500000 {
> > >     compatible = "brcm,bcm2711-pcie";
> > >     reg = <0x0 0x7d500000  0x0 0x9310>;
> > >     device_type = "pci";
> > >     #address-cells = <3>;
> > >     #interrupt-cells = <1>;
> > >     #size-cells = <2>;
> > >     interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
> > >                  <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
> > >     interrupt-names = "pcie", "msi";
> > >     interrupt-map-mask = <0x0 0x0 0x0 0x7>;
> > >     interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
> > >                                                       
> > > IRQ_TYPE_LEVEL_HIGH>;
> > >     msi-controller;
> > >     msi-parent = <&pcie0>;
> > > 
> > >     ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
> > >               0x0 0x40000000>;
> > >     /*
> > >      * The wrapper around the PCIe block has a bug
> > >      * preventing it from accessing beyond the first 3GB of
> > >      * memory.
> > >      */
> > >     dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
> > >                   0x0 0xc0000000>;
> > >     brcm,enable-ssc;
> > > };
> > > 
> > > The interpretation of "reg" depends on the context. In this case, we are
> > > trying to interpret as a memory address from the CPU PoV when it has a
> > > different meaning (I am not exactly sure what).
> > > 
> > > In fact, you are lucky that Xen doesn't manage to interpret it. Xen should
> > > really stop trying to look region to map when it discover a PCI bus. I
> > > wrote a
> > > quick hack patch that should ignore it:
> > 
> > Yes, I think you are right. There are a few instances where "reg" is not
> > a address ready to be remapped. It is not just PCI, although that's the
> > most common.  Maybe we need a list, like skip_matches in handle_node.
> 
> From my understanding, "reg" can be considered as an MMIO region only if all
> the *parents up to the root have the property "ranges" and they are not on a
> different bus (e.g. pci).
> 
> Do you have example where this is not the case?

The famous one is CPUs. These are some other examples I could find with
a quick search:

                        nvmem_firmware {
                                compatible = "xlnx,zynqmp-nvmem-fw";
                                #address-cells = <0x1>;
                                #size-cells = <0x1>;

                                soc_revision@0 {
                                        reg = <0x0 0x4>;
                                };
                        };

                cci@fd6e0000 {
                        compatible = "arm,cci-400";
                        reg = <0x0 0xfd6e0000 0x0 0x9000>;
                        ranges = <0x0 0x0 0xfd6e0000 0x10000>;
                        #address-cells = <0x1>;
                        #size-cells = <0x1>;

                        pmu@9000 {
                                compatible = "arm,cci-400-pmu,r1";
                                reg = <0x9000 0x5000>;
                                interrupt-parent = <0x4>;
                                interrupts = <0x0 0x7b 0x4 0x0 0x7b 0x4 0x0 
0x7b 0x4 0x0 0x7b 0x4 0x0 0x7b 0x4>;
                        };
                };


                i2c@ff020000 {
                        compatible = "cdns,i2c-r1p14";
                        status = "okay";
                        interrupt-parent = <0x4>;
                        interrupts = <0x0 0x11 0x4>;
                        reg = <0x0 0xff020000 0x0 0x1000>;
                        #address-cells = <0x1>;
                        #size-cells = <0x0>;
                        clocks = <0x3 0x3d>;
                        clock-frequency = <0x61a80>;

                        gpio@20 {
                                compatible = "ti,tca6416";
                                reg = <0x20>;


                ethernet@ff0e0000 {
                        compatible = "cdns,zynqmp-gem", "cdns,gem";
                        status = "okay";
                        interrupt-parent = <0x4>;
                        interrupts = <0x0 0x3f 0x4 0x0 0x3f 0x4>;
                        reg = <0x0 0xff0e0000 0x0 0x1000>;
                        clock-names = "pclk", "hclk", "tx_clk";
                        #address-cells = <0x1>;
                        #size-cells = <0x0>;
                        iommus = <0xd 0x877>;
                        clocks = <0xc 0xc 0xc>;
                        phy-handle = <0xe>;
                        phy-mode = "rgmii-id";
                        #stream-id-cells = <0x1>;
                        phandle = <0x16>;

                        ethernet-phy@c {
                                reg = <0xc>;
                                ti,rx-internal-delay = <0x8>;
                                ti,tx-internal-delay = <0xa>;
                                ti,fifo-depth = <0x1>;
                                ti,dp83867-rxctrl-strap-quirk;
                                phandle = <0xe>;
                        };
                };


The rule that *parents up to the root have the property "ranges"* seems
to apply correctly to all these cases. Good.


> Whether Xen does it correctly is another question :).
> 
> > 
> > 
> > > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > > index 374bf655ee34..937fd1e387b7 100644
> > > --- a/xen/arch/arm/domain_build.c
> > > +++ b/xen/arch/arm/domain_build.c
> > > @@ -1426,7 +1426,7 @@ static int __init handle_device(struct domain *d,
> > > struct
> > > dt_device_node *dev,
> > > 
> > >   static int __init handle_node(struct domain *d, struct kernel_info
> > > *kinfo,
> > >                                 struct dt_device_node *node,
> > > -                              p2m_type_t p2mt)
> > > +                              p2m_type_t p2mt, bool pci_bus)
> > >   {
> > >       static const struct dt_device_match skip_matches[] __initconst =
> > >       {
> > > @@ -1532,9 +1532,14 @@ static int __init handle_node(struct domain *d,
> > > struct
> > > kernel_info *kinfo,
> > >                  "WARNING: Path %s is reserved, skip the node as we may
> > > re-use
> > > the path.\n",
> > >                  path);
> > > 
> > > -    res = handle_device(d, node, p2mt);
> > > -    if ( res)
> > > -        return res;
> > > +    if ( !pci_bus )
> > > +    {
> > > +        res = handle_device(d, node, p2mt);
> > > +        if ( res)
> > > +           return res;
> > > +
> > > +        pci_bus = dt_device_type_is_equal(node, "pci");
> > > +    }
> > > 
> > >       /*
> > >        * The property "name" is used to have a different name on older FDT
> > > @@ -1554,7 +1559,7 @@ static int __init handle_node(struct domain *d,
> > > struct
> > > kernel_info *kinfo,
> > > 
> > >       for ( child = node->child; child != NULL; child = child->sibling )
> > >       {
> > > -        res = handle_node(d, kinfo, child, p2mt);
> > > +        res = handle_node(d, kinfo, child, p2mt, pci_bus);
> > >           if ( res )
> > >               return res;
> > >       }
> > > @@ -2192,7 +2197,7 @@ static int __init prepare_dtb_hwdom(struct domain
> > > *d,
> > > struct kernel_info *kinfo)
> > > 
> > >       fdt_finish_reservemap(kinfo->fdt);
> > > 
> > > -    ret = handle_node(d, kinfo, dt_host, default_p2mt);
> > > +    ret = handle_node(d, kinfo, dt_host, default_p2mt, false);
> > >       if ( ret )
> > >           goto err;
> > > 
> > > A less hackish possibility would be to modify dt_number_of_address() and
> > > return 0 when the device is a child of a PCI below.
> > > 
> > > Stefano, do you have any opinions?
> > 
> > Would PCIe even work today? Because if it doesn't, we could just add it
> > to skip_matches until we get PCI passthrough properly supported.
> PCIe (or PCI) definitely works in dom0 today but Xen is not aware of the
> hostbridge. So you would break quite a few uses cases by skipping the nodes.

Never mind my suggestion


> > But aside from PCIe, let's say that we know of a few nodes for which
> > "reg" needs a special treatment. I am not sure it makes sense to proceed
> > with parsing those nodes without knowing how to deal with that.
> 
> I believe that most of the time the "special" treatment would be to ignore the
> property "regs" as it will not be an CPU memory address.
> 
> > So maybe
> > we should add those nodes to skip_matches until we know what to do with
> > them. At that point, I would imagine we would introduce a special
> > handle_device function that knows what to do. In the case of PCIe,
> > something like "handle_device_pcie".
> Could you outline how "handle_device_pcie()" will differ with handle_node()?
> 
> In fact, the problem is not the PCIe node directly. Instead, it is the second
> level of nodes below it (i.e usb@...).
> 
> The current implementation of dt_number_of_address() only look at the bus type
> of the parent. As the parent has no bus type and "ranges" then it thinks this
> is something we can translate to a CPU address.
> 
> However, this is below a PCI bus so the meaning of "reg" is completely
> different. In this case, we only need to ignore "reg".

I see what you are saying and I agree: if we had to introduce a special
case for PCI, then  dt_number_of_address() seems to be a good place.  In
fact, we already have special PCI handling, see our
__dt_translate_address function and xen/common/device_tree.c:dt_busses.

Which brings the question: why is this actually failing?

pcie0 {
     ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000 0x0 0x40000000>;

Which means that PCI addresses 0xc0000000-0x100000000 become 
0x600000000-0x700000000.

The offending DT is:

&pcie0 {
         pci@1,0 {
                 #address-cells = <3>;
                 #size-cells = <2>;
                 ranges;

                 reg = <0 0 0 0 0>;

                 usb@1,0 {
                         reg = <0x10000 0 0 0 0>;
                         resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
                 };
         };
};


reg = <0x10000 0 0 0 0> means that usb@1,0 is PCI device 01:00.0.
However, the rest of the regs cells are left as zero. It shouldn't be an
issue because usb@1,0 is a child of pci@1,0 but pci@1,0 is not a bus. So
in theory dt_number_of_address() should already return 0 for it.

Maybe reg = <0 0 0 0 0> is the problem. In that case, we could simply
add a check to skip 0 size ranges. Just a hack to explain what I mean:

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 18825e333e..236b30675b 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -866,6 +866,9 @@ int dt_device_get_address(const struct dt_device_node *dev, 
unsigned int index,
     unsigned int flags;
 
     addrp = dt_get_address(dev, index, size, &flags);
+    if ( *size == 0 )
+        return 0;
+
     if ( addrp == NULL )
         return -EINVAL;
 

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.