[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/5] xen/arm: assign devices to boot domains



Hi,

On 1/3/19 10:07 PM, Stefano Stabellini wrote:
On Mon, 24 Dec 2018, Julien Grall wrote:
Hi Stefano,

On 12/5/18 5:28 PM, Stefano Stabellini wrote:
Scan the user provided dtb fragment at boot. For each device node, map
memory to guests, and route interrupts and setup the iommu.

Device memory is only mapped 1:1. It is not possible to assign devices at
locations that conflict with the DomU memory map.
I think 1:1 mapped is a pretty bad idea. You limit yourself a lot as the user
does not control neither the HW nor the guest memory map.

So you need to provide a way for the user to specify the mapping.

Yes, I think we want to introduce a separate xen,reg property that
includes a destination address as well. Something of the format:

   xen,reg = <source_addr source_size dest_addr>

More on this below.



The iommu is setup by passing the to the device to assign on the host

NIT: "the node to..."

I'll fix

device tree. The path is specified in the device tree fragment as the
"path" string property.

Path is too generic and may clash in the future with other bindings. Please
add "xen," in front.

OK, I can rename



Signed-off-by: Stefano Stabellini <stefanos@xxxxxxxxxx>
---
   xen/arch/arm/bootfdt.c        |  4 +-
   xen/arch/arm/domain_build.c   | 85
+++++++++++++++++++++++++++++++++++++++++++
   xen/include/xen/device_tree.h |  2 +
   3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 891b4b6..72cb8d6 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -55,8 +55,8 @@ static bool __init device_tree_node_compatible(const void
*fdt, int node,
       return false;
   }
   -static void __init device_tree_get_reg(const __be32 **cell, u32
address_cells,
-                                       u32 size_cells, u64 *start, u64
*size)
+void __init device_tree_get_reg(const __be32 **cell, u32 address_cells,
+                                u32 size_cells, u64 *start, u64 *size)
   {
       *start = dt_next_cell(address_cells, cell);
       *size = dt_next_cell(size_cells, cell);
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index cc6b464..d48f77e 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2094,6 +2094,88 @@ static int __init construct_domain(struct domain *d,
struct kernel_info *kinfo)
       return 0;
   }
   +static int __init scan_pt_node(const void *pfdt,
+                               int nodeoff, const char *name, int depth,
+                               u32 address_cells, u32 size_cells,
+                               void *data)

Is it really necessary to parse the device-tree twice?

I don't think I understand this comment. The device tree fragment is not
scanned twice, just once I think. Am I missing something?

The previous patch is going through the DT to copy the properties over. This patch introduce a new function to go over the DT to find the Device to attach. So you are parsing the DT twice. Is there any reason for that?



+{
+    int rc;
+    struct dt_device_node *node;
+    int len, i;
+    const struct fdt_property *prop;
+    struct kernel_info *kinfo = data;
+    struct domain *d = kinfo->d;
+    const __be32 *cell;
+
+    if ( depth > 2 )
+        return 0;

Why do you limit yourself to depth 2? It is possible to have nested node
describing memory

That was because Xen needed a far better understanding of how reg works
to be able to parse nested nodes correctly.

Xen has already the knowledge for understanding reg (see common/device_tree.c). Although, I agree that it only works on unflatten device-tree today.

Specifically, see this
concrete example from ZynqMP:

         ethernet@ff0e0000 {
             compatible = "cdns,zynqmp-gem";
             status = "okay";
             interrupt-parent = <0xfde8>;
             interrupts = <0x0 0x3f 0x4 0x0 0x3f 0x4>;
             reg = <0x0 0xff0e0000 0x1000>;
             clock-names = "pclk", "hclk", "tx_clk", "rx_clk";
             #address-cells = <0x1>;
             #size-cells = <0x0>;
             clocks = <0x1 0x1 0x1 0x1>;
             phy-mode = "rgmii-id";
             xlnx,ptp-enet-clock = <0x0>;
             local-mac-address = [00 0a 35 00 22 01];
             phy-handle = <0x2>;
             xen,path = "/amba/ethernet@ff0e0000";

             phy@c {
                 reg = <0xc>;
                 ti,rx-internal-delay = <0x8>;
                 ti,tx-internal-delay = <0xa>;
                 ti,fifo-depth = <0x1>;
                 ti,rxctrl-strap-worka;
                 linux,phandle = <0x2>;
                 phandle = <0x2>;
             };
         };

We want to map the first reg property under ethernet@ff0e0000, not the
reg property under phy@c, which it doesn't even refer to a memory
region. This problem goes away completely if we start using a new Xen
specific "xen,reg" property, instead of trying to figure out which
existing reg to map. With that, we can safely remove the artificial
depth <= 2 restriction.

Your proposition for "xen,reg" is a bit different from what I was thinking. Ideally we only want the user to specify once the guest physical address (i.e only in reg). But, I understand that it may require more work.

I think I would be happy with this solution providing there are an action to investigate whether "range" can be used.

+
+    prop = fdt_get_property_namelen(pfdt, nodeoff, "reg", strlen("reg"),
&len);
+    if ( prop != NULL )
+    {
+        paddr_t start, size;
+        cell = (const __be32 *)prop->data;
+        len = fdt32_to_cpu(prop->len) /
+              ((address_cells + size_cells) * sizeof (u32));
+
+        for ( i = 0; i < len; i++ )
+        {
+            device_tree_get_reg(&cell, address_cells, size_cells, &start,
&size);

Here you assume the value in regs correspond to host physical address. This
may not be the case if a device is under a bus.

Yes, one more reason to move away from parsing the existing reg property
and use a new xen,reg property that gives directly the host physical
address to map to the right guest physical address (regardless of
busses). xen,reg would basically be equivalent to the iomem VM config
file option.


+
+            rc = guest_physmap_add_entry(d, _gfn(start >> PAGE_SHIFT),

Please use gaddr_to_gfn and ...

OK


+                                         _mfn(start >> PAGE_SHIFT),

... maddr_to_mfn

OK


+                                         get_order_from_bytes(size),
+                                         p2m_mmio_direct_dev);

I think the restriction on the memory attributes should be documented in patch
#5.

OK


+            if ( rc < 0 )
+            {
+                dprintk(XENLOG_ERR, "Failed to map %"PRIpaddr" to the
guest\n", start);
+                return -EFAULT;
+            }
+        }
+    }
+
+    prop = fdt_get_property_namelen(pfdt, nodeoff, "path", strlen("path"),
&len);
+    if ( prop != NULL ) {
+        node = dt_find_node_by_path((char *)&prop->data[0]);

What's wrong with giving directly prop->data?

Nothing wrong with it, just a matter of code style. I don't care about
this, I can change it.

My issue is with the cast, not &prop->data[0] itself (though I dislike it ;)). Casts are dangerous and should be avoided as much as possible.

[...]

+
+        pt_irq = fdt32_to_cpu(*(u + 1)) + 32;
+
+        vgic_reserve_virq(d, pt_irq);
+        rc = route_irq_to_guest(d, pt_irq, pt_irq, "routed IRQ");
+        if ( rc < 0 )
+            return rc;

You are assuming the device can only generate one interrupt.

That's a mistake, thanks for spotting it, I'll fix it.


Furthermore, this is assuming all the nodes in the fragment will be
under the GIC controller.  You may want to passthrough a interrupt
controller (i.e GPIO) to the guest and the related device.

I don't think that the non-GIC scenario is supported today. I don't
think it works even without dom0less. I wrote more about this as a reply
to the other email.
I believe this works out-of-box with the guest. If we take the example of the GPIO controller, it may be behind the GIC. You only care to route those interrupts used by GPIO and MMIO associated. The rest can be describe normally in the DT.

Of course, this means that you need to passthrough all devices using the GPIO controller to that guest.

So I still think you need to check whether the interrupts belongs the GIC or not.

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.