[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH ARM v5 19/20] mini-os: initial ARM support



Hi Thomas,

On 30/06/14 20:12, Thomas Leonard wrote:
On 28 June 2014 19:31, Julien Grall <julien.grall@xxxxxxxxxx> wrote:


On 26/06/14 12:28, Thomas Leonard wrote:

On ARM, Mini-OS will boot and display some output on the console.
Tested with:

make XEN_TARGET_ARCH=arm32 CROSS_COMPILE=arm-linux-gnueabihf- \
         CONFIG_TEST=y CONFIG_START_NETWORK=n CONFIG_BLKFRONT=n \
         CONFIG_NETFRONT=n CONFIG_FBFRONT=n CONFIG_KBDFRONT=n \
         CONFIG_CONSFRONT=n CONFIG_XC=n -j4

Signed-off-by: Karim Allah Ahmed <karim.allah.ahmed@xxxxxxxxx>
[talex5@xxxxxxxxx: made x86_64 support work again]
[talex5@xxxxxxxxx: split into multiple patches]
[talex5@xxxxxxxxx: re-enabled force_evtchn_callback]
[talex5@xxxxxxxxx: enable regular console]
[talex5@xxxxxxxxx: fixed initialisation code:
- Configure write-back caching in page table. This is needed for
    reliable hypercalls to Xen (thanks to Julien Grall).
- Use "client mode" for access control (domains are deprecated,
    according to ARM Cortex-A Series Programmer’s Guide version 4.0,
    section 9.6.4).
- Enable more SCTLR features (icache, branch prediction)]
[talex5@xxxxxxxxx: use Virtual Count register for monotonic time]
[talex5@xxxxxxxxx: fixed HYPERVISOR_shutdown]
[talex5@xxxxxxxxx: get xenstore details from hypervisor]
[talex5@xxxxxxxxx: use GCC implementation of division]
[talex5@xxxxxxxxx: cleaned up interrupt handlers and threading]
[talex5@xxxxxxxxx: call exit_thread when a thread returns]
[talex5@xxxxxxxxx: implemented block_domain for ARM]
[talex5@xxxxxxxxx: fixed hang when enabling interrupts]
[talex5@xxxxxxxxx: added -march=armv7-a to flags]
[talex5@xxxxxxxxx: CLREX after handling IRQs]
[talex5@xxxxxxxxx: unbind debug port at shutdown]
[talex5@xxxxxxxxx: allow unaligned accesses]
[talex5@xxxxxxxxx: fix zImage header for XSA-95]
[talex5@xxxxxxxxx: get RAM base and size from the FDT]
[talex5@xxxxxxxxx: get GIC addresses from FDT]
[talex5@xxxxxxxxx: added ARM grant table initialisation]
[talex5@xxxxxxxxx: added missing copyright header to hypercalls32.S]
[talex5@xxxxxxxxx: moved GIC driver to arm directory]
[talex5@xxxxxxxxx: fixes suggested by Julien Grall:
- Removed unnecessary isb.
- Renamed GICD_PRIORITY to GICD_IPRIORITYR.
- Change IRQ number type from unsigned char to int.
- Added volatile to {set,clear}_bit_non_atomic.
- Fixed some comments.
- Check compatible properties in DTB.]
[talex5@xxxxxxxxx: made image relocatable]
[talex5@xxxxxxxxx: added mfn_to_pfn and pfn_to_mfn]


IHMO all these changes doesn't belong to the commit message, i.e should not
appear in the commit message when Ian will apply your commit message.

As I understand it, as I'm not the author of the patch, it's polite to
indicate how I have modified it. I'm trying to follow the guidelines
here:
"Rule (b) allows you to adjust the code, but then it is very impolite
to change one submitter's code and make him endorse your bugs. To
solve this problem, it is recommended that you add a line between the
last Signed-off-by header and yours, indicating the nature of your
changes."

I agree with it. But you modify so much things in this patch, that I think it's not necessary to specify all of them.

Anyway it was only a suggestion.


Signed-off-by: Thomas Leonard <talex5@xxxxxxxxx>
---
   extras/mini-os/ARM-TODO.txt              |   6 +
   extras/mini-os/Config.mk                 |   2 +
   extras/mini-os/Makefile                  |   9 ++
   extras/mini-os/arch/arm/Makefile         |  32 ++++
   extras/mini-os/arch/arm/arch.mk          |   7 +
   extras/mini-os/arch/arm/arm32.S          | 266
+++++++++++++++++++++++++++++++
   extras/mini-os/arch/arm/events.c         |  30 ++++
   extras/mini-os/arch/arm/gic.c            | 222
++++++++++++++++++++++++++
   extras/mini-os/arch/arm/hypercalls32.S   |  75 +++++++++
   extras/mini-os/arch/arm/minios-arm32.lds |  75 +++++++++
   extras/mini-os/arch/arm/mm.c             | 134 ++++++++++++++++
   extras/mini-os/arch/arm/sched.c          |  37 +++++
   extras/mini-os/arch/arm/setup.c          | 116 ++++++++++++++
   extras/mini-os/arch/arm/time.c           | 202 +++++++++++++++++++++++
   extras/mini-os/kernel.c                  |   2 +-
   15 files changed, 1214 insertions(+), 1 deletion(-)
   create mode 100644 extras/mini-os/ARM-TODO.txt
   create mode 100755 extras/mini-os/arch/arm/Makefile
   create mode 100644 extras/mini-os/arch/arm/arch.mk
   create mode 100644 extras/mini-os/arch/arm/arm32.S
   create mode 100644 extras/mini-os/arch/arm/events.c
   create mode 100644 extras/mini-os/arch/arm/gic.c
   create mode 100644 extras/mini-os/arch/arm/hypercalls32.S
   create mode 100755 extras/mini-os/arch/arm/minios-arm32.lds
   create mode 100644 extras/mini-os/arch/arm/mm.c
   create mode 100644 extras/mini-os/arch/arm/sched.c
   create mode 100644 extras/mini-os/arch/arm/setup.c
   create mode 100644 extras/mini-os/arch/arm/time.c

diff --git a/extras/mini-os/Config.mk b/extras/mini-os/Config.mk
index d61877b..4ecde54 100644
--- a/extras/mini-os/Config.mk
+++ b/extras/mini-os/Config.mk
@@ -12,6 +12,8 @@ export XEN_INTERFACE_VERSION
   # If not x86 then use $(XEN_TARGET_ARCH)
   ifeq ($(findstring x86_,$(XEN_TARGET_ARCH)),x86_)
   TARGET_ARCH_FAM = x86
+else ifeq ($(findstring arm,$(XEN_TARGET_ARCH)),arm)
+TARGET_ARCH_FAM = arm
   else
   TARGET_ARCH_FAM = $(XEN_TARGET_ARCH)
   endif
diff --git a/extras/mini-os/Makefile b/extras/mini-os/Makefile
index 931cd05..01d8af0 100644
--- a/extras/mini-os/Makefile
+++ b/extras/mini-os/Makefile
@@ -78,6 +78,9 @@ TARGET := mini-os
   SUBDIRS := lib xenbus console

   ifeq ($(XEN_TARGET_ARCH),arm32)
+# Need libgcc.a for division helpers
+LDLIBS += `$(CC) -print-libgcc-file-name`


OOI, how much code does add libgcc for the division helpers?

Hard to say. libgcc.a contains many files, but the ones with "div" in
the name come to about 67K, so probably less than that.

Ok.

diff --git a/extras/mini-os/arch/arm/arm32.S
b/extras/mini-os/arch/arm/arm32.S
new file mode 100644
index 0000000..de74ed9
--- /dev/null
+++ b/extras/mini-os/arch/arm/arm32.S
@@ -0,0 +1,266 @@
+@ Virtual address of the start of RAM (any value will do, but it must be
+@ section-aligned). Update the lds script if changed.
+#define VIRT_BASE 0x400000
+
+@ Offset of the kernel within the RAM. This is a zImage convention which
we
+@ rely on.
+#define ZIMAGE_KERNEL_OFFSET 0x8000


Hmmm... this is not a zImage convention... IIRC Linux is using this offset
to have enough space to create startup page table during boot.

OK, so this is a convention of Linux only? I found this reference:

http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html

"Despite the ability to place zImage anywhere within memory,
convention has it that it is loaded at the base of physical RAM plus
an offset of 0x8000 (32K). This leaves space for the parameter block
usually placed at offset 0x100, zero page exception vectors and page
tables. This convention is *very* common."

I think this convention is for kernel which are able to load only in a specific address (i.e CONFIG_AUTO_ZRELADDR=n). Linux Xen guest need to have this option set to be able to boot. So Linux will calculate the relocation address itself:

#ifdef CONFIG_AUTO_ZRELADDR
                @ determine final kernel image address
                mov     r4, pc
                and     r4, r4, #0xf8000000
                add     r4, r4, #TEXT_OFFSET
#else
                ldr     r4, =zreladdr
#endif

TEXT_OFFSET is equal to 0x8000 here.

This code lives in Linux since at least 2010 (last commit in git blame arch/arm/boot/compressed/head.S).

Futhermore, for DOM0 we are using a 2MB-aligned address. I think it's time to make the same thing for the guest (see the patch I sent earlier today: https://patches.linaro.org/32742/).

This will also help to add support more easily for Xen in new OS.

The Linux zImage is able to relocate itself in the memory to respect this
convention. But the zImage itself can be loaded anywhere in the memory.

By looking to your code below your are relying that the kernel will be
loaded at 0xXXXX8000 which is incorrect. This offset is odd and make other
kernel (such as FreeBSD) requiring the same trick which is not part of the
Linux boot protocol.

Sorry, I don't understand this. You're saying that Xen's choice of
0x8000 forces FreeBSD to support this offset too? But that isn't
caused by anything in Mini-OS.

I plan to send a patch to require the start address to be 2MB aligned, so
kernels will be able to use 2MB (for LPAE) and 1MB section for there early
page table.

If the kernel wants another alignment, then you will have to relocate
yourself. Even though in your case, you don't need this odd 0xXXXX8000.

It's very convenient to know at least the offset where we will be
loaded. Otherwise, we have to move the image around in memory with
loads and stores. I think there is some work going on to share images
between VMs (for running large numbers of VMs on one machine). I don't
know the details, but presumably it would be easier to support that if
the image's text section can be mapped read-only.

I sent a patch to handle 2MB-align with is less odd than the 0x8000. IHMO, the latter offset has no sense as it's not part of the ABI.

[..]


+       @ Fill in the whole top-level translation table (at page_dir).
+       @ Populate the whole pagedir with 1MB section descriptors.
+
+       mov     r1, r7                  @ r1 -> first section entry
+       add     r3, r1, #4*4*1024       @ limit (4 GB address space, 4
byte entries)
+       orr     r0, r8, r9              @ r0 = entry mapping section zero
to start of physical RAM
+1:
+       str     r0, [r1],#4             @ write the section entry
+       add     r0, r0, #1 << 20        @ next physical page (wraps)
+       cmp     r1, r3
+       bne     1b


I would document a bit more this part. It took me a bit of time to
understand that you mapping the whole address space in an odd manner.

Could you suggest something? It's hard to know what other people will
find confusing. For reference, here's the existing comment in the code
where I tried to explain the scheme:

     @ Problem: the C code wants to be at a known address (VIRT_BASE),
but Xen might
     @ load us anywhere. We initialise the MMU (mapping virtual to
physical @ addresses)
     @ so everything ends up where the code expects it to be.
     @
     @ We calculate the offet between where the linker thought _start
would be and where
     @ it actually is and initialise the page tables to have that
offset for every page.
     @
     @ When we turn on the MMU, we're still executing at the old
address. We don't want
     @ the code to disappear from under us. So we have to do the
mapping in stages:
     @
     @ 1. set up a mapping to our current page from both its current
and desired addresses
     @ 2. enable the MMU
     @ 3. jump to the new address
     @ 4. remap all the other pages with the calculated offset

Actually, I like the existing comment. It tell the problems and explain how to solve it. Can you keep it?

While it's fine for a first version of Mini-OS support for ARM. It think at
long term you want to map only necessary bank rank. It will be easier to
catch programming error and avoid to trap in the hypervisor because the
physical address doesn't exist.

Yes, this is just the boot code. Eventually, the C code should make
the .text read-only (and unmap pages that don't exist, although I
don't think there's currently any disadvantage to having the trap
caused by the second stage page tables). I also want to add guard
pages to the thread stacks once these initial patches are in.

Futhermore, the data abort sent by the hypervisor is odd (debug smth).

Could you explain what you mean by "odd" here?

Xen is injecting a debug fault when it needs to send a data abort to the guest. It may be confusing for the developer that doesn't know Xen internal.

Anyway, I'm fine with the current solution.

+.pushsection .data
+_data:
+.align 14
+.globl page_dir
+@ Each word maps one 1MB virtual section to a physical section
+page_dir:
+       .fill (4*1024), 4, 0x0
+
+.align 12
+.globl shared_info_page
+shared_info_page:
+       .fill (1024), 4, 0x0
+
+.align 3
+.globl stack
+stack:
+       .fill (4*1024), 4, 0x0
+stack_end:
+
+.align 3
+irqstack:
+       .fill (1024), 4, 0x0
+irqstack_end:
+
+.globl physical_address_offset
+physical_address_offset:
+       .long   0
+
+.popsection


Any reason to define theses variables in ASM rather than C?

I normally prefer to define variables in the module that sets them,
but it doesn't make much difference to me.

Though perhaps they should be moved to the .bss section (except for
the page table, which could be moved to the start of RAM to simplify
things).

It would help for ARM64 port as I don't think those variables is 32bit specific.

diff --git a/extras/mini-os/arch/arm/gic.c b/extras/mini-os/arch/arm/gic.c
new file mode 100644
index 0000000..5641eb0
--- /dev/null
+++ b/extras/mini-os/arch/arm/gic.c
@@ -0,0 +1,222 @@
+// ARM GIC implementation
+
+#include <mini-os/os.h>
+#include <mini-os/hypervisor.h>
+#include <libfdt.h>
+
+//#define VGIC_DEBUG
+#ifdef VGIC_DEBUG
+#define DEBUG(_f, _a...) \
+    DEBUG("MINI_OS(file=vgic.c, line=%d) " _f , __LINE__, ## _a)


Did you intend to use printk here?

It does look odd. I'll take a look.

Thanks.

[..]


+void gic_init(void) {
+    gic.gicd_base = NULL;
+    int node = 0;
+    int depth = 0;
+    for (;;)
+    {
+        node = fdt_next_node(device_tree, node, &depth);
+        if (node <= 0 || depth < 0)
+            break;
+
+        if (fdt_getprop(device_tree, node, "interrupt-controller", NULL))
{
+            int len = 0;
+
+            if (fdt_node_check_compatible(device_tree, node,
"arm,cortex-a15-gic") &&
+                fdt_node_check_compatible(device_tree, node,
"arm,cortex-a9-gic") &&


I don't think this compatible is necessary. Cortex A9 doesn't support
virtualisation.

OK. I was just matching on the values that Xen provides. a15-gic
should be fine on its own, though.

IIRC, the toolstack doesn't provide arm,cortex-a9-gic :).

+                fdt_node_check_compatible(device_tree, node,
"arm,cortex-a7-gic")) {
+                printk("Skipping incompatible interrupt-controller
node\n");
+                continue;
+            }
+
+            const uint64_t *reg = fdt_getprop(device_tree, node, "reg",
&len);
+            if (reg == NULL || len != 32) {


As asked on the previous version, if you plan to assume specific range size
for the time-being, please explain the 32.

OK, so we have two registers (GICC and GICD), each of which contains
two parts (an address and a size), each of which is a 64-bit value (8
bytes). So 2 * 2 * 8 = 32. Should I check this as a minimum, or
require it to match exactly?

Match exactly is better. It will catch easily any change in the device tree provided by the toolstack.

Can you add what you said as a comment?

+                /* TODO: support other formats */
+                printk("Bad 'reg' property: %p %d\n", reg, len);
+                continue;
+            }
+            gic.gicd_base = to_virt((long) fdt64_to_cpu(reg[0]));
+            gic.gicc_base = to_virt((long) fdt64_to_cpu(reg[2]));


AFAIU, your are mapping the GIC region with normal attribute. With this
attribute, the processor may reorder the write in the device memory, preload
memory, caching.... Which is completely wrong in this case. I'm sa bit
surprised that it works correctly. You have to map thoses regions as device
memory.

I believe that the ARM uses the most restrictive memory type from the
two stages of the translation table, so if Xen maps it as device
memory then it will work. We can set it here as well, though.

(see "Overlaying the memory type attribute" in the Architecture
Reference Manual)

I don't really want to see this such assumption in Mini-OS. Let say, we decide to change the memory type attribute in Xen 4.x to "Normal". Then we screw Mini-OS, and it would be very hard to debug.

Regards,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.