
Re: [Xen-devel] [PATCH ARM v5 19/20] mini-os: initial ARM support



On 28 June 2014 19:31, Julien Grall <julien.grall@xxxxxxxxxx> wrote:
>
>
> On 26/06/14 12:28, Thomas Leonard wrote:
>>
>> On ARM, Mini-OS will boot and display some output on the console.
>> Tested with:
>>
>> make XEN_TARGET_ARCH=arm32 CROSS_COMPILE=arm-linux-gnueabihf- \
>>         CONFIG_TEST=y CONFIG_START_NETWORK=n CONFIG_BLKFRONT=n \
>>         CONFIG_NETFRONT=n CONFIG_FBFRONT=n CONFIG_KBDFRONT=n \
>>         CONFIG_CONSFRONT=n CONFIG_XC=n -j4
>>
>> Signed-off-by: Karim Allah Ahmed <karim.allah.ahmed@xxxxxxxxx>
>> [talex5@xxxxxxxxx: made x86_64 support work again]
>> [talex5@xxxxxxxxx: split into multiple patches]
>> [talex5@xxxxxxxxx: re-enabled force_evtchn_callback]
>> [talex5@xxxxxxxxx: enable regular console]
>> [talex5@xxxxxxxxx: fixed initialisation code:
>> - Configure write-back caching in page table. This is needed for
>>    reliable hypercalls to Xen (thanks to Julien Grall).
>> - Use "client mode" for access control (domains are deprecated,
>>    according to ARM Cortex-A Series Programmer's Guide version 4.0,
>>    section 9.6.4).
>> - Enable more SCTLR features (icache, branch prediction)]
>> [talex5@xxxxxxxxx: use Virtual Count register for monotonic time]
>> [talex5@xxxxxxxxx: fixed HYPERVISOR_shutdown]
>> [talex5@xxxxxxxxx: get xenstore details from hypervisor]
>> [talex5@xxxxxxxxx: use GCC implementation of division]
>> [talex5@xxxxxxxxx: cleaned up interrupt handlers and threading]
>> [talex5@xxxxxxxxx: call exit_thread when a thread returns]
>> [talex5@xxxxxxxxx: implemented block_domain for ARM]
>> [talex5@xxxxxxxxx: fixed hang when enabling interrupts]
>> [talex5@xxxxxxxxx: added -march=armv7-a to flags]
>> [talex5@xxxxxxxxx: CLREX after handling IRQs]
>> [talex5@xxxxxxxxx: unbind debug port at shutdown]
>> [talex5@xxxxxxxxx: allow unaligned accesses]
>> [talex5@xxxxxxxxx: fix zImage header for XSA-95]
>> [talex5@xxxxxxxxx: get RAM base and size from the FDT]
>> [talex5@xxxxxxxxx: get GIC addresses from FDT]
>> [talex5@xxxxxxxxx: added ARM grant table initialisation]
>> [talex5@xxxxxxxxx: added missing copyright header to hypercalls32.S]
>> [talex5@xxxxxxxxx: moved GIC driver to arm directory]
>> [talex5@xxxxxxxxx: fixes suggested by Julien Grall:
>> - Removed unnecessary isb.
>> - Renamed GICD_PRIORITY to GICD_IPRIORITYR.
>> - Change IRQ number type from unsigned char to int.
>> - Added volatile to {set,clear}_bit_non_atomic.
>> - Fixed some comments.
>> - Check compatible properties in DTB.]
>> [talex5@xxxxxxxxx: made image relocatable]
>> [talex5@xxxxxxxxx: added mfn_to_pfn and pfn_to_mfn]
>
>
> IMHO all these changes don't belong in the commit message, i.e. they should
> not appear in the commit message when Ian applies your commit.

As I understand it, as I'm not the author of the patch, it's polite to
indicate how I have modified it. I'm trying to follow the guidelines
here:

"Rule (b) allows you to adjust the code, but then it is very impolite
to change one submitter's code and make him endorse your bugs. To
solve this problem, it is recommended that you add a line between the
last Signed-off-by header and yours, indicating the nature of your
changes."

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=4e8a2372f9255a1464ef488ed925455f53fbdaa1#n338

> I would move them into a changelog. You can separate the commit message and
> the changelog by --- and a newline. Everything after the --- will be removed
> by git am.
>
>> Signed-off-by: Thomas Leonard <talex5@xxxxxxxxx>
>> ---
>>   extras/mini-os/ARM-TODO.txt              |   6 +
>>   extras/mini-os/Config.mk                 |   2 +
>>   extras/mini-os/Makefile                  |   9 ++
>>   extras/mini-os/arch/arm/Makefile         |  32 ++++
>>   extras/mini-os/arch/arm/arch.mk          |   7 +
>>   extras/mini-os/arch/arm/arm32.S          | 266
>> +++++++++++++++++++++++++++++++
>>   extras/mini-os/arch/arm/events.c         |  30 ++++
>>   extras/mini-os/arch/arm/gic.c            | 222
>> ++++++++++++++++++++++++++
>>   extras/mini-os/arch/arm/hypercalls32.S   |  75 +++++++++
>>   extras/mini-os/arch/arm/minios-arm32.lds |  75 +++++++++
>>   extras/mini-os/arch/arm/mm.c             | 134 ++++++++++++++++
>>   extras/mini-os/arch/arm/sched.c          |  37 +++++
>>   extras/mini-os/arch/arm/setup.c          | 116 ++++++++++++++
>>   extras/mini-os/arch/arm/time.c           | 202 +++++++++++++++++++++++
>>   extras/mini-os/kernel.c                  |   2 +-
>>   15 files changed, 1214 insertions(+), 1 deletion(-)
>>   create mode 100644 extras/mini-os/ARM-TODO.txt
>>   create mode 100755 extras/mini-os/arch/arm/Makefile
>>   create mode 100644 extras/mini-os/arch/arm/arch.mk
>>   create mode 100644 extras/mini-os/arch/arm/arm32.S
>>   create mode 100644 extras/mini-os/arch/arm/events.c
>>   create mode 100644 extras/mini-os/arch/arm/gic.c
>>   create mode 100644 extras/mini-os/arch/arm/hypercalls32.S
>>   create mode 100755 extras/mini-os/arch/arm/minios-arm32.lds
>>   create mode 100644 extras/mini-os/arch/arm/mm.c
>>   create mode 100644 extras/mini-os/arch/arm/sched.c
>>   create mode 100644 extras/mini-os/arch/arm/setup.c
>>   create mode 100644 extras/mini-os/arch/arm/time.c
>>
>> diff --git a/extras/mini-os/Config.mk b/extras/mini-os/Config.mk
>> index d61877b..4ecde54 100644
>> --- a/extras/mini-os/Config.mk
>> +++ b/extras/mini-os/Config.mk
>> @@ -12,6 +12,8 @@ export XEN_INTERFACE_VERSION
>>   # If not x86 then use $(XEN_TARGET_ARCH)
>>   ifeq ($(findstring x86_,$(XEN_TARGET_ARCH)),x86_)
>>   TARGET_ARCH_FAM = x86
>> +else ifeq ($(findstring arm,$(XEN_TARGET_ARCH)),arm)
>> +TARGET_ARCH_FAM = arm
>>   else
>>   TARGET_ARCH_FAM = $(XEN_TARGET_ARCH)
>>   endif
>> diff --git a/extras/mini-os/Makefile b/extras/mini-os/Makefile
>> index 931cd05..01d8af0 100644
>> --- a/extras/mini-os/Makefile
>> +++ b/extras/mini-os/Makefile
>> @@ -78,6 +78,9 @@ TARGET := mini-os
>>   SUBDIRS := lib xenbus console
>>
>>   ifeq ($(XEN_TARGET_ARCH),arm32)
>> +# Need libgcc.a for division helpers
>> +LDLIBS += `$(CC) -print-libgcc-file-name`
>
>
> OOI, how much code does libgcc add for the division helpers?

Hard to say. libgcc.a contains many files, but the ones with "div" in
the name come to about 67K, so probably less than that.

>> diff --git a/extras/mini-os/arch/arm/Makefile
>> b/extras/mini-os/arch/arm/Makefile
>> new file mode 100755
>> index 0000000..8b78651
>> --- /dev/null
>> +++ b/extras/mini-os/arch/arm/Makefile
>> @@ -0,0 +1,32 @@
>> +#
>> +# ARM architecture specific makefiles.
>> +#
>> +
>> +XEN_ROOT = $(CURDIR)/../../../..
>> +include $(XEN_ROOT)/Config.mk
>> +include ../../Config.mk
>> +
>> +# include arch.mk has to be before minios.mk!
>> +
>> +include arch.mk
>> +include ../../minios.mk
>> +
>> +# Sources here are all *.c (without $(XEN_TARGET_ARCH).S)
>> +# This is handled in $(HEAD_ARCH_OBJ)
>> +ARCH_SRCS := $(wildcard *.c)
>> +
>> +# The objects built from the sources.
>> +ARCH_OBJS := $(patsubst %.c,$(OBJ_DIR)/%.o,$(ARCH_SRCS))
>> +
>> +ARCH_OBJS += hypercalls32.o
>> +
>> +all: $(OBJ_DIR)/$(ARCH_LIB)
>> +
>> +# $(HEAD_ARCH_OBJ) is only built here, needed on linking
>> +# in ../../Makefile.
>> +$(OBJ_DIR)/$(ARCH_LIB): $(ARCH_OBJS) $(OBJ_DIR)/$(HEAD_ARCH_OBJ)
>> +       $(AR) rv $(OBJ_DIR)/$(ARCH_LIB) $(ARCH_OBJS)
>> +
>> +clean:
>> +       rm -f $(OBJ_DIR)/$(ARCH_LIB) $(ARCH_OBJS)
>> $(OBJ_DIR)/$(HEAD_ARCH_OBJ)
>> +
>> diff --git a/extras/mini-os/arch/arm/arch.mk
>> b/extras/mini-os/arch/arm/arch.mk
>> new file mode 100644
>> index 0000000..ab20d99
>> --- /dev/null
>> +++ b/extras/mini-os/arch/arm/arch.mk
>> @@ -0,0 +1,7 @@
>> +ifeq ($(XEN_TARGET_ARCH),arm32)
>> +DEF_ASFLAGS += -march=armv7-a
>> +ARCH_CFLAGS  := -march=armv7-a -marm -fms-extensions -D__arm__
>> -DXEN_HAVE_PV_GUEST_ENTRY #-DCPU_EXCLUSIVE_LDST
>> +EXTRA_INC += $(TARGET_ARCH_FAM)/$(XEN_TARGET_ARCH)
>> +EXTRA_SRC += arch/$(EXTRA_INC)
>> +endif
>> +
>> diff --git a/extras/mini-os/arch/arm/arm32.S
>> b/extras/mini-os/arch/arm/arm32.S
>> new file mode 100644
>> index 0000000..de74ed9
>> --- /dev/null
>> +++ b/extras/mini-os/arch/arm/arm32.S
>> @@ -0,0 +1,266 @@
>> +@ Virtual address of the start of RAM (any value will do, but it must be
>> +@ section-aligned). Update the lds script if changed.
>> +#define VIRT_BASE 0x400000
>> +
>> +@ Offset of the kernel within the RAM. This is a zImage convention which
>> we
>> +@ rely on.
>> +#define ZIMAGE_KERNEL_OFFSET 0x8000
>
>
> Hmmm... this is not a zImage convention... IIRC Linux is using this offset
> to have enough space to create startup page table during boot.

OK, so this is a convention of Linux only? I found this reference:

http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html

"Despite the ability to place zImage anywhere within memory,
convention has it that it is loaded at the base of physical RAM plus
an offset of 0x8000 (32K). This leaves space for the parameter block
usually placed at offset 0x100, zero page exception vectors and page
tables. This convention is *very* common."

> The Linux zImage is able to relocate itself in the memory to respect this
> convention. But the zImage itself can be loaded anywhere in the memory.
>
> Looking at your code below, you are relying on the kernel being
> loaded at 0xXXXX8000, which is incorrect. This offset is odd and makes other
> kernels (such as FreeBSD) require the same trick, which is not part of the
> Linux boot protocol.

Sorry, I don't understand this. You're saying that Xen's choice of
0x8000 forces FreeBSD to support this offset too? But that isn't
caused by anything in Mini-OS.

> I plan to send a patch to require the start address to be 2MB aligned, so
> kernels will be able to use 2MB (for LPAE) and 1MB sections for their early
> page tables.
>
> If the kernel wants another alignment, then you will have to relocate
> yourself. Even though in your case, you don't need this odd 0xXXXX8000.

It's very convenient to know at least the offset where we will be
loaded. Otherwise, we have to move the image around in memory with
loads and stores. I think there is some work going on to share images
between VMs (for running large numbers of VMs on one machine). I don't
know the details, but presumably it would be easier to support that if
the image's text section can be mapped read-only.

> [..]
>
>
>> +       @ Fill in the whole top-level translation table (at page_dir).
>> +       @ Populate the whole pagedir with 1MB section descriptors.
>> +
>> +       mov     r1, r7                  @ r1 -> first section entry
>> +       add     r3, r1, #4*4*1024       @ limit (4 GB address space, 4
>> byte entries)
>> +       orr     r0, r8, r9              @ r0 = entry mapping section zero
>> to start of physical RAM
>> +1:
>> +       str     r0, [r1],#4             @ write the section entry
>> +       add     r0, r0, #1 << 20        @ next physical page (wraps)
>> +       cmp     r1, r3
>> +       bne     1b
>
>
> I would document this part a bit more. It took me a bit of time to
> understand that you are mapping the whole address space in an odd manner.

Could you suggest something? It's hard to know what other people will
find confusing. For reference, here's the existing comment in the code
where I tried to explain the scheme:

    @ Problem: the C code wants to be at a known address (VIRT_BASE), but Xen might
    @ load us anywhere. We initialise the MMU (mapping virtual to physical addresses)
    @ so everything ends up where the code expects it to be.
    @
    @ We calculate the offset between where the linker thought _start would be and where
    @ it actually is and initialise the page tables to have that offset for every page.
    @
    @ When we turn on the MMU, we're still executing at the old address. We don't want
    @ the code to disappear from under us. So we have to do the mapping in stages:
    @
    @ 1. set up a mapping to our current page from both its current and desired addresses
    @ 2. enable the MMU
    @ 3. jump to the new address
    @ 4. remap all the other pages with the calculated offset
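
As a rough illustration of the "same offset for every page" idea in step 4,
here is a minimal C sketch. It is not the boot code (which is assembly): the
helper names fill_sections and section_translate are hypothetical, and attrs
is a placeholder for the descriptor's type and attribute bits.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT 20u                      /* 1MB sections */
#define SECTION_MASK  ((1u << SECTION_SHIFT) - 1u)
#define NUM_SECTIONS  4096u                    /* 4GB / 1MB */

/* Hypothetical helper: give every virtual section the same
 * virtual-to-physical offset. Unsigned arithmetic wraps modulo 4GB. */
static void fill_sections(uint32_t *table, uint32_t virt_base,
                          uint32_t phys_base, uint32_t attrs)
{
    uint32_t offset, i;

    /* Both bases must be section-aligned. */
    assert((virt_base & SECTION_MASK) == 0);
    assert((phys_base & SECTION_MASK) == 0);

    offset = phys_base - virt_base;
    for (i = 0; i < NUM_SECTIONS; i++)
        table[i] = ((i << SECTION_SHIFT) + offset) | attrs;
}

/* Hypothetical helper: what the MMU would do with such a table. */
static uint32_t section_translate(const uint32_t *table, uint32_t va)
{
    uint32_t entry = table[va >> SECTION_SHIFT];

    return (entry & ~SECTION_MASK) | (va & SECTION_MASK);
}
```

With VIRT_BASE 0x400000 and RAM assumed to start at 0x80000000, the virtual
address 0x408000 translates to 0x80008000, and sections below VIRT_BASE wrap
around to the physical addresses just below the start of RAM.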

> While it's fine for a first version of Mini-OS support for ARM, I think in
> the long term you want to map only the necessary banks. It will be easier to
> catch programming errors and avoid trapping in the hypervisor because the
> physical address doesn't exist.

Yes, this is just the boot code. Eventually, the C code should make
the .text read-only (and unmap pages that don't exist, although I
don't think there's currently any disadvantage to having the trap
caused by the second stage page tables). I also want to add guard
pages to the thread stacks once these initial patches are in.

> Furthermore, the data abort sent by the hypervisor is odd (debug smth).

Could you explain what you mean by "odd" here?

>> +.pushsection .data
>> +_data:
>> +.align 14
>> +.globl page_dir
>> +@ Each word maps one 1MB virtual section to a physical section
>> +page_dir:
>> +       .fill (4*1024), 4, 0x0
>> +
>> +.align 12
>> +.globl shared_info_page
>> +shared_info_page:
>> +       .fill (1024), 4, 0x0
>> +
>> +.align 3
>> +.globl stack
>> +stack:
>> +       .fill (4*1024), 4, 0x0
>> +stack_end:
>> +
>> +.align 3
>> +irqstack:
>> +       .fill (1024), 4, 0x0
>> +irqstack_end:
>> +
>> +.globl physical_address_offset
>> +physical_address_offset:
>> +       .long   0
>> +
>> +.popsection
>
>
> Any reason to define these variables in ASM rather than C?

I normally prefer to define variables in the module that sets them,
but it doesn't make much difference to me.

Though perhaps they should be moved to the .bss section (except for
the page table, which could be moved to the start of RAM to simplify
things).

>> diff --git a/extras/mini-os/arch/arm/gic.c b/extras/mini-os/arch/arm/gic.c
>> new file mode 100644
>> index 0000000..5641eb0
>> --- /dev/null
>> +++ b/extras/mini-os/arch/arm/gic.c
>> @@ -0,0 +1,222 @@
>> +// ARM GIC implementation
>> +
>> +#include <mini-os/os.h>
>> +#include <mini-os/hypervisor.h>
>> +#include <libfdt.h>
>> +
>> +//#define VGIC_DEBUG
>> +#ifdef VGIC_DEBUG
>> +#define DEBUG(_f, _a...) \
>> +    DEBUG("MINI_OS(file=vgic.c, line=%d) " _f , __LINE__, ## _a)
>
>
> Did you intend to use printk here?

It does look odd. I'll take a look.

> [..]
>
>
>> +void gic_init(void) {
>> +    gic.gicd_base = NULL;
>> +    int node = 0;
>> +    int depth = 0;
>> +    for (;;)
>> +    {
>> +        node = fdt_next_node(device_tree, node, &depth);
>> +        if (node <= 0 || depth < 0)
>> +            break;
>> +
>> +        if (fdt_getprop(device_tree, node, "interrupt-controller", NULL))
>> {
>> +            int len = 0;
>> +
>> +            if (fdt_node_check_compatible(device_tree, node,
>> "arm,cortex-a15-gic") &&
>> +                fdt_node_check_compatible(device_tree, node,
>> "arm,cortex-a9-gic") &&
>
>
> I don't think this compatible is necessary. Cortex A9 doesn't support
> virtualisation.

OK. I was just matching on the values that Xen provides. a15-gic
should be fine on its own, though.

>> +                fdt_node_check_compatible(device_tree, node,
>> "arm,cortex-a7-gic")) {
>> +                printk("Skipping incompatible interrupt-controller
>> node\n");
>> +                continue;
>> +            }
>> +
>> +            const uint64_t *reg = fdt_getprop(device_tree, node, "reg",
>> &len);
>> +            if (reg == NULL || len != 32) {
>
>
> As asked on the previous version, if you plan to assume specific range size
> for the time-being, please explain the 32.

OK, so we have two registers (GICC and GICD), each of which contains
two parts (an address and a size), each of which is a 64-bit value (8
bytes). So 2 * 2 * 8 = 32. Should I check this as a minimum, or
require it to match exactly?
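
The arithmetic can be written down as a small sketch. This assumes
#address-cells and #size-cells are both 2 in the parent node; the helper
names reg_prop_len and reg_read64 are hypothetical and not part of libfdt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expected byte length of a "reg" property holding n_regions
 * (address, size) pairs. Each cell is 4 bytes. */
static size_t reg_prop_len(unsigned n_regions, unsigned address_cells,
                           unsigned size_cells)
{
    return (size_t)n_regions * (address_cells + size_cells) * sizeof(uint32_t);
}

/* Read the idx-th big-endian 64-bit value from the raw property bytes. */
static uint64_t reg_read64(const uint8_t *prop, size_t len, size_t idx)
{
    uint64_t v = 0;
    size_t i;

    assert((idx + 1) * 8 <= len);   /* stay inside the property */
    for (i = 0; i < 8; i++)
        v = (v << 8) | prop[idx * 8 + i];
    return v;
}
```

With two regions this gives 2 * (2 + 2) * 4 = 32 bytes, and values 0 and 2
are the two base addresses, matching reg[0] and reg[2] in the patch. A
single region (as for the memory bank) gives 16 bytes by the same formula.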

>> +                /* TODO: support other formats */
>> +                printk("Bad 'reg' property: %p %d\n", reg, len);
>> +                continue;
>> +            }
>> +            gic.gicd_base = to_virt((long) fdt64_to_cpu(reg[0]));
>> +            gic.gicc_base = to_virt((long) fdt64_to_cpu(reg[2]));
>
>
> AFAIU, you are mapping the GIC region with normal attributes. With these
> attributes, the processor may reorder writes to the device memory, preload
> memory, cache it.... Which is completely wrong in this case. I'm a bit
> surprised that it works correctly. You have to map those regions as device
> memory.

I believe that the ARM uses the most restrictive memory type from the
two stages of the translation table, so if Xen maps it as device
memory then it will work. We can set it here as well, though.

(see "Overlaying the memory type attribute" in the Architecture
Reference Manual)
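
A simplified model of that rule, as a sketch only: it assumes the combined
type is the more restrictive of the two stages (Strongly-ordered over Device
over Normal) and ignores how cacheability and shareability are combined. The
enum and function names are hypothetical.

```c
#include <assert.h>

/* Ordered from least to most restrictive. */
enum mem_type {
    MEM_NORMAL = 0,
    MEM_DEVICE = 1,
    MEM_STRONGLY_ORDERED = 2
};

/* Combine the stage 1 (guest) and stage 2 (hypervisor) memory types by
 * taking the more restrictive one. */
static enum mem_type combine_mem_type(enum mem_type stage1,
                                      enum mem_type stage2)
{
    assert(stage1 <= MEM_STRONGLY_ORDERED);
    assert(stage2 <= MEM_STRONGLY_ORDERED);
    return stage1 > stage2 ? stage1 : stage2;
}
```

Under this model, a guest mapping of Normal combined with a hypervisor
mapping of Device yields Device.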

> [..]
>
>
>> diff --git a/extras/mini-os/arch/arm/hypercalls32.S
>> b/extras/mini-os/arch/arm/hypercalls32.S
>> new file mode 100644
>> index 0000000..0d7662d
>> --- /dev/null
>> +++ b/extras/mini-os/arch/arm/hypercalls32.S
>
>
> [..]
>
>
>> +#define __HYPERVISOR_memory_op            12
>> +#define __HYPERVISOR_xen_version          17
>> +#define __HYPERVISOR_console_io           18
>> +#define __HYPERVISOR_grant_table_op       20
>> +#define __HYPERVISOR_vcpu_op              24
>> +#define __HYPERVISOR_xsm_op               27
>> +#define __HYPERVISOR_sched_op             29
>> +#define __HYPERVISOR_event_channel_op     32
>> +#define __HYPERVISOR_physdev_op           33
>> +#define __HYPERVISOR_hvm_op               34
>> +#define __HYPERVISOR_sysctl               35
>> +#define __HYPERVISOR_domctl               36
>
>
> Hmmm... why do you hardcode those numbers here? Can't you include the
> correct header?

Good idea.

> [..]
>
>
>> diff --git a/extras/mini-os/arch/arm/minios-arm32.lds
>> b/extras/mini-os/arch/arm/minios-arm32.lds
>> new file mode 100755
>> index 0000000..b18ca55
>> --- /dev/null
>> +++ b/extras/mini-os/arch/arm/minios-arm32.lds
>> @@ -0,0 +1,75 @@
>> +OUTPUT_ARCH(arm)
>> +ENTRY(_start)
>> +SECTIONS
>> +{
>> +  . = 0x408000;
>
>
> I would use the preprocessor to avoid hardcoding address here. It will help
> later if someone wants to change the virtual address of the kernel.

OK.

> [..]
>
>> diff --git a/extras/mini-os/arch/arm/mm.c b/extras/mini-os/arch/arm/mm.c
>
>
>
> [..]
>
>
>> +void arch_init_mm(unsigned long *start_pfn_p, unsigned long *max_pfn_p)
>
>
> [..]
>
>
>> +
>> +    /* Xen will always provide us at least one bank of memory.
>> +     * Mini-OS will use the first bank for the time-being. */
>> +    regs = fdt_getprop(device_tree, memory, "reg", &prop_len);
>> +    if (regs == NULL || prop_len != 16) {
>
>
> Same remark as in gic_init. Please explain where the 16 comes from.

OK.

> [..]
>
>
>> +        /* TODO: support other formats */
>> +        printk("Bad 'reg' property: %p %d\n", regs, prop_len);
>> +        BUG();
>> +    }
>> +
>> +    unsigned int end = (unsigned int) &_end;
>> +    unsigned int mem_base = fdt64_to_cpu(regs[0]);
>> +    unsigned int mem_size = fdt64_to_cpu(regs[1]);
>
>
> In general, if you have to cast or create a variable to contain a physical
> address, please create a type paddr_t and use it.
>
> When someone implements arm64 support, they will have less trouble because
> the value is already cast correctly.

Good idea.
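
A possible shape for such a type, as a sketch only: the 64-bit width, the
PAGE_SHIFT value and the helper names are assumptions for illustration, not
what the patch ends up using.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: 64 bits, wide enough for LPAE and for a later arm64 port. */
typedef uint64_t paddr_t;

#define PAGE_SHIFT 12   /* 4KB pages */

/* Page frame number of a physical address, with no truncation. */
static inline uint64_t paddr_to_pfn(paddr_t pa)
{
    return pa >> PAGE_SHIFT;
}

/* Checked narrowing for code that still needs a 32-bit value. */
static inline uint32_t paddr_to_u32(paddr_t pa)
{
    assert(pa <= UINT32_MAX);
    return (uint32_t)pa;
}
```

Keeping the narrowing in one checked helper means an address above 4GB
fails loudly instead of being silently truncated.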

> IIRC, there are a few other places with the same problem.
>
>
>> +/* Get Xen's suggested physical page assignments for the grant table. */
>> +static unsigned long get_gnttab_base(void)
>> +{
>> +    int hypervisor;
>> +    int len = 0;
>> +    const uint64_t *regs;
>> +    unsigned int gnttab_base;
>> +
>> +    hypervisor = fdt_node_offset_by_compatible(device_tree, -1,
>> "xen,xen");
>> +    BUG_ON(hypervisor < 0);
>> +
>> +    regs = fdt_getprop(device_tree, hypervisor, "reg", &len);
>> +    if (regs == NULL || len != 16) {
>> +        /* TODO: support other formats */
>
>
> Please explain why the 16.

OK.

>> +        printk("Bad 'reg' property: %p %d\n", regs, len);
>> +        BUG();
>> +    }
>> +
>> +    gnttab_base = fdt64_to_cpu(regs[0]);
>
>
> This looks odd... you are using unsigned int for the variable but the
> function returns unsigned long.

OK.

>> +    printk("FDT suggests grant table base %lx\n", gnttab_base);
>> +
>> +    return gnttab_base;
>> +}
>
>
> [..]
>
> Regards,
>
> --
> Julien Grall

Thanks,


-- 
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA
