
Re: [PATCH v1 5/8] xen/riscv: introduce identity mapping



On Mon, 2023-06-12 at 15:48 +0200, Jan Beulich wrote:
> On 06.06.2023 21:55, Oleksii Kurochko wrote:
> > The way how switch to virtual address was implemented in the
> > commit e66003e7be ("xen/riscv: introduce setup_initial_pages")
> > wasn't safe enough so identity mapping was introduced and
> > used.
> 
> I don't think this is sufficient as a description. You want to make
> clear what the "not safe enough" is, and you also want to go into
> more detail as to the solution chosen. I'm particularly puzzled that
> you map just two singular pages ...
I'll update the description.

I map two pages as that is likely to be enough to switch away from the
1:1 mapping world. My point is that we need the 1:1 mapping only for a
few instructions, which are located in start() [ in the .text.header
section ]:

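        /* Compute in t1 the offset between load and linker addresses */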
        li      t0, XEN_VIRT_START
        la      t1, start
        sub     t1, t1, t0

        /* Calculate proper VA after jump from 1:1 mapping */
        la      t0, .L_primary_switched
        sub     t0, t0, t1

        /* Jump from 1:1 mapping world */
        jr      t0

And cpu0_boot_stack needs to be mapped 1:1 so as not to fault when
executing the epilogue of the enable_mmu() function, as it accesses a
value on the stack:
    ffffffffc00001b0:       6422                    ld      s0,8(sp)
    ffffffffc00001b2:       0141                    addi    sp,sp,16
    ffffffffc00001b4:       8082                    ret

> 
> > @@ -35,8 +40,10 @@ static unsigned long phys_offset;
> >   *
> >   * It might be needed one more page table in case when Xen load
> >   * address isn't 2 MB aligned.
> > + *
> > + * 3 additional page tables are needed for identity mapping.
> >   */
> > -#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) + 1)
> > +#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) + 1 + 3)
> 
> What is this 3 coming from? It feels like the value should (again)
> somehow depend on CONFIG_PAGING_LEVELS.
3 is too many in the current case; it should be 2 more.

The linker address is 0xFFFFFFFC00000000, so to map the linker
addresses to the load addresses we need 3 page tables (under the
condition that the size of Xen won't be larger than 2 MB), plus 1 more
page table in case the Xen load address isn't 2 MB aligned.

We need 2 more page tables because Xen is loaded to address 0x80200000
by OpenSBI and the load address isn't equal to the linker address, so
the identity mapping needs 2 additional tables (L1 and L0); the L2
table was already allocated when mapping the linker to the load
addresses.

And it looks like 2 is enough for Sv48 and Sv57 as well, as the L4, L3
and L2 page tables will already have been allocated when mapping the
linker to the load addresses.

It would only be too many for Sv32 (which generally has only L1 and
L0), but I am not sure that anyone cares about Sv32.
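
To illustrate why only the root table is shared, here is a standalone
sketch (not Xen code; the Sv39 parameters and the addresses are the
ones discussed above) which computes the page-table indices of the
load and linker addresses at each level:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define VPN_BITS   9   /* Sv39/Sv48/Sv57 all use 9-bit VPN fields */

    /* Same index computation as pt_index() in the patch. */
    static unsigned int pt_index(unsigned int level, uint64_t va)
    {
        return (va >> (PAGE_SHIFT + level * VPN_BITS)) &
               ((1u << VPN_BITS) - 1);
    }

    int main(void)
    {
        uint64_t load = 0x80200000ULL;          /* OpenSBI load address */
        uint64_t link = 0xFFFFFFFC00000000ULL;  /* XEN_VIRT_START */
        unsigned int level;

        /*
         * For Sv39 the root is L2. The indices differ already at the
         * root, so the two mappings share only the root table itself
         * and the identity mapping needs its own L1 and L0, i.e. 2
         * more page tables.
         */
        for ( level = 3; level-- > 0; )
            printf("L%u: load index=%u, link index=%u\n",
                   level, pt_index(level, load), pt_index(level, link));

        return 0;
    }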

> 
> > @@ -108,16 +116,18 @@ static void __init setup_initial_mapping(struct mmu_desc *mmu_desc,
> >              {
> >                  unsigned long paddr = (page_addr - map_start) + pa_start;
> >                  unsigned int permissions = PTE_LEAF_DEFAULT;
> > +                unsigned long addr = (is_identity_mapping) ?
> 
> Nit: No need for parentheses here.
Thanks.

> 
> > +                                     page_addr : LINK_TO_LOAD(page_addr);
> 
> As a remark, while we want binary operators at the end of lines when
> wrapping, we usually do things differently for the ternary operator:
> Either
> 
>                 unsigned long addr = is_identity_mapping
>                                      ? page_addr : LINK_TO_LOAD(page_addr);
> 
> or
> 
>                 unsigned long addr = is_identity_mapping
>                                      ? page_addr
>                                      : LINK_TO_LOAD(page_addr);
It looks better. Thanks.
> 
> > @@ -232,22 +242,27 @@ void __init setup_initial_pagetables(void)
> >                            linker_start,
> >                            linker_end,
> >                            load_start);
> > +
> > +    if ( linker_start == load_start )
> > +        return;
> > +
> > +    setup_initial_mapping(&mmu_desc,
> > +                          load_start,
> > +                          load_start + PAGE_SIZE,
> > +                          load_start);
> > +
> > +    setup_initial_mapping(&mmu_desc,
> > +                          (unsigned long)cpu0_boot_stack,
> > +                          (unsigned long)cpu0_boot_stack + PAGE_SIZE,
> 
> Shouldn't this be STACK_SIZE (and then also be prepared for
> STACK_SIZE > PAGE_SIZE)?
Yes, it would be better to use STACK_SIZE, but for the 1:1 mapping
PAGE_SIZE is enough: as I mentioned above, it is only needed for the
epilogue of the enable_mmu() function.
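
If we wanted to be prepared for STACK_SIZE > PAGE_SIZE anyway, the
call would just become (a sketch based on the quoted hunk):

    setup_initial_mapping(&mmu_desc,
                          (unsigned long)cpu0_boot_stack,
                          (unsigned long)cpu0_boot_stack + STACK_SIZE,
                          (unsigned long)cpu0_boot_stack);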
> 
> > +                          (unsigned long)cpu0_boot_stack);
> >  }
> >  
> > -void __init noreturn noinline enable_mmu()
> > +/*
> > + * enable_mmu() can't be __init because __init section isn't part of
> > + * identity mapping so it will cause an issue after MMU will be
> > + * enabled.
> > + */
> 
> As hinted at above already - perhaps the identity mapping wants to be
> larger, up to covering the entire Xen image? Since it's temporary
> only anyway, you could even consider using a large page (and RWX
> permission). You already require no overlap of link and load addresses,
> so at least small page mappings ought to be possible for the entire
> image.
It makes sense; I started to think about that after I applied the
patch for running Dom0... I think we can map a 1 GB page which will
cover the whole Xen image. In that case we also won't have to allocate
2 more page tables.
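
A rough sketch of what that could look like for Sv39, where a 1 GB
page is a root-level leaf (pt_index(), stage1_pgtbl_root and
PTE_LEAF_DEFAULT are from this series; paddr_to_pte() and GB() are
assumptions on my side):

    /*
     * Hypothetical: a single 1 GB leaf PTE at the root level which
     * identity-maps the 1 GB-aligned region containing the whole Xen
     * image, to be torn down later by remove_identity_mapping().
     */
    unsigned long load_start = LINK_TO_LOAD(_start);
    unsigned int index = pt_index(CONFIG_PAGING_LEVELS - 1, load_start);

    stage1_pgtbl_root[index] =
        paddr_to_pte(load_start & ~(GB(1) - 1), PTE_LEAF_DEFAULT);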

> 
> > @@ -255,25 +270,41 @@ void __init noreturn noinline enable_mmu()
> >      csr_write(CSR_SATP,
> >                PFN_DOWN((unsigned long)stage1_pgtbl_root) |
> >                RV_STAGE1_MODE << SATP_MODE_SHIFT);
> > +}
> > +
> > +void __init remove_identity_mapping(void)
> > +{
> > +    int i, j;
> 
> Nit: unsigned int please.
Thanks.

> 
> > +    pte_t *pgtbl;
> > +    unsigned int index, xen_index;
> 
> These would all probably better be declared in the narrowest possible
> scope.
Sure.

> 
> > -    asm volatile ( ".p2align 2" );
> > - mmu_is_enabled:
> >      /*
> > -     * Stack should be re-inited as:
> > -     * 1. Right now an address of the stack is relative to load
> > -     *    time addresses what will cause an issue in case of load
> > -     *    start address isn't equal to linker start address.
> > -     * 2. Addresses in stack are all load time relative which can
> > -     *    be an issue in case when load start address isn't equal
> > -     *    to linker start address.
> > -     *
> > -     * We can't return to the caller because the stack was reseted
> > -     * and it may have stash some variable on the stack.
> > -     * Jump to a brand new function as the stack was reseted
> > +     * id_addrs should be in sync with id mapping in
> > +     * setup_initial_pagetables()
> 
> What is "id" meant to stand for here? Also if things need keeping in
> sync, then a similar comment should exist on the other side.
"id" here mean identity.

> 
> >       */
> > +    unsigned long id_addrs[] =  {
> > +                                 LINK_TO_LOAD(_start),
> > +                                 LINK_TO_LOAD(cpu0_boot_stack),
> > +                                };
> >  
> > -    switch_stack_and_jump((unsigned long)cpu0_boot_stack + STACK_SIZE,
> > -                          cont_after_mmu_is_enabled);
> > +    pgtbl = stage1_pgtbl_root;
> > +
> > +    for ( j = 0; j < ARRAY_SIZE(id_addrs); j++ )
> > +    {
> > +        for ( pgtbl = stage1_pgtbl_root, i = CONFIG_PAGING_LEVELS - 1; i >= 0; i-- )
> > +        {
> > +            index = pt_index(i, id_addrs[j]);
> > +            xen_index = pt_index(i, XEN_VIRT_START);
> > +
> > +            if ( index != xen_index )
> > +            {
> > +                pgtbl[index].pte = 0;
> > +                break;
> > +            }
> 
> Up to here I understand index specifies a particular PTE within
> pgtbl[].
> 
> > +            pgtbl = &pgtbl[index];
> 
> But then how can this be the continuation of the page table walk?
> Don't you need to read the address out of the PTE?
You are right. It should be:
   pgtbl = (pte_t *)pte_to_paddr(pgtbl[index]);
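
Putting that fix together with the quoted hunk, the corrected walk
would look roughly like this (just a sketch, also folding in the
"unsigned int" and narrowest-scope remarks from above):

    void __init remove_identity_mapping(void)
    {
        unsigned int i;
        /*
         * Should be kept in sync with the identity mapping set up in
         * setup_initial_pagetables().
         */
        unsigned long id_addrs[] = {
            LINK_TO_LOAD(_start),
            LINK_TO_LOAD(cpu0_boot_stack),
        };

        for ( i = 0; i < ARRAY_SIZE(id_addrs); i++ )
        {
            pte_t *pgtbl = stage1_pgtbl_root;
            unsigned int level;

            for ( level = CONFIG_PAGING_LEVELS; level-- > 0; )
            {
                unsigned int index = pt_index(level, id_addrs[i]);

                /*
                 * Zap the PTE at the first level where the identity
                 * mapping's path diverges from XEN_VIRT_START's path;
                 * everything above is shared with the regular mapping.
                 */
                if ( index != pt_index(level, XEN_VIRT_START) )
                {
                    pgtbl[index].pte = 0;
                    break;
                }

                /* Still on the shared path: descend via the PTE. */
                pgtbl = (pte_t *)pte_to_paddr(pgtbl[index]);
            }
        }
    }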

> 
> > --- a/xen/arch/riscv/riscv64/head.S
> > +++ b/xen/arch/riscv/riscv64/head.S
> > @@ -31,6 +31,36 @@ ENTRY(start)
> >  
> >          jal     calc_phys_offset
> >  
> > +        jal     setup_initial_pagetables
> > +
> > +        jal     enable_mmu
> > +
> > +        /*
> > +         * Calculate physical offset
> > +         *
> > +         * We can't re-use a value in phys_offset variable here as
> > +         * phys_offset is located in .bss and this section isn't
> > +         * 1:1 mapped and an access to it will cause MMU fault
> > +         */
> > +        li      t0, XEN_VIRT_START
> > +        la      t1, start
> > +        sub     t1, t1, t0
> 
> If I'm not mistaken this calculates start - XEN_VIRT_START, and ...
> 
> > +        /* Calculate proper VA after jump from 1:1 mapping */
> > +        la      t0, .L_primary_switched
> > +        sub     t0, t0, t1
> 
> ... then this .L_primary_switched - (start - XEN_VIRT_START). Which can
> also be expressed as (.L_primary_switched - start) + XEN_VIRT_START,
> the first part of which gas ought to be able to resolve for you. But
> upon experimenting a little it looks like it can't. (I had thought of
> something along the lines of
> 
>         li      t0, .L_primary_switched - start
>         li      t1, XEN_VIRT_START
>         add     t0, t0, t1
> 
> or even
> 
>         li      t1, XEN_VIRT_START
>         add     t0, t1, %pcrel_lo(.L_primary_switched - start)
> 
> .)
Calculation of ".L_primary_switched - start" might be an issue for gcc.
I tried to do something similar and recieved or "Error: can't resolve
.L_primary_switched - start" or "Error: illegal operands `li
t0,.L_primary_switched-start'"

~ Oleksii
