[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/3] xen/riscv: introduce setup_initial_pages



On Thu, 2023-03-23 at 11:57 +0000, Julien Grall wrote:
> On 23/03/2023 11:18, Oleksii wrote:
> > Hi Julien,
> 
> Hi Oleksii,
> 
> > On Wed, 2023-03-22 at 14:21 +0000, Julien Grall wrote:
> > ...
> > > 
> > > > > > +    unsigned long page_addr;
> > > > > > +
> > > > > > +    // /* align start addresses */
> > > > > > +    // map_start &= XEN_PT_LEVEL_MAP_MASK(0);
> > > > > > +    // pa_start &= XEN_PT_LEVEL_MAP_MASK(0);
> > > > > 
> > > > > They should be switched to ASSERT() or BUG_ON().
> > > > Sure. Thanks. I'll update.
> > > > > 
> > > > > > +
> > > > > > +    page_addr = map_start;
> > > > > > +    while ( page_addr < map_end )
> > > > > 
> > > > > This loop is continue to assume that only the mapping can
> > > > > first
> > > > > in
> > > > > 2MB
> > > > > section (or less if the start is not 2MB aligned).
> > > > > 
> > > > > I am OK if you want to assume it, but I think this should be
> > > > > documented
> > > > > (with words and ASSERT()/BUG_ON()) to avoid any mistake.
> > > > I add a check in setup_initial_pagetables:
> > > >        BUG_ON(load_addr_start % (PAGE_ENTRIES * PAGE_SIZE) !=
> > > > 0);
> > > > Probably this is not a correct place and should be moved to
> > > > setup_initial_mapping() instead of setup_initial_pagetables()
> > > 
> > > Yes it should be moved in setup_initial_mapping().
> > Then I'll moved it to setup_initial_mapping()
> > > 
> > > > > 
> > > > > > +    {
> > > > > > +        unsigned long index2 = pt_index(2, page_addr);
> > > > > > +        unsigned long index1 = pt_index(1, page_addr);
> > > > > > +        unsigned long index0 = pt_index(0, page_addr);
> > > > > > +
> > > > > > +        /* Setup level2 table */
> > > > > > +        second[index2] = paddr_to_pte((unsigned
> > > > > > long)first);
> > > > > > +        second[index2].pte |= PTE_TABLE;
> > > > > > +
> > > > > > +        /* Setup level1 table */
> > > > > > +        first[index1] = paddr_to_pte((unsigned
> > > > > > long)zeroeth);
> > > > > > +        first[index1].pte |= PTE_TABLE;
> > > > > > +
> > > > > > +
> > > > > 
> > > > > NIT: Spurious line?
> > > > Yeah, should be removed. Thanks.
> > > > > 
> > > > > > +        /* Setup level0 table */
> > > > > > +        if ( !pte_is_valid(&zeroeth[index0]) )
> > > > > 
> > > > > On the previous version, you said it should be checked for
> > > > > each
> > > > > level.
> > > > I had a terrible internet connection, and my message wasn't
> > > > sent.
> > > 
> > > No worries.
> > > 
> > > > 
> > > > I decided not to check that l2 and l1 are used only for
> > > > referring
> > > > to
> > > > the next page table but not leaf PTE. So it is safe to
> > > > overwrite it
> > > > each time (the addresses of page tables are the same all the
> > > > time)
> > > 
> > > You are letting the caller to decide which page-table to use for
> > > each
> > > level. So you are at the mercy that caller will do the right
> > > thing.
> > > 
> > > IHMO, this is a pretty bad idea because debugging page-tables
> > > error
> > > are
> > > difficult. So it is better to have safety in place. This is not
> > > worth...
> > > 
> > >    and
> > > > probably it will be better from optimization point of view to
> > > > ignore if
> > > > clauses.
> > > 
> > > ... the optimization in particular when this is at boot time.
> > I didn't think about that caller will do always the right thing so
> > I will add the check.
> > > 
> > > > 
> > > > And it is needed in case of L0 because it is possible that some
> > > > addressed were marked with specific flag ( execution, read,
> > > > write )
> > > > and
> > > > so not to overwrite the flags set before the check is needed.
> > > In which case you should really report an error because the
> > > caller
> > > may
> > > have decide to set execution flag and you don't honor. So when
> > > the
> > > code
> > > is executed, you will receive a fault and this may be hard to
> > > find
> > > out
> > > what happen.
> > 
> > Right now, it is expected situation that the caller will try to
> > change
> > execution flag during the setup of initial page tables. >
> > It is possible in the currently implemented logic of the setup of
> > initial page tables.
> 
> This sounds like a bug in your caller implementation. You should not
> try 
> to workaround this in your code updating the page-tables.
> 
> > Let me explain what I mean.
> > 
> > The first step of setup_initial_pagetables() is to map .text,
> > .init,
> > .rodata with necessary flags RX, RX, R.
> > Remaining sections will have RW flags, and to map them,
> > setup_initial_mapping() is called for the whole range of
> > [linker_start,
> > linker_end] not to map them one by one thereby during this step
> > setup_initial_mapping() will try to remap addresses ranges which
> > overlap with .text, .init, .rodata with RW flags but it should
> > leave
> > with the previously set flags.
> Why do you need to call setup_init_mapping() with the whole range? In
> fact the only reason I can think this is useful is when you when to 
> create a 1:1 mapping when the linker and load address is different.
It is needed to not map each section separately one by one as most of
the sections have the same PTE_FLAGS (Read, Write, eXectuable, etc )

So it was decided to map the following sections separately as they have
'unique' flags:
 - .text -> RX
 - .rodata -> R
 - .init.text -> RX

All other sections are RW and could be covered by one call of
setup_init_mapping() for the whole range:
 - .data.ro_after_init
 - .data.read_mostly 
 - .data 
 - .init.data 
 - .bss
So some ranges ( .text, .rodata, .init.text ) from the whole range will
be skipped as already mapped and the rest sections will be mapped
during one call of setup_init_mapping().

> But...
> 
> > > > > > +
> > > > > > +    /* Get the addresses where the page tables were loaded
> > > > > > */
> > > > > > +    second  = (pte_t *)(&xen_second_pagetable);
> > > > > > +    first   = (pte_t *)(&xen_first_pagetable);
> > > > > > +    zeroeth = (pte_t *)(&xen_zeroeth_pagetable);
> > > > > > +
> > > > > > +    setup_initial_mapping(second, first, zeroeth,
> > > > > > +                          LOAD_TO_LINK((unsigned
> > > > > > long)&_stext),
> > > > > > +                          LOAD_TO_LINK((unsigned
> > > > > > long)&_etext),
> > > > > > +                          (unsigned long)&_stext,
> > > > > > +                          PTE_LEAF_DEFAULT);
> > > > > > +
> > > > > > +    setup_initial_mapping(second, first, zeroeth,
> > > > > > +                          LOAD_TO_LINK((unsigned
> > > > > > long)&__init_begin),
> > > > > > +                          LOAD_TO_LINK((unsigned
> > > > > > long)&__init_end),
> > > > > > +                          (unsigned long)&__init_begin,
> > > > > > +                          PTE_LEAF_DEFAULT |
> > > > > > PTE_WRITABLE);
> > > > > > +
> > > > > > +    setup_initial_mapping(second, first, zeroeth,
> > > > > > +                          LOAD_TO_LINK((unsigned
> > > > > > long)&_srodata),
> > > > > > +                          LOAD_TO_LINK((unsigned
> > > > > > long)&_erodata),
> > > > > > +                          (unsigned long)(&_srodata),
> > > > > > +                          PTE_VALID | PTE_READABLE);
> > > > > > +
> > > > > > +    setup_initial_mapping(second, first, zeroeth,
> > > > > > +                          linker_addr_start,
> > > > > > +                          linker_addr_end,
> > > > > > +                          load_addr_start,
> > > > > > +                          PTE_LEAF_DEFAULT |
> > > > > > PTE_READABLE);
> > > 
> > > ... this is not cover above. AFAIU, this is the one for the 1:1
> > > mapping.
> > But there is no 1:1 mapping.
> > I thought that 1:1 mapping is when the physical address is equal to
> > 0x10020 then after MMU is enabled the virtual address will be the
> > same
> > 0x10020 and mapped to 0x10020.
> > 
> > Probably I missed something but this code maps all linker addresses
> > to correspondent load addresses and it will be 1:1 only in case
> > when
> > load address is equal to linker address.
> 
> ... here you say this is not the purpose.
> 
> Also, if you don't intend to deal with load != link address yet, then
> the following BUG_ON() needs to be updated:
> 
> +    if (load_addr_start != linker_addr_start)
> +        BUG_ON((linker_addr_end > load_addr_start && load_addr_end >
> linker_addr_start));
I think that I'll cover it in the new patch series as I clarified all
necessary information and I don't expect that it will be too hard to
add from the start.
> 
> > > 
> > > > > 
> > > > > As I said in v1, you need to use a different set of page-
> > > > > table
> > > > > here.
> > > > If I understand you correctly I have to use a different set of
> > > > page-
> > > > table in case when it is possible that size of Xen will be
> > > > larger
> > > > then
> > > > PAGE_SIZE. So I added to xen.lds.S a check to be sure that the
> > > > size
> > > > fits into PAGE_SIZE.
> > > 
> > > This is not what I was referring to. I was pointing out that
> > > second,
> > > first, zeroeth are exactly the same for all the callers. You want
> > > to
> > > introduce a second set of zeroeth table. You will want to do the
> > > same
> > > for first but it is not always used.
> > > 
> > > Otherwise, this is not going to work if Xen is loaded at a
> > > different
> > > address than the runtime.
> > Ok.
> > 
> > I understand what do you mean in general but still I am not sure
> > that I
> > understand a particular case when I am sure that the size of Xen is
> > no
> > bigger then 2MB and load address is aligned on 2Mb boundary.
> > 
> > The size of one l0 page table is 512 (1 << 9 ) entries which covers
> > 4K
> > (1 << 12) * 512 = 2 Mb of memory so it should be fine.
> > All the callers in my case are trying to map addresses from
> > [linker_start, linker_end] which is not bigger then 2 MB and is
> > aligned
> > on 2 MB boundary.
> 
> I interpreted that your last call was meant to be for the 1:1 mapping
> when load != link address. It looks like I was wrong, so the same 
> page-table should be OK.
Thanks. Now we are on the same page.

But actually I like an idea to have more page tables and remove a
limitation to be aligned on 2MB boundary from the start.

I experimented with adding of additional table ( ( if it is necessary )
so it shouldn't be hard to add that code too.
> 
> > 
> > > 
> > > That said, when I spoke with Andrew yesterday, he mentioned that
> > > your
> > > initial goal is to support the case where Xen is loaded at the
> > > runtime
> > > address. I understand this simplifies a lot the code and I told
> > > him I
> > > was OK with that. However, it would be good to document what are
> > > your
> > > goals in each series (this is not always clear what you skip on
> > > purpose).
> > > 
> > > > 
> > > > > Also, where do you guarantee that Xen will be loaded at a 2MB
> > > > > aligned
> > > > > address? (For a fact I know that UEFI is only guarantee 4KB
> > > > > alignment).
> > > > There is a check in setup_initial_pagetables:
> > > >        BUG_ON(load_addr_start % (PAGE_ENTRIES * PAGE_SIZE) !=
> > > > 0);
> > > 
> > > This is not very obvious the check is to confirm that Xen is
> > > probably
> > > aligned. I would suggest to add a comment.
> > > 
> > > Also, you might want to use XEN_PT_LEVEL_SIZE(..) to make it more
> > > obvious what sort of alignment you are trying to enforce.
> > I will update add the comment and use XEN_PT_LEVEL_SIZE(...)
> > instead.
> > > 
> > > > > > +        la      sp, cpu0_boot_stack
> > > > > > +        li      t0, STACK_SIZE
> > > > > > +        add     sp, sp, t0
> > > > > > +
> > > > > > +        /*
> > > > > > +         * Re-init an address of exception handler as it
> > > > > > was
> > > > > > overwritten  with
> > > > > > +         * the address of the .L_mmu_is_enabled label.
> > > > > > +         * Before jump to trap_init save return address of
> > > > > > enable_mmu() to
> > > > > > +         * know where we should back after enable_mmu()
> > > > > > will
> > > > > > be
> > > > > > finished.
> > > > > > +         */
> > > > > > +        mv      s0, ra
> > > > > > +        lla     t0, trap_init
> > > > > > +        jalr    ra, t0
> > > > > > +
> > > > > > +        /*
> > > > > > +         * Re-calculate the return address of enable_mmu()
> > > > > > function for case
> > > > > > +         * when linker start address isn't equal to load
> > > > > > start
> > > > > > address
> > > > > > +         */
> > > > > > +        add     s0, s0, t1
> > > > > > +        mv      ra, s0
> > > > > > +
> > > > > > +        ret
> > > > > 
> > > > > Missing ENDPROC?
> > > > I haven't seen the usage of ENDPROC for RISC-V so it looks like
> > > > it
> > > > is
> > > > not necessary.
> > > 
> > > Ok. Would the objdump be able to report the function properly
> > > then? I
> > > know that on Arm, it was necessary report assembly function
> > > properly.
> > It is fine for RISC-V:
> > 
> > Disassembly of section .text:
> > 
> > 0000000000200000 <_start>:
> >    200000:       10401073                csrw    sie,zero
> >    200004:       6299                    lui     t0,0x6
> > ...
> >    20003c:       00000013                nop
> > 
> > 0000000000200040 <enable_mmu>:
> >    200040:       0003a297                auipc   t0,0x3a
> > ...
> >    20009e:       941a                    add     s0,s0,t1
> >    2000a0:       80a2                    mv      ra,s0
> >    2000a2:       8082                    ret
> >          ...
> > 
> > 00000000002000b0 <__bitmap_empty>:
> >    2000b0:       1141                    addi    sp,sp,-16
> >    2000b2:       e422                    sd      s0,8(sp)
> >    2000b4:       0800                    addi    s0,sp,16
> > ...
> 
> Perfect thanks!
> 
~ Oleksii




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.