
Re: [PATCH v5 2/4] xen/riscv: introduce setup_initial_pages



On Thu, 2023-04-20 at 16:36 +0200, Jan Beulich wrote:
> On 19.04.2023 17:42, Oleksii Kurochko wrote:
> > --- a/xen/arch/riscv/include/asm/page-bits.h
> > +++ b/xen/arch/riscv/include/asm/page-bits.h
> > @@ -1,6 +1,16 @@
> >  #ifndef __RISCV_PAGE_BITS_H__
> >  #define __RISCV_PAGE_BITS_H__
> >  
> > +#ifdef CONFIG_RISCV_64
> > +#define PAGETABLE_ORDER         (9)
> > +#else /* CONFIG_RISCV_32 */
> > +#define PAGETABLE_ORDER         (10)
> > +#endif
> > +
> > +#define PAGETABLE_ENTRIES       (1 << PAGETABLE_ORDER)
> > +
> > +#define PTE_PPN_SHIFT           10
> > +
> >  #define PAGE_SHIFT              12 /* 4 KiB Pages */
> >  #define PADDR_BITS              56 /* 44-bit PPN */
> 
> Personally I think these two would better remain at the top. But
> maybe
> that's just me ...
I'm OK to keep them at the top.

> 
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/page.h
> > @@ -0,0 +1,63 @@
> > +#ifndef _ASM_RISCV_PAGE_H
> > +#define _ASM_RISCV_PAGE_H
> > +
> > +#include <xen/const.h>
> > +#include <xen/types.h>
> > +
> > +#define VPN_MASK                    ((unsigned long)(PAGETABLE_ENTRIES - 1))
> > +
> > +#define XEN_PT_LEVEL_ORDER(lvl)     ((lvl) * PAGETABLE_ORDER)
> > +#define XEN_PT_LEVEL_SHIFT(lvl)     (XEN_PT_LEVEL_ORDER(lvl) + PAGE_SHIFT)
> > +#define XEN_PT_LEVEL_SIZE(lvl)      (_AT(paddr_t, 1) << XEN_PT_LEVEL_SHIFT(lvl))
> > +#define XEN_PT_LEVEL_MAP_MASK(lvl)  (~(XEN_PT_LEVEL_SIZE(lvl) - 1))
> > +#define XEN_PT_LEVEL_MASK(lvl)      (VPN_MASK << XEN_PT_LEVEL_SHIFT(lvl))
> > +
> > +#define PTE_VALID                   BIT(0, UL)
> > +#define PTE_READABLE                BIT(1, UL)
> > +#define PTE_WRITABLE                BIT(2, UL)
> > +#define PTE_EXECUTABLE              BIT(3, UL)
> > +#define PTE_USER                    BIT(4, UL)
> > +#define PTE_GLOBAL                  BIT(5, UL)
> > +#define PTE_ACCESSED                BIT(6, UL)
> > +#define PTE_DIRTY                   BIT(7, UL)
> > +#define PTE_RSW                     (BIT(8, UL) | BIT(9, UL))
> > +
> > +#define PTE_LEAF_DEFAULT            (PTE_VALID | PTE_READABLE | PTE_WRITABLE)
> > +#define PTE_TABLE                   (PTE_VALID)
> > +
> > +/* Calculate the offsets into the pagetables for a given VA */
> > +#define pt_linear_offset(lvl, va)   ((va) >> XEN_PT_LEVEL_SHIFT(lvl))
> > +
> > +#define pt_index(lvl, va) pt_linear_offset(lvl, (va) & XEN_PT_LEVEL_MASK(lvl))
> > +
> > +/* Page Table entry */
> > +typedef struct {
> > +#ifdef CONFIG_RISCV_64
> > +    uint64_t pte;
> > +#else
> > +    uint32_t pte;
> > +#endif
> > +} pte_t;
> > +
> > +#define pte_to_addr(x) (((x) >> PTE_PPN_SHIFT) << PAGE_SHIFT)
> 
> This will be at risk of overflow for RV32 without a cast to paddr_t
> (which
> I expect would be a 64-bit type on RV32 just like it is on RV64).
Yeah, paddr_t is u64 for both RV32 and RV64.
I'll add a cast to paddr_t to avoid the possible risk of overflow.
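So roughly:

    #define pte_to_addr(x) (((paddr_t)(x) >> PTE_PPN_SHIFT) << PAGE_SHIFT)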

> 
> > +#define addr_to_pte(x) (((x) >> PAGE_SHIFT) << PTE_PPN_SHIFT)
> > +
> > +static inline pte_t paddr_to_pte(const paddr_t paddr,
> > +                                 const unsigned long permissions)
> > +{
> > +    unsigned long tmp = addr_to_pte(paddr);
> > +    return (pte_t) { .pte = tmp | permissions };
> > +}
> > +
> > +static inline paddr_t pte_to_paddr(const pte_t pte)
> > +{
> > +    return pte_to_addr(pte.pte);
> > +}
> > +
> > +static inline bool pte_is_valid(const pte_t p)
> > +{
> > +    return p.pte & PTE_VALID;
> > +}
> 
> A remark on all of the "const" here: It's fine if you want to keep
> them,
> but generally we care about const-correctness only for pointed-to
> types.
> Scalar and compound parameter values are owned by called function
> anyway,
> so all the "const" achieves is that the function can't alter its own
> argument(s).
Then it doesn't make sense for the current case, and removing them will
simplify the paddr_to_pte() function, so I prefer to remove them.
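I.e. it would become roughly:

    static inline pte_t paddr_to_pte(paddr_t paddr, unsigned long permissions)
    {
        unsigned long tmp = addr_to_pte(paddr);

        return (pte_t) { .pte = tmp | permissions };
    }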

> 
> > --- /dev/null
> > +++ b/xen/arch/riscv/mm.c
> > @@ -0,0 +1,319 @@
> > +#include <xen/compiler.h>
> > +#include <xen/init.h>
> > +#include <xen/kernel.h>
> > +
> > +#include <asm/early_printk.h>
> > +#include <asm/config.h>
> > +#include <asm/csr.h>
> > +#include <asm/mm.h>
> > +#include <asm/page.h>
> > +#include <asm/processor.h>
> > +
> > +struct mmu_desc {
> > +    unsigned long num_levels;
> 
> Isn't "unsigned int" sufficient here?
Yes, it will be sufficient.

> 
> > +/*
> > + * It is expected that Xen won't be more then 2 MB.
> > + * The check in xen.lds.S guarantees that.
> > + * At least 4 page (in case when Sv48 or Sv57 are used )
> > + * tables are needed to cover 2 MB. One for each page level
> > + * table with PAGE_SIZE = 4 Kb
> > + *
> > + * One L0 page table can cover 2 MB
> > + * (512 entries of one page table * PAGE_SIZE).
> > + *
> > + * It might be needed one more page table in case when
> > + * Xen load address isn't 2 MB aligned.
> > + *
> > + */
> > +#define PGTBL_INITIAL_COUNT     (5)
> 
> On x86 we have a CONFIG_PAGING_LEVELS constant. If you had something
> like this as well, this 5 would better match the comment as
> ((CONFIG_PAGING_LEVELS - 1) + 1), avoiding to make space for the two
> levels you won't need as long as only Sv39 is really meant to be
> used.
Thanks for the note. I'll use CONFIG_PAGING_LEVELS too.
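Something like this then, assuming a CONFIG_PAGING_LEVELS Kconfig option
(not present for RISC-V yet) gets introduced:

    /*
     * (CONFIG_PAGING_LEVELS - 1) page tables are needed for the initial
     * mapping, plus one more in case the load address isn't 2 MB aligned.
     */
    #define PGTBL_INITIAL_COUNT     ((CONFIG_PAGING_LEVELS - 1) + 1)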

> 
> > +#define PGTBL_ENTRY_AMOUNT  (PAGE_SIZE / sizeof(pte_t))
> > +
> > +pte_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
> > +stage1_pgtbl_root[PGTBL_ENTRY_AMOUNT];
> > +
> > +pte_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
> > +stage1_pgtbl_nonroot[PGTBL_INITIAL_COUNT * PGTBL_ENTRY_AMOUNT];
> > 
> > +#define HANDLE_PGTBL(curr_lvl_num)                                      \
> > +    index = pt_index(curr_lvl_num, page_addr);                          \
> > +    if ( pte_is_valid(pgtbl[index]) )                                   \
> > +    {                                                                   \
> > +        /* Find L{ 0-3 } table */                                       \
> > +        pgtbl = (pte_t *)pte_to_paddr(pgtbl[index]);                    \
> > +    }                                                                   \
> > +    else                                                                \
> > +    {                                                                   \
> > +        /* Allocate new L{0-3} page table */                            \
> > +        if ( mmu_desc->pgtbl_count == PGTBL_INITIAL_COUNT )             \
> > +        {                                                               \
> > +            early_printk("(XEN) No initial table available\n");         \
> > +            /* panic(), BUG() or ASSERT() aren't ready now. */          \
> > +            die();                                                      \
> > +        }                                                               \
> > +        mmu_desc->pgtbl_count++;                                        \
> > +        pgtbl[index] = paddr_to_pte((unsigned long)mmu_desc->next_pgtbl, \
> > +                                    PTE_VALID);                         \
> > +        pgtbl = mmu_desc->next_pgtbl;                                   \
> > +        mmu_desc->next_pgtbl += PGTBL_ENTRY_AMOUNT;                     \
> > +    }
> > +
> > +static void __init setup_initial_mapping(struct mmu_desc *mmu_desc,
> > +                                         unsigned long map_start,
> > +                                         unsigned long map_end,
> > +                                         unsigned long pa_start,
> > +                                         unsigned long permissions)
> 
> Wouldn't the last one more sensibly be "unsigned int"?
It would. Thanks, I'll update the code.
> 
> > +{
> > +    unsigned int index;
> > +    pte_t *pgtbl;
> > +    unsigned long page_addr;
> > +    pte_t pte_to_be_written;
> > +    unsigned long paddr;
> > +    unsigned long tmp_permissions;
> > +
> > +    if ( (unsigned long)_start % XEN_PT_LEVEL_SIZE(0) )
> > +    {
> > +        early_printk("(XEN) Xen should be loaded at 4k
> > boundary\n");
> > +        die();
> > +    }
> > +
> > +    if ( map_start & ~XEN_PT_LEVEL_MAP_MASK(0) ||
> > +         pa_start & ~XEN_PT_LEVEL_MAP_MASK(0) )
> > +    {
> > +        early_printk("(XEN) map and pa start addresses should be
> > aligned\n");
> > +        /* panic(), BUG() or ASSERT() aren't ready now. */
> > +        die();
> > +    }
> > +
> > +    page_addr = map_start;
> > +    while ( page_addr < map_end )
> > +    {
> > +        pgtbl = mmu_desc->pgtbl_base;
> > +
> > +        switch (mmu_desc->num_levels)
> 
> Nit: Style (missing blanks).
Thanks. I'll update.
> 
> > +        {
> > +            case 4: /* Level 3 */
> 
> Nit: Indentation of case labels matches that of the opening brace in
> our
> style.
Sure. Missed that.
> 
> > +                HANDLE_PGTBL(3);
> > +            case 3: /* Level 2 */
> > +                HANDLE_PGTBL(2);
> > +            case 2: /* Level 1 */
> > +                HANDLE_PGTBL(1);
> > +            case 1: /* Level 0 */
> > +                index = pt_index(0, page_addr);
> > +                paddr = (page_addr - map_start) + pa_start;
> > +
> > +                tmp_permissions = permissions;
> > +
> > +                if ( is_kernel_text(LINK_TO_LOAD(page_addr)) ||
> > +                     is_kernel_inittext(LINK_TO_LOAD(page_addr)) )
> > +                    tmp_permissions =
> > +                        PTE_EXECUTABLE | PTE_READABLE | PTE_VALID;
> > +
> > +                if ( is_kernel_rodata(LINK_TO_LOAD(page_addr)) )
> > +                    tmp_permissions = PTE_READABLE | PTE_VALID;
> > +
> > +                pte_to_be_written = paddr_to_pte(paddr, tmp_permissions);
> > +
> > +                if ( !pte_is_valid(pgtbl[index]) )
> > +                    pgtbl[index] = pte_to_be_written;
> > +                else
> > +                {
> > +                    /*
> > +                    * get the addresses of the current pte and the one to
> > +                    * be written, without permission flags
> > +                    */
> > +                    unsigned long curr_pte =
> > +                        pgtbl[index].pte & ~((1 << PTE_PPN_SHIFT) - 1);
> > +
> > +                    pte_to_be_written.pte &= ~((1 << PTE_PPN_SHIFT) - 1);
> 
> If you mean to only compare addresses, why don't you use
> pte_to_paddr()?
Yes, it will be better to use pte_to_paddr().
> Question though is whether it's correct to ignore permission
> differenes.
> I'd expect you only want to mask off PTE_ACCESSED and PTE_DIRTY.
Initially I wanted to do a rough check, but you are probably right: it
will be better to mask off only PTE_ACCESSED and PTE_DIRTY.
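Something along these lines then (pte_cmp_mask is just an illustrative
local name):

    unsigned long pte_cmp_mask = ~(PTE_ACCESSED | PTE_DIRTY);

    if ( (pgtbl[index].pte & pte_cmp_mask) !=
         (pte_to_be_written.pte & pte_cmp_mask) )
    {
        /* same error path as now: early_printk() + die() */
    }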

> 
> > +                    if ( curr_pte != pte_to_be_written.pte )
> > +                    {
> > +                        early_printk("PTE that is intended to
> > write isn't the"
> > +                                    "same that the once are
> > overwriting\n");
> 
> Nit: One-off indentation. As to the message text - I take it that's
> temporary only anyway, and hence there's little point thinking about
> improving it?
Probably it would be clearer:
"PTE overridden has occurred"
> 
> > +                        /* panic(), <asm/bug.h> aren't ready now. */
> > +                        die();
> > +                    }
> > +                }
> > +        }
> > +
> > +        /* Point to next page */
> > +        page_addr += XEN_PT_LEVEL_SIZE(0);
> 
> Seeing this as well as the loop heading - maybe more suitable as a
> for(;;) loop?
I am not sure I understand the benefits of changing
"while ( page_addr < map_end )" to "for(;;)".
Could you please explain to me what the benefits are?
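Just to make sure I understand, I assume you mean something like:

    for ( page_addr = map_start; page_addr < map_end;
          page_addr += XEN_PT_LEVEL_SIZE(0) )
    {
        ...
    }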
> 
> > +    }
> > +}
> 
> Since HANDLE_PGTBL() is strictly for use above only, I think you'd
> better
> #undef it here.
> 
> > +static void __init calc_pgtbl_lvls_num(struct  mmu_desc *mmu_desc)
> > +{
> > +    unsigned long satp_mode = RV_STAGE1_MODE;
> > +
> > +    /* Number of page table levels */
> > +    switch (satp_mode)
> > +    {
> > +        case SATP_MODE_SV32:
> > +            mmu_desc->num_levels = 2;
> > +            break;
> > +        case SATP_MODE_SV39:
> > +            mmu_desc->num_levels = 3;
> > +            break;
> > +        case SATP_MODE_SV48:
> > +            mmu_desc->num_levels = 4;
> > +            break;
> > +        default:
> > +            early_printk("(XEN) Unsupported SATP_MODE\n");
> > +            die();
> > +    }
> > +}
> > +
> > +static bool __init check_pgtbl_mode_support(struct mmu_desc *mmu_desc,
> > +                                            unsigned long load_start,
> > +                                            unsigned long satp_mode)
> > +{
> > +    bool is_mode_supported = false;
> > +    unsigned int index;
> > +    unsigned int page_table_level = (mmu_desc->num_levels - 1);
> > +    unsigned level_map_mask = XEN_PT_LEVEL_MAP_MASK(page_table_level);
> > +
> > +    unsigned long aligned_load_start = load_start & level_map_mask;
> > +    unsigned long aligned_page_size = XEN_PT_LEVEL_SIZE(page_table_level);
> > +    unsigned long xen_size = (unsigned long)(_end - start);
> > +
> > +    if ( (load_start + xen_size) > (aligned_load_start + aligned_page_size) )
> > +    {
> > +        early_printk("please place Xen to be in range of PAGE_SIZE "
> > +                     "where PAGE_SIZE is XEN_PT_LEVEL_SIZE( {L3 | L2 | L1} ) "
> > +                     "depending on expected SATP_MODE \n"
> > +                     "XEN_PT_LEVEL_SIZE is defined in <asm/page.h>\n");
> > +        die();
> > +    }
> > +
> > +    index = pt_index(page_table_level, aligned_load_start);
> > +    stage1_pgtbl_root[index] = paddr_to_pte(aligned_load_start,
> > +                                            PTE_LEAF_DEFAULT | PTE_EXECUTABLE);
> > +
> > +    asm volatile("sfence.vma");
> 
> Nit (here and several times again below): Style (missing three
> blanks, as
> "asm" is a keyword).
Thanks. I'll add them.
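I.e. (if I got the expected style right):

    asm volatile ( "sfence.vma" );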
> 
> > +    csr_write(CSR_SATP,
> > +              ((unsigned long)stage1_pgtbl_root >> PAGE_SHIFT) |
> > +              satp_mode << SATP_MODE_SHIFT);
> > +
> > +    if ( (csr_read(CSR_SATP) >> SATP_MODE_SHIFT) == satp_mode )
> > +        is_mode_supported = true;
> > +
> > +    /* Clean MMU root page table and disable MMU */
> > +    stage1_pgtbl_root[index] = paddr_to_pte(0x0, 0x0);
> > +
> > +    csr_write(CSR_SATP, 0);
> > +    asm volatile("sfence.vma");
> 
> I guess what you do in this function could do with some more
> comments.
> Looks like you're briefly enabling the MMU to check that what you
> wrote
> to SATP you can also read back. (Isn't there a register reporting
> whether the feature is available?)
I supposed there would be, but I couldn't find such a register in the docs.

> 
> If so, I think you cannot clear the stage1_pgtbl_root[] slot before
> you've disabled the MMU again.
> 
> (As a minor aspect, I'd like to encourage you to write plain zero
> just
> as 0, not as 0x0, unless used in contexts where other hex numbers
> nearby
> make this desirable.)
You are right; it should be cleared after csr_write(CSR_SATP, 0).
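So the tail of the function would become roughly:

    csr_write(CSR_SATP, 0);
    asm volatile ( "sfence.vma" );

    /* Clean the MMU root page table only after the MMU has been disabled */
    stage1_pgtbl_root[index] = paddr_to_pte(0, 0);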
> 
> > +    return is_mode_supported;
> > +}
> > +
> > +/*
> > + * setup_initial_pagetables:
> > + *
> > + * Build the page tables for Xen that map the following:
> > + *  1. Calculate page table's level numbers.
> > + *  2. Init mmu description structure.
> > + *  3. Check that linker addresses range doesn't overlap
> > + *     with load addresses range
> > + *  4. Map all linker addresses and load addresses ( it shouldn't
> > + *     be 1:1 mapped and will be 1:1 mapped only in case if
> > + *     linker address is equal to load address ) with
> > + *     RW permissions by default.
> > + *  5. Setup proper PTE permissions for each section.
> > + */
> > +void __init setup_initial_pagetables(void)
> > +{
> > +    struct mmu_desc mmu_desc = { 0, 0, NULL, 0 };
> 
> Just {} ought to do as initializer, but if you want to spell things
> out,
> then please use 0 / NULL consistently for integral / pointer type
> fields.
Thanks.
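So I'll go with just:

    struct mmu_desc mmu_desc = {};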

> 
> > +    /*
> > +     * Access to _{stard, end } is always PC-relative
> 
> I guess you mean _start. For just a leading underscore I also don't
> think
> using this braced form is useful.
OK, then I'll update the comment without using the braced form.
> 
> > +     * thereby when access them we will get load adresses
> > +     * of start and end of Xen
> > +     * To get linker addresses LOAD_TO_LINK() is required
> > +     * to use
> > +     */
> > +    unsigned long load_start    = (unsigned long)_start;
> > +    unsigned long load_end      = (unsigned long)_end;
> > +    unsigned long linker_start  = LOAD_TO_LINK(load_start);
> > +    unsigned long linker_end    = LOAD_TO_LINK(load_end);
> > +
> > +    if ( (linker_start != load_start) &&
> > +         (linker_start <= load_end) && (load_start <= linker_end) ) {
> 
> Nit: Style (brace placement).
Thanks.

> 
> > +        early_printk("(XEN) linker and load address ranges
> > overlap\n");
> > +        die();
> > +    }
> > +
> > +    calc_pgtbl_lvls_num(&mmu_desc);
> > +
> > +    if ( !check_pgtbl_mode_support(&mmu_desc, load_start, RV_STAGE1_MODE) )
> > +    {
> > +        early_printk("requested MMU mode isn't supported by CPU\n"
> > +                     "Please choose different in
> > <asm/config.h>\n");
> > +        die();
> > +    }
> > +
> > +    mmu_desc.pgtbl_base = stage1_pgtbl_root;
> > +    mmu_desc.next_pgtbl = stage1_pgtbl_nonroot;
> > +
> > +    setup_initial_mapping(&mmu_desc,
> > +                          linker_start,
> > +                          linker_end,
> > +                          load_start,
> > +                          PTE_LEAF_DEFAULT);
> > +}
> > +
> > +void __init noinline enable_mmu()
> > +{
> > +    /*
> > +     * Calculate a linker time address of the mmu_is_enabled
> > +     * label and update CSR_STVEC with it.
> > +     * MMU is configured in a way where linker addresses are mapped
> > +     * on load addresses so in a case when linker addresses are not equal
> > +     * to load addresses, after MMU is enabled, it will cause
> > +     * an exception and jump to linker time addresses.
> > +     * Otherwise if load addresses are equal to linker addresses the code
> > +     * after mmu_is_enabled label will be executed without exception.
> > +     */
> > +    csr_write(CSR_STVEC, LOAD_TO_LINK((unsigned long)&&mmu_is_enabled));
> > +
> > +    /* Ensure page table writes precede loading the SATP */
> > +    asm volatile("sfence.vma");
> > +
> > +    /* Enable the MMU and load the new pagetable for Xen */
> > +    csr_write(CSR_SATP,
> > +              ((unsigned long)stage1_pgtbl_root >> PAGE_SHIFT) |
> 
> Please try to avoid open-coding of existing constructs: Here you mean
> either PFN_DOWN() or paddr_to_pfn() (you see, we have even two). (As
> I
> notice I did overlook at least similar earlier instance.)
Sure. Thanks.
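I.e. something like:

    csr_write(CSR_SATP,
              PFN_DOWN((unsigned long)stage1_pgtbl_root) |
              RV_STAGE1_MODE << SATP_MODE_SHIFT);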
> 
> > +              RV_STAGE1_MODE << SATP_MODE_SHIFT);
> > +
> > +    asm volatile(".align 2");
> > +      mmu_is_enabled:
> 
> How in the world is one to spot this label? Yes, it shouldn't be
> entirely
> unindented. But it also should be indented less than the surrounding
> code
> (with the exception of switch() statement case labels, where a non-
> case
> label might be intended the same as a case ones). Our rule of thumb
> is to
> indent such labels by a single space.
Thanks. It looks like I misunderstood the previous comment about label
indentation.
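So, if I understand the rule correctly, it should look like this
(a single leading space):

 mmu_is_enabled: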
> 
> > +    /*
> > +     * Stack should be re-inited as:
> > +     * 1. Right now an address of the stack is relative to load time
> > +     *    addresses what will cause an issue in case of load start address
> > +     *    isn't equal to linker start address.
> > +     * 2. Addresses in stack are all load time relative which can be an
> > +     *    issue in case when load start address isn't equal to linker
> > +     *    start address.
> > +     */
> > +    asm volatile ("mv sp, %0" : : "r"((unsigned long)cpu0_boot_stack + STACK_SIZE));
> 
> Nit: Style (overly long line).
> 
> What's worse - I don't think it is permitted to alter sp in the
> middle of
> a function. The compiler may maintain local variables on the stack
> which
> don't correspond to any programmer specified ones, including pointer
> ones
> which point into the stack frame. This is specifically why both x86
> and
> Arm have switch_stack_and_jump() macros.
But the macro (from Arm) looks equivalent to the code mentioned above:

#define switch_stack_and_jump(stack, fn) do {                               \
    asm volatile ("mov sp,%0; b " STR(fn) : : "r" (stack), "X" (fn) : "memory" ); \
    unreachable();                                                          \
} while ( false )

Here is part of disassembled enable_mmu():

ffffffffc004aedc:       18079073                csrw    satp,a5
ffffffffc004aee0:       00016797                auipc   a5,0x16
ffffffffc004aee4:       12078793                addi    a5,a5,288
ffffffffc004aee8:       813e                    mv      sp,a5
ffffffffc004af00:       0f4000ef                jal     ra,ffffffffc004aff4 <cont_after_mmu_is_enabled>
...
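If a dedicated macro is preferred for RISC-V as well, I guess it would be
something like (untested sketch):

#define switch_stack_and_jump(stack, fn)                                    \
    do {                                                                    \
        asm volatile ( "mv sp, %0\n"                                        \
                       "j " STR(fn) :: "r" (stack), "X" (fn) : "memory" );  \
        unreachable();                                                      \
    } while ( false )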


> 
> > +    /*
> > +     * We can't return to the caller because the stack was reseted
> > +     * and it may have stash some variable on the stack.
> > +     * Jump to a brand new function as the stack was reseted
> > +    */
> 
> Nit: Style (indentation).
Thanks.
> 
> > +    cont_after_mmu_is_enabled();
> > +}
> > +
> > diff --git a/xen/arch/riscv/riscv64/head.S
> > b/xen/arch/riscv/riscv64/head.S
> > index 8887f0cbd4..b3309d902c 100644
> > --- a/xen/arch/riscv/riscv64/head.S
> > +++ b/xen/arch/riscv/riscv64/head.S
> > @@ -1,4 +1,5 @@
> >  #include <asm/asm.h>
> > +#include <asm/asm-offsets.h>
> >  #include <asm/riscv_encoding.h>
> >  
> >          .section .text.header, "ax", %progbits
> > @@ -32,3 +33,4 @@ ENTRY(start)
> >          add     sp, sp, t0
> >  
> >          tail    start_xen
> > +
> 
> ???
Shouldn't there be one empty line at the end of the file?

> 
> > --- a/xen/arch/riscv/xen.lds.S
> > +++ b/xen/arch/riscv/xen.lds.S
> > @@ -136,6 +136,7 @@ SECTIONS
> >      . = ALIGN(POINTER_ALIGN);
> >      __init_end = .;
> >  
> > +    . = ALIGN(PAGE_SIZE);
> >      .bss : {                     /* BSS */
> >          __bss_start = .;
> >          *(.bss.stack_aligned)
> 
> Why do you need this? You properly use __aligned(PAGE_SIZE) for the
> page tables you define, and PAGE_SIZE wouldn't be enough here anyway
> if STACK_SIZE > PAGE_SIZE (as .bss.stack_aligned comes first). The
> only time you'd need such an ALIGN() is if the following label
> (__bss_start in this case) needed to be aligned at a certain
> boundary. (I'm a little puzzled though that __bss_start isn't
> [immediately] preceded by ". = ALIGN(POINTER_ALIGN);" - didn't .bss
> clearing rely on such alignment?)
ALIGN(PAGE_SIZE) isn't needed anymore.
I used it to have a 4k-aligned physical address for the PTE when I
mapped each section separately (that was the case in the previous
version of the MMU patch series).

Regarding ". = ALIGN(POINTER_ALIGN);", I would say it is enough to have
__bss_end aligned (which was done) to be sure that we can clear
__SIZEOF_POINTER__ bytes on each iteration of the .bss clearing loop and
not worry that the size of the .bss section may not be divisible by
__SIZEOF_POINTER__.
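For reference, the clearing loop I have in mind is of this shape (just to
illustrate the reasoning, not the exact code):

    unsigned long *p = (unsigned long *)__bss_start;

    while ( p < (unsigned long *)__bss_end )
        *p++ = 0;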

~ Oleksii

 

