Re: [PATCH v1 5/8] xen/riscv: introduce identity mapping
On Mon, 2023-06-12 at 15:48 +0200, Jan Beulich wrote:
> On 06.06.2023 21:55, Oleksii Kurochko wrote:
> > The way how switch to virtual address was implemented in the
> > commit e66003e7be ("xen/riscv: introduce setup_initial_pages")
> > wasn't safe enough so identity mapping was introduced and
> > used.
>
> I don't think this is sufficient as a description. You want to make
> clear what the "not safe enough" is, and you also want to go into
> more detail as to the solution chosen. I'm particularly puzzled that
> you map just two singular pages ...
I'll update the description.
I map just two pages because that should be enough to switch out of the
1:1 mapping world. My point is that the 1:1 mapping is only needed for
the few instructions located in start() [ in the .text.header section ]:
li t0, XEN_VIRT_START
la t1, start
sub t1, t1, t0
/* Calculate proper VA after jump from 1:1 mapping */
la t0, .L_primary_switched
sub t0, t0, t1
/* Jump from 1:1 mapping world */
jr t0
cpu0_boot_stack also needs a 1:1 mapping so that we don't fault while
executing the epilogue of enable_mmu(), which accesses a value on the
stack:
ffffffffc00001b0: 6422 ld s0,8(sp)
ffffffffc00001b2: 0141 addi sp,sp,16
ffffffffc00001b4: 8082 ret
>
> > @@ -35,8 +40,10 @@ static unsigned long phys_offset;
> > *
> > * It might be needed one more page table in case when Xen load
> > address
> > * isn't 2 MB aligned.
> > + *
> > + * 3 additional page tables are needed for identity mapping.
> > */
> > -#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) + 1)
> > +#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) + 1 + 3)
>
> What is this 3 coming from? It feels like the value should (again)
> somehow depend on CONFIG_PAGING_LEVELS.
3 is too much in the current case; 2 more should be enough.
The linker address is 0xFFFFFFFFC0000000, so to map the linker addresses
to the load addresses we need 3 page tables ( on the condition that the
size of Xen won't be larger than 2 MB ) plus 1 page table for the case
when the Xen load address isn't 2 MB aligned.
We then need 2 more page tables for the identity mapping: Xen is loaded
at address 0x80200000 by OpenSBI, and since the load address isn't equal
to the linker address, 2 additional pages are needed on top of the L2
table we already allocated when mapping the linker to the load
addresses.
2 also looks like enough for Sv48 and Sv57, as the L4, L3 and L2 page
tables will already have been allocated while mapping the linker to the
load addresses.
It would only be too much for Sv32 ( which in general has only L1 and
L0 ), but I am not sure anyone cares about Sv32.
>
> > @@ -108,16 +116,18 @@ static void __init
> > setup_initial_mapping(struct mmu_desc *mmu_desc,
> > {
> > unsigned long paddr = (page_addr - map_start) +
> > pa_start;
> > unsigned int permissions = PTE_LEAF_DEFAULT;
> > + unsigned long addr = (is_identity_mapping) ?
>
> Nit: No need for parentheses here.
Thanks.
>
> > + page_addr :
> > LINK_TO_LOAD(page_addr);
>
> As a remark, while we want binary operators at the end of lines when
> wrapping, we usually do things differently for the ternary operator:
> Either
>
> unsigned long addr = is_identity_mapping
> ? page_addr :
> LINK_TO_LOAD(page_addr);
>
> or
>
> unsigned long addr = is_identity_mapping
> ? page_addr
> : LINK_TO_LOAD(page_addr);
It looks better. Thanks.
>
> .
>
> > @@ -232,22 +242,27 @@ void __init setup_initial_pagetables(void)
> > linker_start,
> > linker_end,
> > load_start);
> > +
> > + if ( linker_start == load_start )
> > + return;
> > +
> > + setup_initial_mapping(&mmu_desc,
> > + load_start,
> > + load_start + PAGE_SIZE,
> > + load_start);
> > +
> > + setup_initial_mapping(&mmu_desc,
> > + (unsigned long)cpu0_boot_stack,
> > + (unsigned long)cpu0_boot_stack +
> > PAGE_SIZE,
>
> Shouldn't this be STACK_SIZE (and then also be prepared for
> STACK_SIZE > PAGE_SIZE)?
Yes, it would be better to use STACK_SIZE, but for the 1:1 mapping
PAGE_SIZE is enough: as I mentioned above, it is only needed for the
epilogue of enable_mmu().
>
> > + (unsigned long)cpu0_boot_stack);
> > }
> >
> > -void __init noreturn noinline enable_mmu()
> > +/*
> > + * enable_mmu() can't be __init because __init section isn't part
> > of identity
> > + * mapping so it will cause an issue after MMU will be enabled.
> > + */
>
> As hinted at above already - perhaps the identity mapping wants to be
> larger, up to covering the entire Xen image? Since it's temporary
> only anyway, you could even consider using a large page (and RWX
> permission). You already require no overlap of link and load
> addresses,
> so at least small page mappings ought to be possible for the entire
> image.
It makes sense; I started to think about that after I applied the patch
for Dom0 running... I think we can map a 1 GB page which will cover the
whole Xen image. In that case we also don't have to allocate the 2
extra page tables.
>
> > @@ -255,25 +270,41 @@ void __init noreturn noinline enable_mmu()
> > csr_write(CSR_SATP,
> > PFN_DOWN((unsigned long)stage1_pgtbl_root) |
> > RV_STAGE1_MODE << SATP_MODE_SHIFT);
> > +}
> > +
> > +void __init remove_identity_mapping(void)
> > +{
> > + int i, j;
>
> Nit: unsigned int please.
Thanks.
>
> > + pte_t *pgtbl;
> > + unsigned int index, xen_index;
>
> These would all probably better be declared in the narrowest possible
> scope.
Sure.
>
> > - asm volatile ( ".p2align 2" );
> > - mmu_is_enabled:
> > /*
> > - * Stack should be re-inited as:
> > - * 1. Right now an address of the stack is relative to load
> > time
> > - * addresses what will cause an issue in case of load start
> > address
> > - * isn't equal to linker start address.
> > - * 2. Addresses in stack are all load time relative which can
> > be an
> > - * issue in case when load start address isn't equal to
> > linker
> > - * start address.
> > - *
> > - * We can't return to the caller because the stack was reseted
> > - * and it may have stash some variable on the stack.
> > - * Jump to a brand new function as the stack was reseted
> > + * id_addrs should be in sync with id mapping in
> > + * setup_initial_pagetables()
>
> What is "id" meant to stand for here? Also if things need keeping in
> sync, then a similar comment should exist on the other side.
"id" here means identity.
>
> > */
> > + unsigned long id_addrs[] = {
> > + LINK_TO_LOAD(_start),
> > + LINK_TO_LOAD(cpu0_boot_stack),
> > + };
> >
> > - switch_stack_and_jump((unsigned long)cpu0_boot_stack +
> > STACK_SIZE,
> > - cont_after_mmu_is_enabled);
> > + pgtbl = stage1_pgtbl_root;
> > +
> > + for ( j = 0; j < ARRAY_SIZE(id_addrs); j++ )
> > + {
> > + for ( pgtbl = stage1_pgtbl_root, i = CONFIG_PAGING_LEVELS
> > - 1; i >= 0; i-- )
> > + {
> > + index = pt_index(i, id_addrs[j]);
> > + xen_index = pt_index(i, XEN_VIRT_START);
> > +
> > + if ( index != xen_index )
> > + {
> > + pgtbl[index].pte = 0;
> > + break;
> > + }
>
> Up to here I understand index specifies a particular PTE within
> pgtbl[].
>
> > + pgtbl = &pgtbl[index];
>
> But then how can this be the continuation of the page table walk?
> Don't
> you need to read the address out of the PTE?
You are right. It should be:
pgtbl = (pte_t *)pte_to_paddr(pgtbl[index]);
>
> > --- a/xen/arch/riscv/riscv64/head.S
> > +++ b/xen/arch/riscv/riscv64/head.S
> > @@ -31,6 +31,36 @@ ENTRY(start)
> >
> > jal calc_phys_offset
> >
> > + jal setup_initial_pagetables
> > +
> > + jal enable_mmu
> > +
> > + /*
> > + * Calculate physical offset
> > + *
> > + * We can't re-use a value in phys_offset variable here as
> > + * phys_offset is located in .bss and this section isn't
> > + * 1:1 mapped and an access to it will cause MMU fault
> > + */
> > + li t0, XEN_VIRT_START
> > + la t1, start
> > + sub t1, t1, t0
>
> If I'm not mistaken this calculates start - XEN_VIRT_START, and ...
>
> > + /* Calculate proper VA after jump from 1:1 mapping */
> > + la t0, .L_primary_switched
> > + sub t0, t0, t1
>
> ... then this .L_primary_switched - (start - XEN_VIRT_START). Which
> can
> also be expressed as (.L_primary_switched - start) + XEN_VIRT_START,
> the first part of which gas ought to be able to resolve for you. But
> upon experimenting a little it looks like it can't. (I had thought of
> something along the lines of
>
> li t0, .L_primary_switched - start
> li t1, XEN_VIRT_START
> add t0, t0, t1
>
> or even
>
> li t1, XEN_VIRT_START
> add t0, t1, %pcrel_lo(.L_primary_switched - start)
>
> .)
Calculation of ".L_primary_switched - start" might be an issue for gas.
I tried to do something similar and received either "Error: can't
resolve .L_primary_switched - start" or "Error: illegal operands `li
t0,.L_primary_switched-start'".
~ Oleksii