[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 1/3] xen/riscv: introduce setup_initial_pages
On Thu, 2023-03-23 at 11:57 +0000, Julien Grall wrote: > On 23/03/2023 11:18, Oleksii wrote: > > Hi Julien, > > Hi Oleksii, > > > On Wed, 2023-03-22 at 14:21 +0000, Julien Grall wrote: > > ... > > > > > > > > > + unsigned long page_addr; > > > > > > + > > > > > > + // /* align start addresses */ > > > > > > + // map_start &= XEN_PT_LEVEL_MAP_MASK(0); > > > > > > + // pa_start &= XEN_PT_LEVEL_MAP_MASK(0); > > > > > > > > > > They should be switched to ASSERT() or BUG_ON(). > > > > Sure. Thanks. I'll update. > > > > > > > > > > > + > > > > > > + page_addr = map_start; > > > > > > + while ( page_addr < map_end ) > > > > > > > > > > This loop is continue to assume that only the mapping can > > > > > first > > > > > in > > > > > 2MB > > > > > section (or less if the start is not 2MB aligned). > > > > > > > > > > I am OK if you want to assume it, but I think this should be > > > > > documented > > > > > (with words and ASSERT()/BUG_ON()) to avoid any mistake. > > > > I add a check in setup_initial_pagetables: > > > > BUG_ON(load_addr_start % (PAGE_ENTRIES * PAGE_SIZE) != > > > > 0); > > > > Probably this is not a correct place and should be moved to > > > > setup_initial_mapping() instead of setup_initial_pagetables() > > > > > > Yes it should be moved in setup_initial_mapping(). > > Then I'll moved it to setup_initial_mapping() > > > > > > > > > > > > > > + { > > > > > > + unsigned long index2 = pt_index(2, page_addr); > > > > > > + unsigned long index1 = pt_index(1, page_addr); > > > > > > + unsigned long index0 = pt_index(0, page_addr); > > > > > > + > > > > > > + /* Setup level2 table */ > > > > > > + second[index2] = paddr_to_pte((unsigned > > > > > > long)first); > > > > > > + second[index2].pte |= PTE_TABLE; > > > > > > + > > > > > > + /* Setup level1 table */ > > > > > > + first[index1] = paddr_to_pte((unsigned > > > > > > long)zeroeth); > > > > > > + first[index1].pte |= PTE_TABLE; > > > > > > + > > > > > > + > > > > > > > > > > NIT: Spurious line? > > > > Yeah, should be removed. Thanks. > > > > > > > > > > > + /* Setup level0 table */ > > > > > > + if ( !pte_is_valid(&zeroeth[index0]) ) > > > > > > > > > > On the previous version, you said it should be checked for > > > > > each > > > > > level. > > > > I had a terrible internet connection, and my message wasn't > > > > sent. > > > > > > No worries. > > > > > > > > > > > I decided not to check that l2 and l1 are used only for > > > > referring > > > > to > > > > the next page table but not leaf PTE. So it is safe to > > > > overwrite it > > > > each time (the addresses of page tables are the same all the > > > > time) > > > > > > You are letting the caller to decide which page-table to use for > > > each > > > level. So you are at the mercy that caller will do the right > > > thing. > > > > > > IHMO, this is a pretty bad idea because debugging page-tables > > > error > > > are > > > difficult. So it is better to have safety in place. This is not > > > worth... > > > > > > and > > > > probably it will be better from optimization point of view to > > > > ignore if > > > > clauses. > > > > > > ... the optimization in particular when this is at boot time. > > I didn't think about that caller will do always the right thing so > > I will add the check. > > > > > > > > > > > And it is needed in case of L0 because it is possible that some > > > > addressed were marked with specific flag ( execution, read, > > > > write ) > > > > and > > > > so not to overwrite the flags set before the check is needed. > > > In which case you should really report an error because the > > > caller > > > may > > > have decide to set execution flag and you don't honor. So when > > > the > > > code > > > is executed, you will receive a fault and this may be hard to > > > find > > > out > > > what happen. > > > > Right now, it is expected situation that the caller will try to > > change > > execution flag during the setup of initial page tables. > > > It is possible in the currently implemented logic of the setup of > > initial page tables. > > This sounds like a bug in your caller implementation. You should not > try > to workaround this in your code updating the page-tables. > > > Let me explain what I mean. > > > > The first step of setup_initial_pagetables() is to map .text, > > .init, > > .rodata with necessary flags RX, RX, R. > > Remaining sections will have RW flags, and to map them, > > setup_initial_mapping() is called for the whole range of > > [linker_start, > > linker_end] not to map them one by one thereby during this step > > setup_initial_mapping() will try to remap addresses ranges which > > overlap with .text, .init, .rodata with RW flags but it should > > leave > > with the previously set flags. > Why do you need to call setup_init_mapping() with the whole range? In > fact the only reason I can think this is useful is when you when to > create a 1:1 mapping when the linker and load address is different. It is needed to not map each section separately one by one as most of the sections have the same PTE_FLAGS (Read, Write, eXectuable, etc ) So it was decided to map the following sections separately as they have 'unique' flags: - .text -> RX - .rodata -> R - .init.text -> RX All other sections are RW and could be covered by one call of setup_init_mapping() for the whole range: - .data.ro_after_init - .data.read_mostly - .data - .init.data - .bss So some ranges ( .text, .rodata, .init.text ) from the whole range will be skipped as already mapped and the rest sections will be mapped during one call of setup_init_mapping(). > But... > > > > > > > + > > > > > > + /* Get the addresses where the page tables were loaded > > > > > > */ > > > > > > + second = (pte_t *)(&xen_second_pagetable); > > > > > > + first = (pte_t *)(&xen_first_pagetable); > > > > > > + zeroeth = (pte_t *)(&xen_zeroeth_pagetable); > > > > > > + > > > > > > + setup_initial_mapping(second, first, zeroeth, > > > > > > + LOAD_TO_LINK((unsigned > > > > > > long)&_stext), > > > > > > + LOAD_TO_LINK((unsigned > > > > > > long)&_etext), > > > > > > + (unsigned long)&_stext, > > > > > > + PTE_LEAF_DEFAULT); > > > > > > + > > > > > > + setup_initial_mapping(second, first, zeroeth, > > > > > > + LOAD_TO_LINK((unsigned > > > > > > long)&__init_begin), > > > > > > + LOAD_TO_LINK((unsigned > > > > > > long)&__init_end), > > > > > > + (unsigned long)&__init_begin, > > > > > > + PTE_LEAF_DEFAULT | > > > > > > PTE_WRITABLE); > > > > > > + > > > > > > + setup_initial_mapping(second, first, zeroeth, > > > > > > + LOAD_TO_LINK((unsigned > > > > > > long)&_srodata), > > > > > > + LOAD_TO_LINK((unsigned > > > > > > long)&_erodata), > > > > > > + (unsigned long)(&_srodata), > > > > > > + PTE_VALID | PTE_READABLE); > > > > > > + > > > > > > + setup_initial_mapping(second, first, zeroeth, > > > > > > + linker_addr_start, > > > > > > + linker_addr_end, > > > > > > + load_addr_start, > > > > > > + PTE_LEAF_DEFAULT | > > > > > > PTE_READABLE); > > > > > > ... this is not cover above. AFAIU, this is the one for the 1:1 > > > mapping. > > But there is no 1:1 mapping. > > I thought that 1:1 mapping is when the physical address is equal to > > 0x10020 then after MMU is enabled the virtual address will be the > > same > > 0x10020 and mapped to 0x10020. > > > > Probably I missed something but this code maps all linker addresses > > to correspondent load addresses and it will be 1:1 only in case > > when > > load address is equal to linker address. > > ... here you say this is not the purpose. > > Also, if you don't intend to deal with load != link address yet, then > the following BUG_ON() needs to be updated: > > + if (load_addr_start != linker_addr_start) > + BUG_ON((linker_addr_end > load_addr_start && load_addr_end > > linker_addr_start)); I think that I'll cover it in the new patch series as I clarified all necessary information and I don't expect that it will be too hard to add from the start. > > > > > > > > > > > > > > As I said in v1, you need to use a different set of page- > > > > > table > > > > > here. > > > > If I understand you correctly I have to use a different set of > > > > page- > > > > table in case when it is possible that size of Xen will be > > > > larger > > > > then > > > > PAGE_SIZE. So I added to xen.lds.S a check to be sure that the > > > > size > > > > fits into PAGE_SIZE. > > > > > > This is not what I was referring to. I was pointing out that > > > second, > > > first, zeroeth are exactly the same for all the callers. You want > > > to > > > introduce a second set of zeroeth table. You will want to do the > > > same > > > for first but it is not always used. > > > > > > Otherwise, this is not going to work if Xen is loaded at a > > > different > > > address than the runtime. > > Ok. > > > > I understand what do you mean in general but still I am not sure > > that I > > understand a particular case when I am sure that the size of Xen is > > no > > bigger then 2MB and load address is aligned on 2Mb boundary. > > > > The size of one l0 page table is 512 (1 << 9 ) entries which covers > > 4K > > (1 << 12) * 512 = 2 Mb of memory so it should be fine. > > All the callers in my case are trying to map addresses from > > [linker_start, linker_end] which is not bigger then 2 MB and is > > aligned > > on 2 MB boundary. > > I interpreted that your last call was meant to be for the 1:1 mapping > when load != link address. It looks like I was wrong, so the same > page-table should be OK. Thanks. Now we are on the same page. But actually I like an idea to have more page tables and remove a limitation to be aligned on 2MB boundary from the start. I experimented with adding of additional table ( ( if it is necessary ) so it shouldn't be hard to add that code too. > > > > > > > > > That said, when I spoke with Andrew yesterday, he mentioned that > > > your > > > initial goal is to support the case where Xen is loaded at the > > > runtime > > > address. I understand this simplifies a lot the code and I told > > > him I > > > was OK with that. However, it would be good to document what are > > > your > > > goals in each series (this is not always clear what you skip on > > > purpose). > > > > > > > > > > > > Also, where do you guarantee that Xen will be loaded at a 2MB > > > > > aligned > > > > > address? (For a fact I know that UEFI is only guarantee 4KB > > > > > alignment). > > > > There is a check in setup_initial_pagetables: > > > > BUG_ON(load_addr_start % (PAGE_ENTRIES * PAGE_SIZE) != > > > > 0); > > > > > > This is not very obvious the check is to confirm that Xen is > > > probably > > > aligned. I would suggest to add a comment. > > > > > > Also, you might want to use XEN_PT_LEVEL_SIZE(..) to make it more > > > obvious what sort of alignment you are trying to enforce. > > I will update add the comment and use XEN_PT_LEVEL_SIZE(...) > > instead. > > > > > > > > > + la sp, cpu0_boot_stack > > > > > > + li t0, STACK_SIZE > > > > > > + add sp, sp, t0 > > > > > > + > > > > > > + /* > > > > > > + * Re-init an address of exception handler as it > > > > > > was > > > > > > overwritten with > > > > > > + * the address of the .L_mmu_is_enabled label. > > > > > > + * Before jump to trap_init save return address of > > > > > > enable_mmu() to > > > > > > + * know where we should back after enable_mmu() > > > > > > will > > > > > > be > > > > > > finished. > > > > > > + */ > > > > > > + mv s0, ra > > > > > > + lla t0, trap_init > > > > > > + jalr ra, t0 > > > > > > + > > > > > > + /* > > > > > > + * Re-calculate the return address of enable_mmu() > > > > > > function for case > > > > > > + * when linker start address isn't equal to load > > > > > > start > > > > > > address > > > > > > + */ > > > > > > + add s0, s0, t1 > > > > > > + mv ra, s0 > > > > > > + > > > > > > + ret > > > > > > > > > > Missing ENDPROC? > > > > I haven't seen the usage of ENDPROC for RISC-V so it looks like > > > > it > > > > is > > > > not necessary. > > > > > > Ok. Would the objdump be able to report the function properly > > > then? I > > > know that on Arm, it was necessary report assembly function > > > properly. > > It is fine for RISC-V: > > > > Disassembly of section .text: > > > > 0000000000200000 <_start>: > > 200000: 10401073 csrw sie,zero > > 200004: 6299 lui t0,0x6 > > ... > > 20003c: 00000013 nop > > > > 0000000000200040 <enable_mmu>: > > 200040: 0003a297 auipc t0,0x3a > > ... > > 20009e: 941a add s0,s0,t1 > > 2000a0: 80a2 mv ra,s0 > > 2000a2: 8082 ret > > ... > > > > 00000000002000b0 <__bitmap_empty>: > > 2000b0: 1141 addi sp,sp,-16 > > 2000b2: e422 sd s0,8(sp) > > 2000b4: 0800 addi s0,sp,16 > > ... > > Perfect thanks! > ~ Oleksii
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |