[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 1/3] xen/ppc: Set up a basic C environment



On 7/17/23 10:38 AM, Jan Beulich wrote:
> On 06.07.2023 21:04, Shawn Anastasio wrote:
>> --- a/xen/arch/ppc/include/asm/config.h
>> +++ b/xen/arch/ppc/include/asm/config.h
>> @@ -43,7 +43,7 @@
>>  
>>  #define SMP_CACHE_BYTES (1 << 6)
>>  
>> -#define STACK_ORDER 2
>> +#define STACK_ORDER 0
>>  #define STACK_SIZE  (PAGE_SIZE << STACK_ORDER)
> 
> In which way is this related to the change at hand? Aren't you going to
> need to undo this rather sooner than later?

I noticed the stack order being too large when I moved the stack
declaration to .c per Andrew's recommendation for v2. Since we're using
64k pages, I don't see why the stack would need to be this big. A quick
look at ARM shows they have a stack order of 3, which would yield a
stack size of just 32k.

>> --- a/xen/arch/ppc/ppc64/head.S
>> +++ b/xen/arch/ppc/ppc64/head.S
>> @@ -1,30 +1,30 @@
>>  /* SPDX-License-Identifier: GPL-2.0-or-later */
>>  
>> +#include <asm/asm-defns.h>
>> +
>>      .section .text.header, "ax", %progbits
>>  
>>  ENTRY(start)
>>      /*
>> -     * Depending on how we were booted, the CPU could be running in either
>> -     * Little Endian or Big Endian mode. The following trampoline from Linux
>> -     * cleverly uses an instruction that encodes to a NOP if the CPU's
>> -     * endianness matches the assumption of the assembler (LE, in our case)
>> -     * or a branch to code that performs the endian switch in the other 
>> case.
>> +     * NOTE: argument registers (r3-r9) must be preserved until the C 
>> entrypoint
>>       */
>> -    tdi 0, 0, 0x48    /* Reverse endian of b . + 8          */
>> -    b . + 44          /* Skip trampoline if endian is good  */
>> -    .long 0xa600607d  /* mfmsr r11                          */
>> -    .long 0x01006b69  /* xori r11,r11,1                     */
>> -    .long 0x00004039  /* li r10,0                           */
>> -    .long 0x6401417d  /* mtmsrd r10,1                       */
>> -    .long 0x05009f42  /* bcl 20,31,$+4                      */
>> -    .long 0xa602487d  /* mflr r10                           */
>> -    .long 0x14004a39  /* addi r10,r10,20                    */
>> -    .long 0xa6035a7d  /* mtsrr0 r10                         */
>> -    .long 0xa6037b7d  /* mtsrr1 r11                         */
>> -    .long 0x2400004c  /* rfid                               */
>> -
>> -    /* Now that the endianness is confirmed, continue */
>> -1:  b 1b
>> +    FIXUP_ENDIAN
>> +
>> +    /* set up the TOC pointer */
>> +    LOAD_IMM32(%r2, .TOC.)
>> +
>> +    /* set up the initial stack */
>> +    LOAD_IMM32(%r1, cpu0_boot_stack)
> 
> Wouldn't this (and perhaps also .TOC.) better be calculated in a
> PC-relative manner? Or is the plan to have Xen linked to an address
> below 4Gb?

As mentioned previously, I am planning to enable the PIC code model in
my next series in order to accommodate booting on the PowerNV firmware
type which has a different load address. That patch will change the
initial TOC load to a simulated PC-relative one (pre-POWER10 doesn't
have proper PC-relative loads/stores) and the rest to TOC-relative.

>> +    li %r11, 0
>> +    stdu %r11, -STACK_FRAME_OVERHEAD(%r1)
>> +
>> +    /* call the C entrypoint */
>> +    bl start_xen
>> +
>> +    /* should never return */
>> +    trap
>>  
>>      .size start, . - start
>>      .type start, %function
>> +
>> +    .section .init.data, "aw", @progbits
> 
> What's this good for when no data follows?

Good catch. With the stack no longer declared in head.S this is
unnecessary. Will remove in v3.

>> --- /dev/null
>> +++ b/xen/arch/ppc/setup.c
>> @@ -0,0 +1,19 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +#include <xen/init.h>
>> +
>> +/* Xen stack for bringing up the first CPU. */
>> +unsigned char __initdata cpu0_boot_stack[STACK_SIZE] __aligned(STACK_SIZE);
> 
> This yields the entire array as zero-initialized. At which point I
> don't see a need for the store in head.S.

Okay, fair enough. Given that the array is zero-initialized the stdu
could be replaced with an `addi %r1, %r1, -STACK_FRAME_OVERHEAD`, and
the load of zero to %r11 could be deferred to the second patch in this
series where it's used in the .bss clearing loop.

That said I don't really see the harm with keeping the standard
idiomatic `stdu` for the initial stack frame setup. Other than
performance, which isn't a concern here in early setup code.

>> +/* Macro to adjust thread priority for hardware multithreading */
>> +#define HMT_very_low()  asm volatile (" or %r31, %r31, %r31 ")
> 
> Nit: Style. Wants to be
> 
> #define HMT_very_low()  asm volatile ( "or %r31, %r31, %r31" )

Will fix in v3.

> Jan

Thanks,
Shawn



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.