Re: [PATCH v4 4/7] xen/riscv: introduce functionality to work with CPU info
On 14.08.2024 16:45, oleksii.kurochko@xxxxxxxxx wrote:
> On Tue, 2024-08-13 at 10:54 +0200, Jan Beulich wrote:
>> On 09.08.2024 18:19, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/include/asm/smp.h
>>> +++ b/xen/arch/riscv/include/asm/smp.h
>>> @@ -5,6 +5,8 @@
>>>  #include <xen/cpumask.h>
>>>  #include <xen/percpu.h>
>>>
>>> +#define INVALID_HARTID UINT_MAX
>>> +
>>>  DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
>>>  DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
>>>
>>> @@ -14,6 +16,14 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
>>>   */
>>>  #define park_offline_cpus false
>>>
>>> +void smp_set_bootcpu_id(unsigned long boot_cpu_hartid);
>>> +
>>> +/*
>>> + * Mapping between linux logical cpu index and hartid.
>>> + */
>>> +extern unsigned long cpuid_to_hartid_map[NR_CPUS];
>>> +#define cpuid_to_hartid(cpu) cpuid_to_hartid_map[cpu]
>>
>> How wide can hart IDs be? Wider than 32 bits? If not, why unsigned
>> long?
> According to the spec:
> ```
> The mhartid CSR is an MXLEN-bit read-only register containing the
> integer ID of the hardware thread running the code
> ```
> where MXLEN can be 32 or 64.

Hmm, okay. If the full MXLEN bits can be used, then using unsigned long
is indeed the right thing here.

>>> @@ -40,6 +41,19 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
>>>  {
>>>      remove_identity_mapping();
>>>
>>> +    /*
>>> +     * The tp register holds the address of the physical CPU
>>> +     * information, so write the boot CPU's info there.  It will be
>>> +     * used later by get_processor_id() (see <asm/processor.h>):
>>> +     *   #define get_processor_id() (tp->processor_id)
>>> +     */
>>> +    asm volatile ( "mv tp, %0" : : "r" ((unsigned long)&pcpu_info[0]) );
>>
>> Perhaps be on the safe side and also add a memory barrier?
> Do you mean a compiler barrier, i.e.
> asm volatile ( "..." :: ... : "memory" )?

Yes.

>> Perhaps be on the safe side and do this absolutely first in the
>> function, or even in assembly (such that initializers of future
>> variables declared at the top of the function end up safe, too)?
> I am not sure I am following the part about putting this code in
> assembler (instructions in assembly code can still be re-ordered,
> which can affect when tp ends up holding the correct value)

I'm afraid I don't understand this. Instructions can be re-ordered,
sure, but later instructions are guaranteed to observe the effects on
register state that earlier instructions caused.

> and what do you mean by "initializers of future variables declared at
> the top of the function end up safe"?

With

    void start_xen()
    {
        int var = func();

        asm volatile ( "mv tp, %0" :: ... );
        ...

you end up with the requirement that func() must not use anything that
depends on tp being set. In this simple example it may be easy to say
"don't use an initializer and call the function afterwards". But that
is going to be a risky game to play. Look at x86'es __start_xen(): an
insertion into such a big set of declarations may not pay attention to
tp not being set yet, when _all_ other C code may reasonably assume tp
is set.

>>> --- /dev/null
>>> +++ b/xen/arch/riscv/smp.c
>>> @@ -0,0 +1,4 @@
>>> +#include <xen/smp.h>
>>> +
>>> +/* tp points to one of these per cpu */
>>> +struct pcpu_info pcpu_info[NR_CPUS];
>>
>> And they all need setting up statically? Is there a plan to make this
>> dynamic (which could be recorded in a "fixme" in the comment)?
> I didn't plan to make the allocation of this array dynamic. I don't
> expect NR_CPUS to be big.
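For illustration, here is a minimal self-contained sketch of the scheme
being discussed: setting tp to the boot CPU's pcpu_info slot, with a
"memory" clobber serving as the compiler barrier Jan asks for. The
struct layout and the helper name set_boot_cpu_tp() are assumptions
made for the example, not the patch's final code:

    /* Sketch only; modelled on the quoted patch, not the committed code. */
    struct pcpu_info {
        unsigned int processor_id;
        /* ... further per-CPU fields ... */
    };

    extern struct pcpu_info pcpu_info[];

    static inline void set_boot_cpu_tp(void)
    {
        /*
         * Write the address of the boot CPU's slot into tp.  The
         * "memory" clobber keeps the compiler from re-ordering memory
         * accesses across this statement, so later C code reliably
         * sees tp as set.
         */
        asm volatile ( "mv tp, %0"
                       : : "r" ((unsigned long)&pcpu_info[0]) : "memory" );
    }

Invoked as the very first statement of start_xen() (or from assembly
before entering C), this would also address the point about variable
initializers running before tp is valid.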
What is this expectation of yours based on? Other architectures permit
systems with hundreds or even thousands of CPUs; why would RISC-V be
different there?

> I can add a "fixme", but I don't really understand what the advantage
> would be if pcpu_info[] were allocated dynamically.

Where possible it's better to avoid static allocations, of which on
some systems only a very small part may be used. Even if you take the
position many do - memory being cheap - you still waste cache and TLB
bandwidth. Furthermore, as long as struct pcpu_info isn't big enough
(and properly aligned) for two successive array entries not to share a
cache line, you may end up playing cacheline ping-pong when a CPU
writes to its own array slot.
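To make the cacheline point concrete, a small hypothetical sketch (the
field layout and line size are assumptions for the example; Xen's usual
idiom would be SMP_CACHE_BYTES / __cacheline_aligned from
<xen/cache.h>):

    #define NR_CPUS          8    /* placeholder for the sketch */
    #define CACHE_LINE_SIZE  64   /* assumed cache line size */

    struct pcpu_info {
        unsigned int processor_id;
        unsigned long hart_id;        /* illustrative field */
    } __attribute__((__aligned__(CACHE_LINE_SIZE)));

    /*
     * With each entry padded out to a full cache line, two successive
     * entries never share a line, so a CPU writing its own slot does
     * not invalidate the line holding its neighbour's slot - avoiding
     * the ping-pong described above.
     */
    struct pcpu_info pcpu_info[NR_CPUS];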
Jan