
Re: [Xen-devel] [PATCH v7 01/14] x86: add socket_to_cpumask



>>> On 08.05.15 at 10:56, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
> --- a/xen/arch/x86/mpparse.c
> +++ b/xen/arch/x86/mpparse.c
> @@ -64,6 +64,9 @@ unsigned int __read_mostly boot_cpu_physical_apicid = BAD_APICID;
>  static unsigned int __devinitdata num_processors;
>  static unsigned int __initdata disabled_cpus;
>  
> +/* Total detected cpus (may exceed NR_CPUS) */
> +unsigned int total_cpus;
> +
>  /* Bitmask of physically existing CPUs */
>  physid_mask_t phys_cpu_present_map;
>  
> @@ -112,6 +115,8 @@ static int __devinit MP_processor_info_x(struct mpc_config_processor *m,
>  {
>       int ver, apicid, cpu = 0;
>       
> +     total_cpus++;
> +
>       if (!(m->mpc_cpuflag & CPU_ENABLED)) {
>               if (!hotplug)
>                       ++disabled_cpus;

Is there a reason you can't use disabled_cpus and avoid adding yet
another variable?

> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -59,6 +59,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
>  cpumask_t cpu_online_map __read_mostly;
>  EXPORT_SYMBOL(cpu_online_map);
>  
> +unsigned int nr_sockets __read_mostly;
> +cpumask_var_t *socket_to_cpumask __read_mostly;

I'd really like to see the "to" dropped from the name; it has
confused me more than once. I'd also prefer the section
annotations to be at their mandated place, between type and
variable name.

> @@ -239,11 +242,14 @@ static void link_thread_siblings(int cpu1, int cpu2)
>  
>  static void set_cpu_sibling_map(int cpu)
>  {
> -    int i;
> +    int i, socket = cpu_to_socket(cpu);

unsigned int

>      struct cpuinfo_x86 *c = cpu_data;
>  
>      cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
>  
> +    if ( socket < nr_sockets )
> +        cpumask_set_cpu(cpu, socket_to_cpumask[socket]);
> +
>      if ( c[cpu].x86_num_siblings > 1 )
>      {
>          for_each_cpu ( i, &cpu_sibling_setup_map )
> @@ -301,6 +307,7 @@ static void set_cpu_sibling_map(int cpu)
>              }
>          }
>      }
> +
>  }

??? (Stray blank line addition.)

> @@ -704,6 +711,8 @@ static struct notifier_block cpu_smpboot_nfb = {
>  
>  void __init smp_prepare_cpus(unsigned int max_cpus)
>  {
> +    int socket;

unsigned int

> @@ -717,6 +726,15 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  
>      stack_base[0] = stack_start;
>  
> +    nr_sockets = DIV_ROUND_UP(total_cpus, boot_cpu_data.x86_max_cores *
> +                                          boot_cpu_data.x86_num_siblings);
> +    socket_to_cpumask = xzalloc_array(cpumask_var_t, nr_sockets);
> +    if ( !socket_to_cpumask )
> +        panic("No memory for socket CPU siblings map");
> +    for ( socket = 0; socket < nr_sockets; socket++ )
> +        if ( !zalloc_cpumask_var(socket_to_cpumask + socket) )
> +            panic("No memory for socket CPU siblings cpumask");

You might be allocating quite a bit too much memory now that you
overestimate nr_sockets. Hence at least this second part of the
change would be better switched to an on-demand allocation model.

> @@ -779,9 +797,12 @@ void __init smp_prepare_boot_cpu(void)
>  static void
>  remove_siblinginfo(int cpu)
>  {
> -    int sibling;
> +    int sibling, socket = cpu_to_socket(cpu);

unsigned int

> --- a/xen/include/asm-x86/smp.h
> +++ b/xen/include/asm-x86/smp.h
> @@ -58,6 +58,22 @@ int hard_smp_processor_id(void);
>  
>  void __stop_this_cpu(void);
>  
> +/* Total number of cpus in this system (may exceed NR_CPUS) */
> +extern unsigned int total_cpus;
> +
> +/*
> + * This value is calculated by total_cpus/cpus_per_socket with the assumption
> + * that APIC IDs from MP table are continuous. It's possible that this value
> + * is less than the real socket number in the system if the APIC IDs from MP
> + * table are too sparse. Also the value is considered not to change from the
> + * initial startup. Violation of any of these assumptions may result in errors
> + * and requires retrofitting all the relevant places.
> + */

This all reads pretty frightening. How about using a better estimate of
core and thread counts (i.e. ones matching the actually observed values
instead of the nearest larger powers of two) in the nr_sockets
calculation? Overestimating nr_sockets is surely better than using too
small a value, as the alternative of remembering to always bounds-check
socket values before use (not only in your series, but also in all
future changes) is going to be pretty fragile.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel