[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v10 01/13] x86: add socket_cpumask



>>> On 08.07.15 at 04:43, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
> On Tue, Jul 07, 2015 at 06:32:55PM -0400, Boris Ostrovsky wrote:
>> >@@ -245,6 +248,8 @@ static void set_cpu_sibling_map(int cpu)
>> >      cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
>> >+    cpumask_set_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]);
>> 
>> This patch crashes Xen on my 32-cpu Intel box here for cpu 16, which is the
>> first CPU on the second socket (i.e. on socket 1).
>> 
>> The reason appears to be that cpu_to_socket(16) is (correctly) 1 here, but
>> ...
>> 
>> >+
>> >      if ( c[cpu].x86_num_siblings > 1 )
>> >      {
>> >          for_each_cpu ( i, &cpu_sibling_setup_map )
>> >@@ -649,7 +654,13 @@ void cpu_exit_clear(unsigned int cpu)
>> >  static void cpu_smpboot_free(unsigned int cpu)
>> >  {
>> >-    unsigned int order;
>> >+    unsigned int order, socket = cpu_to_socket(cpu);
>> >+
>> >+    if ( cpumask_empty(socket_cpumask[socket]) )
>> >+    {
>> >+        free_cpumask_var(socket_cpumask[socket]);
>> >+        socket_cpumask[socket] = NULL;
>> >+    }
>> >      free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
>> >      free_cpumask_var(per_cpu(cpu_core_mask, cpu));
>> >@@ -694,6 +705,7 @@ static int cpu_smpboot_alloc(unsigned int cpu)
>> >      nodeid_t node = cpu_to_node(cpu);
>> >      struct desc_struct *gdt;
>> >      unsigned long stub_page;
>> >+    unsigned int socket = cpu_to_socket(cpu);
>> 
>> ... is zero here, meaning that socket_cpumask[1] is NULL. I suspect that
>> phys_proc_id is probably not set at this point but is by the time we get to
>> set_cpu_sibling_map(). I haven't looked any further yet. I might do this
>> tomorrow unless Chao does it before me.
> 
> Thanks for testing.

Boris' report first of all raises the question: Did you test this at all
on a multi-socket system? Considering you not having tested the
CPU removal case either, I'm starting to wonder how much testing
this series has seen overall...

> I think I have found the reason. For AP, phys_proc_id is set in:
> start_secondary()=>smp_callin()=>smp_store_cpu_info()=>identify_cpu()
> which is behind cpu_smpboot_alloc() called from CPU_PREPARE.
> 
> One way would move 'zalloc_cpumask_var(socket_cpumask + socket)' to
> set_cpu_sibling_map() to fix it if Jan agrees that, otherwise other
> solution needs to be found.

Looks sensible at a first glance, but in order to be able to do
proper error handling the allocation needs to remain in
cpu_smpboot_alloc(). I.e. you'd add a static variable, pre-
allocate a cpumask into it if it's currently NULL, and consume the
allocation in set_cpu_sibling_map() (or maybe even better in
smp_store_cpu_info() right after the identify_cpu() call) if
socket_cpumask[socket] is NULL.

And then you test this on an affected system, and submit
asap, so we can preferably avoid reverting the whole series.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.