
Re: [PATCH v3 3/3] xen/sched: fix cpu hotplug


  • To: Juergen Gross <jgross@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Thu, 1 Sep 2022 12:01:04 +0000
  • Cc: George Dunlap <George.Dunlap@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Gao Ruifeng <ruifeng.gao@xxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • Delivery-date: Thu, 01 Sep 2022 12:01:25 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 01/09/2022 07:11, Juergen Gross wrote:
> On 01.09.22 00:52, Andrew Cooper wrote:
>> On 16/08/2022 11:13, Juergen Gross wrote:
>>> Cpu cpu unplugging is calling schedule_cpu_rm() via stop_machine_run()
>>
>> Cpu cpu.
>>
>>> with interrupts disabled, thus any memory allocation or freeing must
>>> be avoided.
>>>
>>> Since commit 5047cd1d5dea ("xen/common: Use enhanced
>>> ASSERT_ALLOC_CONTEXT in xmalloc()") this restriction is being enforced
>>> via an assertion, which will now fail.
>>>
>>> Before that commit cpu unplugging in normal configurations was working
>>> just by chance as only the cpu performing schedule_cpu_rm() was doing
>>> active work. With core scheduling enabled, however, failures could
>>> result from memory allocations not being properly propagated to other
>>> cpus' TLBs.
>>
>> This isn't accurate, is it?  The problem with initiating a TLB flush
>> with IRQs disabled is that you can deadlock against a remote CPU which
>> is waiting for you to enable IRQs first to take a TLB flush IPI.
>
> As long as only one cpu is trying to allocate/free memory during the
> stop_machine_run() action the deadlock won't happen.
>
>> How does a memory allocation out of the xenheap result in a TLB flush?
>> Even with split heaps, you're only potentially allocating into a new
>> slot which was unused...
>
> Yeah, you are right. The main problem would occur only when a virtual
> address is changed to point at another physical address, which should be
> quite unlikely.
>
> I can drop that paragraph, as it doesn't really help.

Yeah, I think that would be best.

>>
>>> diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
>>> index 58e082eb4c..2506861e4f 100644
>>> --- a/xen/common/sched/cpupool.c
>>> +++ b/xen/common/sched/cpupool.c
>>> @@ -411,22 +411,28 @@ int cpupool_move_domain(struct domain *d,
>>> struct cpupool *c)
>>>   }
>>>     /* Update affinities of all domains in a cpupool. */
>>> -static void cpupool_update_node_affinity(const struct cpupool *c)
>>> +static void cpupool_update_node_affinity(const struct cpupool *c,
>>> +                                         struct affinity_masks *masks)
>>>   {
>>> -    struct affinity_masks masks;
>>> +    struct affinity_masks local_masks;
>>>       struct domain *d;
>>>   -    if ( !update_node_aff_alloc(&masks) )
>>> -        return;
>>> +    if ( !masks )
>>> +    {
>>> +        if ( !update_node_aff_alloc(&local_masks) )
>>> +            return;
>>> +        masks = &local_masks;
>>> +    }
>>>         rcu_read_lock(&domlist_read_lock);
>>>         for_each_domain_in_cpupool(d, c)
>>> -        domain_update_node_aff(d, &masks);
>>> +        domain_update_node_aff(d, masks);
>>>         rcu_read_unlock(&domlist_read_lock);
>>>   -    update_node_aff_free(&masks);
>>> +    if ( masks == &local_masks )
>>> +        update_node_aff_free(masks);
>>>   }
>>>     /*
>>
>> Why do we need this at all?  domain_update_node_aff() already knows what
>> to do when passed NULL, so this seems like an awfully complicated no-op.
>
> You do realize that update_node_aff_free() will do something in case
> masks was initially NULL?

By "this", I meant the entire hunk, not just the final if().

What is wrong with passing the (possibly NULL) masks pointer straight
down into domain_update_node_aff() and removing all the memory
allocation in this function?

But I've answered that myself by looking more closely.  It's about not
repeating the memory allocation/freeing for each domain in the pool.
That does make sense even though this is a slow path, but the resulting
complexity of having conditionally allocated masks is considerable.

>
>>
>>> @@ -1008,10 +1016,21 @@ static int cf_check cpu_callback(
>>>   {
>>>       unsigned int cpu = (unsigned long)hcpu;
>>>       int rc = 0;
>>> +    static struct cpu_rm_data *mem;
>>>         switch ( action )
>>>       {
>>>       case CPU_DOWN_FAILED:
>>> +        if ( system_state <= SYS_STATE_active )
>>> +        {
>>> +            if ( mem )
>>> +            {
>>
>> So, this does compile (and indeed I've tested the result), but I can't
>> see how it should.
>>
>> mem is guaranteed to be uninitialised at this point, and ...
>
> ... it is defined as "static", so it is clearly NULL initially.

Oh, so it is.  That is hiding particularly well in plain sight.
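(For reference, objects with static storage duration are
zero-initialised by the language, per C11 6.7.9p10, so the bare
declaration really does start out NULL; a trivial illustration:)

```c
#include <stddef.h>

/*
 * No explicit initialiser, but static storage duration means this is
 * guaranteed to start out NULL - the same reason the bare
 * "static struct cpu_rm_data *mem;" in cpu_callback() is NULL on the
 * first invocation.
 */
static int *static_ptr;
```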

Can it please be this:

@@ -1014,9 +1014,10 @@ void cf_check dump_runq(unsigned char key)
 static int cf_check cpu_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
+    static struct cpu_rm_data *mem; /* Protected by cpu_add_remove_lock */
+
     unsigned int cpu = (unsigned long)hcpu;
     int rc = 0;
-    static struct cpu_rm_data *mem;
 
     switch ( action )
     {

We already split static and non-static variables like this elsewhere
for clarity, and identifying the lock which protects the data is useful
for anyone coming to this in the future.

~Andrew

P.S. as an observation, this isn't safe for parallel AP booting, but I
guarantee that this isn't the only example which would want fixing if we
wanted to get parallel booting working.


 

