
Re: [PATCH v3 3/3] xen/sched: fix cpu hotplug


  • To: Juergen Gross <jgross@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Thu, 1 Sep 2022 12:01:04 +0000
  • Cc: George Dunlap <George.Dunlap@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Gao Ruifeng <ruifeng.gao@xxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • Delivery-date: Thu, 01 Sep 2022 12:01:25 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 01/09/2022 07:11, Juergen Gross wrote:
> On 01.09.22 00:52, Andrew Cooper wrote:
>> On 16/08/2022 11:13, Juergen Gross wrote:
>>> Cpu cpu unplugging is calling schedule_cpu_rm() via stop_machine_run()
>>
>> Cpu cpu.
>>
>>> with interrupts disabled, thus any memory allocation or freeing must
>>> be avoided.
>>>
>>> Since commit 5047cd1d5dea ("xen/common: Use enhanced
>>> ASSERT_ALLOC_CONTEXT in xmalloc()") this restriction is being enforced
>>> via an assertion, which will now fail.
>>>
>>> Before that commit cpu unplugging in normal configurations was working
>>> just by chance as only the cpu performing schedule_cpu_rm() was doing
>>> active work. With core scheduling enabled, however, failures could
>>> result from memory allocations not being properly propagated to other
>>> cpus' TLBs.
>>
>> This isn't accurate, is it?  The problem with initiating a TLB flush
>> with IRQs disabled is that you can deadlock against a remote CPU which
>> is waiting for you to enable IRQs first to take a TLB flush IPI.
>
> As long as only one cpu is trying to allocate/free memory during the
> stop_machine_run() action the deadlock won't happen.
>
>> How does a memory allocation out of the xenheap result in a TLB flush?
>> Even with split heaps, you're only potentially allocating into a new
>> slot which was unused...
>
> Yeah, you are right. The main problem would occur only when a virtual
> address is changed to point at another physical address, which should be
> quite unlikely.
>
> I can drop that paragraph, as it doesn't really help.

Yeah, I think that would be best.

>>
>>> diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
>>> index 58e082eb4c..2506861e4f 100644
>>> --- a/xen/common/sched/cpupool.c
>>> +++ b/xen/common/sched/cpupool.c
>>> @@ -411,22 +411,28 @@ int cpupool_move_domain(struct domain *d,
>>> struct cpupool *c)
>>>   }
>>>     /* Update affinities of all domains in a cpupool. */
>>> -static void cpupool_update_node_affinity(const struct cpupool *c)
>>> +static void cpupool_update_node_affinity(const struct cpupool *c,
>>> +                                         struct affinity_masks *masks)
>>>   {
>>> -    struct affinity_masks masks;
>>> +    struct affinity_masks local_masks;
>>>       struct domain *d;
>>>   -    if ( !update_node_aff_alloc(&masks) )
>>> -        return;
>>> +    if ( !masks )
>>> +    {
>>> +        if ( !update_node_aff_alloc(&local_masks) )
>>> +            return;
>>> +        masks = &local_masks;
>>> +    }
>>>         rcu_read_lock(&domlist_read_lock);
>>>         for_each_domain_in_cpupool(d, c)
>>> -        domain_update_node_aff(d, &masks);
>>> +        domain_update_node_aff(d, masks);
>>>         rcu_read_unlock(&domlist_read_lock);
>>>   -    update_node_aff_free(&masks);
>>> +    if ( masks == &local_masks )
>>> +        update_node_aff_free(masks);
>>>   }
>>>     /*
>>
>> Why do we need this at all?  domain_update_node_aff() already knows what
>> to do when passed NULL, so this seems like an awfully complicated no-op.
>
> You do realize that update_node_aff_free() will do something in case
> masks was initially NULL?

By "this", I meant the entire hunk, not just the final if().

What is wrong with passing the (possibly NULL) masks pointer straight
down into domain_update_node_aff() and removing all the memory
allocation in this function?

But I've answered that myself by looking more closely.  It's about not
repeating the memory allocation/freeing for each domain in the pool.
That does make sense even though this is a slow path, but the resulting
complexity of having conditionally allocated masks is considerable.

>
>>
>>> @@ -1008,10 +1016,21 @@ static int cf_check cpu_callback(
>>>   {
>>>       unsigned int cpu = (unsigned long)hcpu;
>>>       int rc = 0;
>>> +    static struct cpu_rm_data *mem;
>>>         switch ( action )
>>>       {
>>>       case CPU_DOWN_FAILED:
>>> +        if ( system_state <= SYS_STATE_active )
>>> +        {
>>> +            if ( mem )
>>> +            {
>>
>> So, this does compile (and indeed I've tested the result), but I can't
>> see how it should.
>>
>> mem is guaranteed to be uninitialised at this point, and ...
>
> ... it is defined as "static", so it is clearly NULL initially.

Oh, so it is.  That is hiding particularly well in plain sight.
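(For reference, objects with static storage duration are
zero-initialised by the language, per C11 6.7.9p10, so the bare
declaration really does start out NULL; a trivial illustration:)

```c
#include <stddef.h>

/*
 * No explicit initialiser, but static storage duration means this is
 * guaranteed to start out NULL - the same reason the bare
 * "static struct cpu_rm_data *mem;" in cpu_callback() is NULL on the
 * first invocation.
 */
static int *static_ptr;
```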

Can it please be this:

@@ -1014,9 +1014,10 @@ void cf_check dump_runq(unsigned char key)
 static int cf_check cpu_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
+    static struct cpu_rm_data *mem; /* Protected by cpu_add_remove_lock */
+
     unsigned int cpu = (unsigned long)hcpu;
     int rc = 0;
-    static struct cpu_rm_data *mem;
 
     switch ( action )
     {

We already split static and non-static variables like this elsewhere
for clarity, and identifying the lock which protects the data is useful
for anyone coming to this in the future.

~Andrew

P.S. as an observation, this isn't safe for parallel AP booting, but I
guarantee that this isn't the only example which would want fixing if we
wanted to get parallel booting working.


 

