
Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code



On 03/18/2015 08:53 AM, Dario Faggioli wrote:
>>> @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, 
>>> int cpu)
>>>  static void *
>>>  csched2_alloc_pdata(const struct scheduler *ops, int cpu)
>>>  {
>>> -    /* Check to see if the cpu is online yet */
>>> -    /* Note: cpu 0 doesn't get a STARTING callback */
>>> -    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
>>> +    /*
>>> +     * Actual initialization is deferred to when the pCPU will be
>>> +     * online, via a STARTING callback. The only exception is
>>> +     * the boot cpu, which does not get such a notification, and
>>> +     * hence needs to be taken care of here.
>>> +     */
>>> +    if ( system_state == SYS_STATE_boot )
>>>          init_pcpu(ops, cpu);

I think this will break adding a cpu to a cpupool after boot, won't it?

Don't we want effectively, "if ( is_boot_cpu() ||
is_cpu_topology_set_up() )"?

is_cpu_topology_set_up() could be "system_state >= SYS_STATE_smp_boot" I
suppose?

Is "cpu == 0" the best test for "is_boot_cpu()", or is there another test?

>>
>> Same here, plus the new condition at first glance doesn't match
>> the old one, but perhaps that's intentional.
>>
> It is intentional, and that is why we're changing it! :-) Let me try to
> explain:
> 
> The idea, in both old and new code, is to call init_pcpu() either:
>  a) on the boot cpu, during boot
>  b) on non-boot cpu, *only* if they are online.
> 
> The issue is that, for assessing b), it checks whether
> cpu_to_socket(cpu) returns >=0 and, if it does, it assumes the cpu is
> online. That is not true, as cpu_data.phys_proc_id (which is what
> cpu_to_socket() goes and reads) is 0 even before any onlining and
> topology identification has happened (it's a global).
> 
> Therefore, all the pcpus are initialized right away, and all are
> assigned to runqueue 0, as returned by cpu_to_socket() at this stage,
> which is clearly wrong.
> 
> In fact, this is the reason why previous versions of this patch took
> the approach of initializing things such that cpu_to_socket() returned
> -1 before topology identification.
> 
> In the new version, as you suggested, I'm using system_state to figure
> out whether we are dealing with the boot cpu, and that's it. :-)
> 
> Hope this clarifies things... If yes, I'll make sure to put a similar
> explanation in the changelog, when sending the patch officially.

So let's step back for a minute, and let me see if I can describe the
situation.

Unlike the other schedulers, credit2 needs to know the topology of a
cpu when setting it up, so that we can place it in the proper runqueue.

Setting up the per-cpu structures would normally happen in alloc_pdata.
This is called in three different ways:

* During boot, on the boot processor, alloc_pdata is called from
scheduler_init().

* During boot, on the secondary processors, alloc_pdata is called from
a CPU_UP_PREPARE callback, which happens just before the cpu is actually
brought up.

* When switching a cpu to a new cpupool after boot, alloc_pdata is also
called from cpu_schedule_switch().

The "normal" place the cpu topology information can be found is in the
global cpu_data[] array, typically accessed via the cpu_to_socket()
or cpu_to_core() macros.

This topology information is written to cpu_data[] by
smp_store_cpu_info().  For the boot cpu, this happens in
smp_prepare_cpus(); for secondary cpus, it happens in
start_secondary(), which is the code run on the cpu itself as it's being
brought up.

Unfortunately, at the moment, both of these places come after the
respective places where alloc_pdata is called for those cpus.
Flattening the entire x86 boot sequence, at the moment you'd see:
  alloc_pdata for boot cpu
  smp_store_cpu_info for boot cpu
  for each secondary cpu
    alloc_pdata for secondary cpu
    smp_store_cpu_info for secondary cpu

scheduler_init() is called before smp_prepare_cpus() in part because of
a dependency chain elsewhere: we cannot set up the idle domain until
scheduler_init() is called; and other places further on in the
initialization but before setting up cpu topology information assume
that the idle domain has been set up.

I haven't actually tracked down this dependency yet; nor have I explored
the possibility of populating cpu_data[] for the boot cpu earlier than
it's done now.

I'm not sure exactly why we call alloc_pdata on CPU_UP_PREPARE rather
than CPU_STARTING -- whether that's just what seemed most sensible when
cpupools were created, or whether there's a dependency somewhere between
those two points.

I *think* that there probably cannot be a dependency: no cpu boot code
can (or should) depend on the cpu having been initialized, because it
doesn't happen when you're booting credit2; and no scheduler code in
alloc_pdata can (or should) depend on being in a pre-initialized state,
because it also happens when assigning an already-initialized cpu to a
cpupool.

At the moment we deal with the fact that the topology information for a
CPU is not available on CPU_UP_PREPARE by setting a callback that will
call init_pcpu() on the CPU_STARTING action.

The boot cpu will not get such a callback; but information about the
topology of the boot cpu *is* available in boot_cpu_data[].

We can use system_state to detect whether we've been called from
scheduler_init (system_state < SYS_STATE_smp_boot), or whether we've
been called after the whole thing has been done (system_state >=
SYS_STATE_active).  But while system_state == SYS_STATE_smp_boot (which
is where both of the normal schedule callback and the csched2-specific
callback happen at the moment), we can't determine whether a given cpu's
cpu_data[] has been initialized yet.

So one thing we could do is this:

* In csched2_alloc_pdata(), call init_pcpu() if system_state <
SYS_STATE_smp_boot (to catch scheduler_init) or system_state >=
SYS_STATE_active (to catch cpupool callbacks).

* Keep the credit2-specific callback on CPU_STARTING.  In
csched2_cpu_starting(), call init_pcpu().  (Not sure if this handles cpu
offlining / onlining properly or not.)

* In init_pcpu(), if system_state < SYS_STATE_smp_boot, assume that
we've been called from scheduler_init and use boot_cpu_data[].
Otherwise, assume that the calls from the CPU_UP_PREPARE callback have
been filtered out, and that we're being called either from CPU_STARTING
or from cpu_schedule_switch(), and use cpu_data[].

Another thing we could explore is this:

* Change scheduler.c to call alloc_pdata on CPU_STARTING rather than
CPU_UP_PREPARE, and get rid of the csched2-specific callback.

* In csched2_alloc_pdata, always call init_pcpu().  (Or perhaps just
rename init_pcpu() to csched2_alloc_pdata().)

* In init_pcpu, if system_state < SYS_STATE_smp_boot, assume that we've
been called from scheduler_init and use boot_cpu_data[]; otherwise,
assume that we're being called either from CPU_STARTING, or from
cpu_schedule_switch(), and use cpu_data[].

If this would work, I think it would be a lot better.  It would remove
the credit2-specific callback, and it would mean we wouldn't need an
additional test to filter out the CPU_UP_PREPARE calls.

In both cases there's a slight risk in using system_state to determine
whether to rely on cpu_data[] or not, because there's actually a window
for each processor after system_state == SYS_STATE_smp_boot where
cpu_data[] is *not* initialized, but it's not obvious from looking at
the data itself.  If the callback mechanisms ever change order with the
cpu_data[] initializations in the future, we risk a situation where
credit2 silently regresses to using a single massive runqueue.

So I think that, at least for debug builds, we should provide some way
to determine whether a given cpu_data[] entry has been initialized, so
that we can at least ASSERT() that it has.  Setting the values to BAD_APICID
before system_state == SYS_STATE_smp_boot would be one way to do that;
having an "initialized" flag in the structure (which defaults to zero)
would be another.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
