[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
On Wed, 2015-03-18 at 15:26 +0000, George Dunlap wrote: > On 03/18/2015 08:53 AM, Dario Faggioli wrote: > >>> @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, > >>> int cpu) > >>> static void * > >>> csched2_alloc_pdata(const struct scheduler *ops, int cpu) > >>> { > >>> - /* Check to see if the cpu is online yet */ > >>> - /* Note: cpu 0 doesn't get a STARTING callback */ > >>> - if ( cpu == 0 || cpu_to_socket(cpu) >= 0 ) > >>> + /* > >>> + * Actual initialization is deferred to when the pCPU will be > >>> + * online, via a STARTING callback. The only exception is > >>> + * the boot cpu, which does not get such a notification, and > >>> + * hence needs to be taken care of here. > >>> + */ > >>> + if ( system_state == SYS_STATE_boot ) > >>> init_pcpu(ops, cpu); > > I think this will break adding a cpu to a cpupool after boot, won't it? > Oh, yeah, that. Yes, I knew it, and probably should have mentioned it... I assumed it was clear enough that I was asking an opinion about the shown use of system_state, as opposed to using a particular init value of cpu_data.phys_proc_id to the same effect, during the boot process. > Don't we want effectively, "if ( is_boot_cpu() || > is_cpu_topology_set_up() )"? > > is_cpu_topology_set_up() could be "system_state >= SYS_STATE_smp_boot" I > suppose? > That is what I was thinking. Alternatively, I thought we can keep the '<= SYS_STATE_boot' part for recognizing that the call comes from within the boot process, and actually do tweak the initialization/assignment logic of phys_proc_id, as it was also mentioned previously in the thread (and as you also mention below, IIUIC), to make it better represent whether topology info are valid or not. > Is "cpu == 0" the best test for "is_boot_cpu()", or is there another test? > It is a valid test for that, right now, AFAICT, but Jan would like for it to stop being one. :-D > So let's step back for a minute, and let me see if I can describe the > situation. > Yeah... thanks for this, it's been extremely helpful, IMO! > Unlike the other schedulers, credit2 needs to know the topology of a > cpus when setting it up, so that we can place it in the proper runqueue. > > Setting up the per-cpu structures would normally happen in alloc_pdata. > This is called in three different ways: > > * During boot, on the boot processer, alloc_pdata is called from > scheduler_init(). > > * During boot, on the secondary processors, alloc_pdata is called from > a CPU_UP_PREPARE callback, which happens just before the cpu is actually > brought up. > > * When switching a cpu to a new cpupool after boot, alloc_pdata is also > called from cpu_schedule_switch(). > > The "normal" place the cpu topology information can be found is in > global the cpu_data[] array, typically accessed by the cpu_to_socket() > or cpu_to_core() macros. > > This topology information is written to cpu_data[] by > smp_store_cpu_info(). For the boot cpu, it happens in > smp_prepare_cpus(); For secondary cpus, it's happens in > start_secondary(), which is the code run on the cpu itself as it's being > brought up. > > Unfortunately, at the moment, both of these places are after the > respective places where the alloc pdata is called for those cpus. > Flattening the entire x86 setup at the moment you'd see: > alloc_pdata for boot cpu > smp_store_cpu_info for boot cpu > for each secondary cpu > alloc_pdata for secondary cpu > smp_store_cpu_info for secondary cpu > > scheduler_init() is called before smp_prepare_cpus() in part because of > a dependency chain elsewhere: we cannot set up the idle domain until > scheduler_init() is called; and other places further on in the > initialization but before setting up cpu topology information assume > that the idle domain has been set up. > However, as you say yourself below, there is boot_cpu_data, which stashes the info we need for the boot cpu. It gets filled after init_idle_domain()->scheduler_init() right now, but I don't see a reason why it can't be pulled up a bit (and from my tests, it seems to work). That happens during system_state==SYS_STATE_boot, which also means system_state<SYS_STATE_smp_boot. This means that, as far as scheduling initialization is concerned, if (system_state < SYS_STATE_smp_boot) we can and should use boot_cpu_data from the scheduler code, because it's for that cpu that we're being called. I do agree that this may look fragile, as it depends on the relative positioning of identify_cpu() wrt scheduler_init() wrt system_state values, but I think we can find something to ASSERT() about (from Credit2's code). > I haven't actually tracked down this dependency yet; nor have I explored > the possibility of populating cpu_data[] for the boot cpu earier than > it's done now. > That looks like what boot_cpu_data is for, AFAICT. > I'm not sure exactly why we call alloc_pdata on CPU_UP_PREPARE rather > than CPU_STARTING -- whether that's just what seemed most sensible when > cpupools were created, or whether there's a dependency somewhere between > those two points. > Me neither, this may be worth a try. > At the moment we deal with the fact that the topology information for a > CPU is not available on CPU_UP_PREPARE by setting a callback that will > call init_pcpu() on the CPU_STARTING action. > > The boot cpu will not get such a callback; but information about the > topology of the boot cpu *is* available in boot_cpu_data[]. > Exactly. > We can use system_state to detect whether we've been called from > scheduler_init (system_state < SYS_STATE_smp_boot), or whether we've > been called after the whole thing has been done (system_state >= > SYS_STATE_active). > Exactly again. :-) > But while system_state == SYS_STATE_smp_boot (which > is where both of the normal schedule callback and the csched2-specific > callback happen at the moment), we can't determine whether a given cpu's > cpu_data[] has been initialized yet. > But we know this from the fact that, if system_state is SYS_STATE_smp_boot and not SYS_STATE_boot, we are being called from the callback, don't we? > So one thing we could do is this: > > * In csched2_alloc_pdata(), call init_pcpu() if system_state <= > SYS_STATE_smp_boot (to catch scheduler_init) or system_state >= > SYS_STATE_active (to catch cpupool callbacks). > > * Keep the credit2-specific callback on CPU_STARTING. In > csched2_cpu_starting(), call init_pcpu(). (Not sure if this handles cpu > offlining / onlining properly or not.) > > * In init_pcpu(), if system_state < SYS_STATE_smp_boot, assume that > we've been called from scheduler_init and use boot_cpu_data[]. > Otherwise, assume that the calls from the CPU_UP_PREPARE callback have > been filtered out, and that we're being called either from CPU_STARTING > or from cpu_schedlue_switch(), and use cpu_data[]. > Oh, right, so after all, we at least sort of are on the same page: this is exactly what I was suggesting/wanted to do. :-) > Another thing we could explore is this: > > * Change scheduler.c to call alloc_pdata on CPU_STARTING rather than > CPU_UP_PREPARE, and get rid of the csched2-specific callback. > > * In csched2_alloc_pdata, always call init_pcpu(). (Or perhaps just > rename init_pcpu() to csched2_alloc_pdata().) > > * In init_pcpu, if system_state < SYS_STATE_smp_boot, assume that we've > been called from scheduler_init and use boot_cpu_data[]; otherwise, > assume that we're being called either from CPU_STARTING, or from > cpu_schedule_switch(), and use cpu_data[]. > Right too, I think. > If this would work, I think it would be a lot better. It would remove > the credit2-specific callback, and it would mean we wouldn't need an > additional test to filter out > I see. Ok, I guess I can give this a try. If it does not explode, because some weird dependency we can't see right now, we can either try harder to track it, or use the other solution outlined above as a fallback. > In both cases there's a slight risk in using system_state to determine > whether to rely on cpu_data[] or not, because there's actually a window > for each processor after system_state == SYS_STATE_smp_boot where > cpu_data[] is *not* initialized, but it's not obvious from looking at > the data itself. If the callback mechanisms ever change order with the > cpu_data[] initializations in the future, we risk a situation where > credit2 silently regresses to using a single massive runqueue. > > So I think that at least for debug builds, we should make some way to > determine whether a given cpu_data[] has been initialized, so that at > least we an ASSERT() that it has. > Yep, agreed too. :-) Thanks again for this and Regards, Dario Attachment:
signature.asc _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |