[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code



On Wed, 2015-03-18 at 15:26 +0000, George Dunlap wrote:
> On 03/18/2015 08:53 AM, Dario Faggioli wrote:
> >>> @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, 
> >>> int cpu)
> >>>  static void *
> >>>  csched2_alloc_pdata(const struct scheduler *ops, int cpu)
> >>>  {
> >>> -    /* Check to see if the cpu is online yet */
> >>> -    /* Note: cpu 0 doesn't get a STARTING callback */
> >>> -    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
> >>> +    /*
> >>> +     * Actual initialization is deferred to when the pCPU will be
> >>> +     * online, via a STARTING callback. The only exception is
> >>> +     * the boot cpu, which does not get such a notification, and
> >>> +     * hence needs to be taken care of here.
> >>> +     */
> >>> +    if ( system_state == SYS_STATE_boot )
> >>>          init_pcpu(ops, cpu);
> 
> I think this will break adding a cpu to a cpupool after boot, won't it?
> 
Oh, yeah, that. Yes, I knew it, and probably should have mentioned it...
I assumed it was clear enough that I was asking an opinion about the
shown use of system_state, as opposed to using a particular init value
of cpu_data.phys_proc_id to the same effect, during the boot process.

> Don't we want effectively, "if ( is_boot_cpu() ||
> is_cpu_topology_set_up() )"?
> 
> is_cpu_topology_set_up() could be "system_state >= SYS_STATE_smp_boot" I
> suppose?
> 
That is what I was thinking. Alternatively, I thought we can keep the
'<= SYS_STATE_boot' part for recognizing that the call comes from within
the boot process, and actually do tweak the initialization/assignment
logic of phys_proc_id, as it was also mentioned previously in the thread
(and as you also mention below, IIUIC), to make it better represent
whether topology info are valid or not.

> Is "cpu == 0" the best test for "is_boot_cpu()", or is there another test?
> 
It is a valid test for that, right now, AFAICT, but Jan would like for
it to stop being one. :-D

> So let's step back for a minute, and let me see if I can describe the
> situation.
> 
Yeah... thanks for this, it's been extremely helpful, IMO!

> Unlike the other schedulers, credit2 needs to know the topology of a
> cpus when setting it up, so that we can place it in the proper runqueue.
> 
> Setting up the per-cpu structures would normally happen in alloc_pdata.
>  This is called in three different ways:
> 
> * During boot, on the boot processer, alloc_pdata is called from
> scheduler_init().
> 
> * During boot, on the secondary processors, alloc_pdata is called from
> a CPU_UP_PREPARE callback, which happens just before the cpu is actually
> brought up.
> 
> * When switching a cpu to a new cpupool after boot, alloc_pdata is also
> called from cpu_schedule_switch().
> 
> The "normal" place the cpu topology information can be found is in
> global the cpu_data[] array, typically accessed by the cpu_to_socket()
> or cpu_to_core() macros.
> 
> This topology information is written to cpu_data[] by
> smp_store_cpu_info().  For the boot cpu, it happens in
> smp_prepare_cpus();  For secondary cpus, it's happens in
> start_secondary(), which is the code run on the cpu itself as it's being
> brought up.
> 
> Unfortunately, at the moment, both of these places are after the
> respective places where the alloc pdata is called for those cpus.
> Flattening the entire x86 setup at the moment you'd see:
>   alloc_pdata for boot cpu
>   smp_store_cpu_info for boot cpu
>   for each secondary cpu
>     alloc_pdata for secondary cpu
>     smp_store_cpu_info for secondary cpu
> 
> scheduler_init() is called before smp_prepare_cpus() in part because of
> a dependency chain elsewhere: we cannot set up the idle domain until
> scheduler_init() is called; and other places further on in the
> initialization but before setting up cpu topology information assume
> that the idle domain has been set up.
> 
However, as you say yourself below, there is boot_cpu_data, which
stashes the info we need for the boot cpu. It gets filled after
init_idle_domain()->scheduler_init() right now, but I don't see a reason
why it can't be pulled up a bit (and from my tests, it seems to work).

That happens during system_state==SYS_STATE_boot, which also means
system_state<SYS_STATE_smp_boot. This means that, as far as scheduling
initialization is concerned, if (system_state < SYS_STATE_smp_boot) we
can and should use boot_cpu_data from the scheduler code, because it's
for that cpu that we're being called.

I do agree that this may look fragile, as it depends on the relative
positioning of identify_cpu() wrt scheduler_init() wrt system_state
values, but I think we can find something to ASSERT() about (from
Credit2's code).

> I haven't actually tracked down this dependency yet; nor have I explored
> the possibility of populating cpu_data[] for the boot cpu earier than
> it's done now.
> 
That looks like what boot_cpu_data is for, AFAICT.

> I'm not sure exactly why we call alloc_pdata on CPU_UP_PREPARE rather
> than CPU_STARTING -- whether that's just what seemed most sensible when
> cpupools were created, or whether there's a dependency somewhere between
> those two points.
> 
Me neither, this may be worth a try.

> At the moment we deal with the fact that the topology information for a
> CPU is not available on CPU_UP_PREPARE by setting a callback that will
> call init_pcpu() on the CPU_STARTING action.
> 
> The boot cpu will not get such a callback; but information about the
> topology of the boot cpu *is* available in boot_cpu_data[].
> 
Exactly.

> We can use system_state to detect whether we've been called from
> scheduler_init (system_state < SYS_STATE_smp_boot), or whether we've
> been called after the whole thing has been done (system_state >=
> SYS_STATE_active).  
>
Exactly again. :-)

> But while system_state == SYS_STATE_smp_boot (which
> is where both of the normal schedule callback and the csched2-specific
> callback happen at the moment), we can't determine whether a given cpu's
> cpu_data[] has been initialized yet.
> 
But we know this from the fact that, if system_state is
SYS_STATE_smp_boot and not SYS_STATE_boot, we are being called from the
callback, don't we?

> So one thing we could do is this:
> 
> * In csched2_alloc_pdata(), call init_pcpu() if system_state <=
> SYS_STATE_smp_boot (to catch scheduler_init) or system_state >=
> SYS_STATE_active (to catch cpupool callbacks).
> 
> * Keep the credit2-specific callback on CPU_STARTING.  In
> csched2_cpu_starting(), call init_pcpu().  (Not sure if this handles cpu
> offlining / onlining properly or not.)
> 
> * In init_pcpu(), if system_state < SYS_STATE_smp_boot, assume that
> we've been called from scheduler_init and use boot_cpu_data[].
> Otherwise, assume that the calls from the CPU_UP_PREPARE callback have
> been filtered out, and that we're being called either from CPU_STARTING
> or from cpu_schedlue_switch(), and use cpu_data[].
> 
Oh, right, so after all, we at least sort of are on the same page: this
is exactly what I was suggesting/wanted to do. :-)

> Another thing we could explore is this:
> 
> * Change scheduler.c to call alloc_pdata on CPU_STARTING rather than
> CPU_UP_PREPARE, and get rid of the csched2-specific callback.
> 
> * In csched2_alloc_pdata, always call init_pcpu().  (Or perhaps just
> rename init_pcpu() to csched2_alloc_pdata().)
> 
> * In init_pcpu, if system_state < SYS_STATE_smp_boot, assume that we've
> been called from scheduler_init and use boot_cpu_data[]; otherwise,
> assume that we're being called either from CPU_STARTING, or from
> cpu_schedule_switch(), and use cpu_data[].
> 
Right too, I think.

> If this would work, I think it would be a lot better.  It would remove
> the credit2-specific callback, and it would mean we wouldn't need an
> additional test to filter out
> 
I see. Ok, I guess I can give this a try. If it does not explode,
because some weird dependency we can't see right now, we can either try
harder to track it, or use the other solution outlined above as a
fallback.

> In both cases there's a slight risk in using system_state to determine
> whether to rely on cpu_data[] or not, because there's actually a window
> for each processor after system_state == SYS_STATE_smp_boot where
> cpu_data[] is *not* initialized, but it's not obvious from looking at
> the data itself.  If the callback mechanisms ever change order with the
> cpu_data[] initializations in the future, we risk a situation where
> credit2 silently regresses to using a single massive runqueue.
> 
> So I think that at least for debug builds, we should make some way to
> determine whether a given cpu_data[] has been initialized, so that at
> least we an ASSERT() that it has.
>
Yep, agreed too. :-)

Thanks again for this and Regards,
Dario

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.