
Re: [Xen-devel] [Help] Trigger Watchdog when adding an IPI in vcpu_wake



On Sat, 2016-09-17 at 00:31 +0000, Wei Yang wrote:
> On Fri, Sep 16, 2016 at 06:07:08PM +0200, Dario Faggioli wrote:
> > But then again, if the system is not oversubscribed, I'd tend to
> > think it to be tolerable, and I'd expect the biggest problem to be
> > the work-stealing logic (considering the high number of pcpus),
> > rather than the duration of the critical sections within
> > vcpu_wake().
> > 
> Yes, we are trying to improve the stealing part too.
> 
Great.

> Sounds reasonable, vcpu_wake() is O(1) while "stealing" is O(N) in
> terms of #PCPUs.
> 
Exactly!

> > I also think current upstream is a bit better, especially because
> > it's mostly me that made it look the way it does. :-D
> 
> Ah, I was not intending to flatter you.
> 
EhEh! Sure, I was joking myself. :-)

> > 
> > 
> > But I was actually describing how 4.1 works. In fact, in 4.1, if
> > pcpu 6 is idle (see the '//xxx xxx xxx' comments I'm adding to the
> > code excerpts):
> > 
> >     if ( new->pri > cur->pri )  //is true, so we put pcpu 6 in mask
> >     {
> >         cpu_set(cpu, mask);
> >     }
> >     if ( cur->pri > CSCHED_PRI_IDLE )  //is false!!
> >     {
> >        ....
> >     }
> >     if ( !cpus_empty(mask) ) //the mask contains pcpu 6 only
> >         cpumask_raise_softirq(mask, SCHEDULE_SOFTIRQ);
> > 
> 
> Hmm... I don't have the code at hand, but it looks like you are
> right. I misunderstood the code.

> So only one idler would be tickled, even if there are several idlers
> in the system. I thought we would tickle several idlers, which may
> introduce some contention between them.
> 
You can ask Xen to tickle more than one idler, by means of that
parameter. But again, tickling idlers --either one or many-- would only
happen if there's actual need for it.
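
To make that concrete, here is a standalone toy sketch (plain C, not
actual Xen code; the names and the flat bitmask are made up) of the
"tickle one idler vs. tickle all of them" choice:

    #include <stdbool.h>
    #include <stdio.h>

    #define NR_PCPUS 16

    /* Stand-in for raising SCHEDULE_SOFTIRQ on a pcpu. */
    static void tickle(int cpu)
    {
        printf("tickling pcpu %d\n", cpu);
    }

    /* Poke either one idler (cycling from 'start') or all of them. */
    static void tickle_idlers(unsigned long idle_mask, bool one, int start)
    {
        for ( int i = 1; i <= NR_PCPUS; i++ )
        {
            int cpu = (start + i) % NR_PCPUS;

            if ( !(idle_mask & (1UL << cpu)) )
                continue;
            tickle(cpu);
            if ( one )
                return;          /* first idler found wins */
        }
    }

    int main(void)
    {
        unsigned long idlers = (1UL << 3) | (1UL << 6) | (1UL << 9);

        tickle_idlers(idlers, true, 6);   /* wakes only pcpu 9  */
        tickle_idlers(idlers, false, 6);  /* wakes 9, 3 and 6   */
        return 0;
    }

With the single-idler variant you avoid having many idlers race for the
same vcpu; with the tickle-all variant the vcpu is picked up sooner, at
the price of exactly the contention you mention.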

> BTW, what is the idea behind the cycle_cpu()? If it is about
> locality, how about cycling within a node? e.g., within the node where
> v->processor is?
> 
Cycling is there as a form of load spreading, i.e., we don't want to
risk always tickling the same set of pcpus, or always the ones with the
lowest cpu IDs.

You're right that taking locality into account could be a good thing in
big systems. No, that is not done, not in 4.1 nor in upstream, and it
may be something actually worth trying (although, again, it's probably
unrelated to your issue).
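
If you want to experiment with that, here is a standalone toy sketch
(plain C, not Xen code; cpu_to_node[] and everything else here is made
up) of a node-first cycle, scanning the idlers on v->processor's node
before falling back to the remote ones:

    #include <stdio.h>

    #define NR_PCPUS 8

    /* Made-up topology: pcpus 0-3 on node 0, 4-7 on node 1. */
    static const int cpu_to_node[NR_PCPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

    static int pick_idler(unsigned long idle_mask, int processor)
    {
        int node = cpu_to_node[processor];

        /* First pass: only idlers on the same node as v->processor. */
        for ( int i = 1; i <= NR_PCPUS; i++ )
        {
            int cpu = (processor + i) % NR_PCPUS;

            if ( (idle_mask & (1UL << cpu)) && cpu_to_node[cpu] == node )
                return cpu;
        }

        /* Second pass: any idler at all, still cycling for fairness. */
        for ( int i = 1; i <= NR_PCPUS; i++ )
        {
            int cpu = (processor + i) % NR_PCPUS;

            if ( idle_mask & (1UL << cpu) )
                return cpu;
        }

        return -1;   /* no idlers at all */
    }

    int main(void)
    {
        unsigned long idlers = (1UL << 2) | (1UL << 5);

        /* v->processor == 1 (node 0): prefer pcpu 2 over remote pcpu 5. */
        printf("picked pcpu %d\n", pick_idler(idlers, 1));
        return 0;
    }

Whether the extra topology lookups actually pay off is something that
would need measuring on a big box like yours.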

> > And again, I'd personally spend some more time --even by
> > temporarily and hackishly instrumenting the code-- to understand
> > better where the bottleneck is.
> > 
> 
> Hmm... usually we use xentrace and lock profiling to identify the
> bottleneck; is there any other method you would like to suggest? And
> does instrumenting the code mean adding some logging to the code?
> 
xentrace and lock profiling are all I use too, and there's not much
else without touching the code. And in fact, yes, by "instrumenting the
code" I mean temporarily changing the code to display the information
you need.

In this specific case, rather than adding printks here and there
(which, e.g., is what you usually do when debugging crashes and the
like), I'd look into modifying either the xentrace or the lock
profiling code a little bit (or both!) to make them provide the info
you need.
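
As a rough idea of what I mean with temporary instrumentation, here is
a standalone toy sketch (plain C, not real Xen code; all the names are
made up) of per-pcpu counters you could bolt onto the stealing path and
then dump later, e.g., from a debug key handler:

    #include <stdio.h>

    #define NR_PCPUS 4

    struct steal_stats {
        unsigned long calls;     /* times the stealing logic ran        */
        unsigned long scanned;   /* peer runqueues peeked at, in total  */
        unsigned long stolen;    /* times a vcpu was actually taken     */
    };

    static struct steal_stats stats[NR_PCPUS];

    static void account_steal(int cpu, int peers, int got_one)
    {
        stats[cpu].calls++;
        stats[cpu].scanned += peers;
        stats[cpu].stolen += got_one;
    }

    /* In Xen this would hang off a debug key or a trace record instead. */
    static void dump_stats(void)
    {
        for ( int cpu = 0; cpu < NR_PCPUS; cpu++ )
            printf("pcpu%d: calls=%lu scanned=%lu stolen=%lu\n",
                   cpu, stats[cpu].calls, stats[cpu].scanned,
                   stats[cpu].stolen);
    }

    int main(void)
    {
        /* Pretend pcpu 0 scanned 3 peers twice, stealing once. */
        account_steal(0, 3, 0);
        account_steal(0, 3, 1);
        dump_stats();
        return 0;
    }

A high scanned/stolen ratio would point at the work-stealing loop
burning time without actually finding anything to take.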

Sorry, but I don't think I can be much more specific without going
looking at the code and actually trying to do that...

> > E.g., something that has proven effective for others (and for which
> > I've got an unfinished and never submitted patch to upstream), is
> > keeping track of which pcpus actually have at least 1 vcpu in their
> > runqueues (i.e., at least 1 vcpu that is runnable and not running)
> > and, inside balance_load(), only considering those when looking for
> > work to steal.
> 
> Looks like we keep a cpumask of those pcpus that have 1 or more such
> vcpus and, during the stealing process, just scan those instead of the
> online cpumask.
> 
Yep, that sounds like a plan, and was along the lines of what I was
doing. If you want to give it a try, patches (for upstream, of course
:-D) are welcome. :-P
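
For the record, a standalone toy sketch of the idea (plain C, not the
actual unfinished patch; all the names are made up) could look like
this:

    #include <stdio.h>

    #define NR_PCPUS 16

    static unsigned long overloaded;        /* bit set => work to steal    */
    static int runq_len[NR_PCPUS];          /* runnable-not-running vcpus  */

    static void runq_insert(int cpu)
    {
        if ( ++runq_len[cpu] == 1 )
            overloaded |= 1UL << cpu;       /* pcpu became stealable */
    }

    static void runq_remove(int cpu)
    {
        if ( --runq_len[cpu] == 0 )
            overloaded &= ~(1UL << cpu);    /* nothing left to steal */
    }

    /* Scan only pcpus advertising work, instead of every online pcpu. */
    static int steal_work(int self)
    {
        for ( int i = 1; i < NR_PCPUS; i++ )
        {
            int peer = (self + i) % NR_PCPUS;

            if ( overloaded & (1UL << peer) )
            {
                runq_remove(peer);          /* "steal" one vcpu */
                return peer;
            }
        }
        return -1;                          /* nobody has spare work */
    }

    int main(void)
    {
        runq_insert(7);                     /* pcpu 7 queues a runnable vcpu */
        printf("pcpu 2 stole from pcpu %d\n", steal_work(2));
        printf("second attempt finds pcpu %d\n", steal_work(2));
        return 0;
    }

The point is that steal_work() only ever touches pcpus that advertise
runnable-but-not-running vcpus, so in the common, non-oversubscribed
case the scan stays short instead of being O(#PCPUs).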

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

