
[Xen-devel] Re: A credit scheduler issue

Hi Nitin,

On Thu, Jun 29, 2006 at 06:13:51PM -0700, Kamble, Nitin A wrote:
>        I am trying to debug the credit scheduler to solve the many HVM domain
>    instability issues we have found with the credit scheduler.

Great. As Keir pointed out, though, the problems you are seeing
may not actually be in the credit scheduler itself.

>        While debugging I noticed an odd behavior: when running on a 2 CPU
>    system, dom0 gets 2 vcpus by default. And even if there are no other
>    domains running in the system, the dom0 vcpus are getting migrated to
>    different pcpus in the load balance. I think it is due to the preemption
>    happening in the credit scheduler; and it is not necessary and is actually
>    wasteful to move vcpus when the number of vcpus in the system is equal to
>    the number of pcpus.
>        I would like to know your thinking about this behavior. Is it
>    intended in the design?

This should be very rare. If a VCPU were woken up and put on
the runq of an idle CPU, a peer physical CPU that is in the
scheduler code at that exact time could potentially pick up
the just woken up VCPU.

We can do things to shorten this window, like not pick up a
VCPU from a remote CPU that is currently idle and therefore
probably racing with us to run said newly woken up VCPU on
its runq. But I'm not sure this happens frequently enough to
warrant the added complexity. On top of that, it seems to
me this is more likely to happen to VCPUs that aren't doing
very much work and therefore would not suffer a performance
loss from migrating between physical CPUs on occasion.

Are you seeing a lot of these migrations?

>    I added this small fix to the scheduler to fix this behavior. And with it
>    I see the stability of Xen improved. Win2003 boot was crashing with
>    unhandled MMIO error on xen64 earlier with credit scheduler. I am not
>    seeing that crash with this small fix anymore. It is quite possible that
>    there are more bugs I need to catch for HVM domains in the credit
>    scheduler. And I would like to know your thoughts for this change.

I don't agree with this change.

When a VCPU is the only member of a CPU's runq, it's still
waiting for a _running_ VCPU to yield or block. We should
absolutely be picking up such a VCPU to run elsewhere on
an idle CPU. Otherwise, you'd end up with two VCPUs time-slicing
on a processor while other processors in the system are idle.

Your change effectively turns off migration on systems where
the number of active VCPUs is less than 2 multiplied by the
number of physical CPUs. I can see why that would hide any
bugs in the context migrating paths, but that doesn't make
it right. :-)
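
To make the objection concrete, here is a minimal toy model of the
situation, not the actual Xen code. The names toy_pcpu, toy_steal and
runq_has_single_entry are hypothetical, and the model assumes (as
described above) that the running VCPU is not itself on the runq. It
shows that a runq with one waiter satisfies the proposed
next->next == &runq test, so with the check enabled an idle peer can
never pick that waiter up:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    assert(n != NULL && h != NULL);
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Hypothetical toy PCPU: one running slot plus a queue of waiters. */
struct toy_pcpu {
    int running;            /* nonzero if a VCPU occupies the CPU */
    struct list_head runq;  /* VCPUs waiting for the CPU */
};

/* True when exactly one entry is queued. */
static int runq_has_single_entry(const struct list_head *runq)
{
    return runq->next != runq && runq->next->next == runq;
}

/*
 * Return the entry an idle peer would steal, or NULL.
 * skip_single != 0 applies the check from the proposed change.
 */
static struct list_head *toy_steal(struct toy_pcpu *spc, int skip_single)
{
    struct list_head *victim;

    if (spc->runq.next == &spc->runq)
        return NULL;                      /* nothing is waiting */
    if (skip_single && runq_has_single_entry(&spc->runq))
        return NULL;                      /* proposed early return */

    victim = spc->runq.next;              /* unlink the first waiter */
    victim->prev->next = victim->next;
    victim->next->prev = victim->prev;
    list_init(victim);
    return victim;
}
```

With one VCPU running and one waiting on the same toy_pcpu,
toy_steal(&pcpu, 1) returns NULL while toy_steal(&pcpu, 0) returns the
waiter. In this model that is exactly the case where two VCPUs end up
sharing one processor while a peer sits idle.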

>    csched_runq_steal(struct csched_pcpu *spc, int cpu, int pri)
>    {
>        struct list_head *iter;
>        struct csched_vcpu *speer;
>        struct vcpu *vc;
>        /* If there is only 1 vcpu in the queue then stealing it from the
>         * queue is not going to help in load balancing.
>         */
>        if (spc->runq.next->next == &spc->runq)
>                return NULL;
>    Thanks & Regards,
>    Nitin
> -----------------------------------------------------------------------------------
>    Open Source Technology Center, Intel Corp
