
[Xen-devel] RE: A credit scheduler issue

Keir, Emmanuel,
  Thanks for the detailed answers, and your views. I agree that my small
change should not affect correctness. I didn't see the migration often;
I saw the dom0 VCPUs migrate 4 or 5 times between boot and the start of
xend. I think we should avoid these migrations; why throw away the warm
cache state?
    How solid is the credit scheduler now for DomUs on an SMP box, on
32-bit, PAE, and 64-bit? It would be a useful data point for me while
debugging the HVM guest issues with the credit scheduler.

Thanks & Regards,
Nitin
Open Source Technology Center, Intel Corp

>-----Original Message-----
>From: Emmanuel Ackaouy [mailto:ack@xxxxxxxxxxxxx]
>Sent: Friday, June 30, 2006 3:46 AM
>To: Kamble, Nitin A
>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Keir Fraser; Ian Pratt
>Subject: Re: A credit scheduler issue
>Hi Nitin,
>On Thu, Jun 29, 2006 at 06:13:51PM -0700, Kamble, Nitin A wrote:
>>        I am trying to debug the credit scheduler to solve the many
>>    instability issues we have found with the credit scheduler.
>Great. As Keir pointed out though the problems you are seeing
>may not actually be in the credit scheduler itself.
>>        While debugging I noticed an odd behavior: when running on a
>>    2-CPU system, dom0 gets 2 VCPUs by default. And even if there are no
>>    other domains running in the system, the dom0 VCPUs are getting
>>    migrated to different pcpus by the load balancer. I think it is due
>>    to the load balancing happening in the credit scheduler; it is not
>>    necessary and is wasteful to move VCPUs when the number of VCPUs in
>>    the system is equal to the number of pcpus.
>>        I would like to know your thinking about this behavior. Is it
>>    intended in the design?
>This should be very rare. If a VCPU were woken up and put on
>the runq of an idle CPU, a peer physical CPU that is in the
>scheduler code at that exact time could potentially pick up
>the just woken up VCPU.
>We can do things to shorten this window, like not picking up a
>VCPU from a remote CPU that is currently idle and therefore
>probably racing with us to run said newly woken VCPU on
>its runq. But I'm not sure this happens frequently enough to
>warrant the added complexity. On top of that, it seems to
>me this is more likely to happen to VCPUs that aren't doing
>very much work and therefore would not suffer a performance
>loss from occasionally migrating between physical CPUs.
>Are you seeing a lot of these migrations?
>>    I added this small fix to the scheduler to fix this behavior, and
>>    I see the stability of Xen improved. Win2003 boot was crashing with
>>    an unhandled MMIO error on xen64 earlier with the credit scheduler;
>>    I am not seeing that crash with this small fix anymore. It is quite
>>    possible there are more bugs I need to catch for HVM domains in the
>>    credit scheduler. I would like to know your thoughts on this change.
>I don't agree with this change.
>When a VCPU is the only member of a CPU's runq, it's still
>waiting for a _running_ VCPU to yield or block. We should
>absolutely be picking up such a VCPU to run elsewhere on
>an idle CPU. Else, you'd end up with two VCPUs time-slicing
>on a processor while other processors in the system are idle.
>Your change effectively turns off migration on systems where
>the number of active VCPUs is less than 2 multiplied by the
>number of physical CPUs. I can see why that would hide any
>bugs in the context migrating paths, but that doesn't make
>it right. :-)
>>    csched_runq_steal(struct csched_pcpu *spc, int cpu, int pri)
>>    {
>>        struct list_head *iter;
>>        struct csched_vcpu *speer;
>>        struct vcpu *vc;
>>
>>        /* If there is only one VCPU in the queue, then stealing it
>>         * from the queue is not going to help with load balancing.
>>         */
>>        if ( spc->runq.next->next == &spc->runq )
>>            return NULL;
>>    Thanks & Regards,
>>    Nitin
>>    Open Source Technology Center, Intel Corp

Xen-devel mailing list
