
[Xen-devel] credit scheduler error rates as reported by HP and UCSD

I've been looking at the credit scheduler in light of the paper "Resource Allocation Challenges in Virtual Machine Based IT Environments."

I've got an observation and three questions.

My observation is that the credit scheduler will select a vcpu
that has exceeded its credit when there is no other work to be done on
any of the other physical cpus in the system.

You can verify this by looking at the last couple of lines of the
function csched_load_balance in xen/common/sched_credit.c:

   /* Failed to find more important work elsewhere... */
   return snext;

where snext is the vcpu that is over its credit for the current time

So now a question: Is this the expected or desired behaviour of the
credit scheduler? I would assume so. Why idle a vcpu when there is no
contention for resources and that vcpu has work to do?
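
As a rough illustration of that fallback, here is a toy model in C. It
is not the Xen code: every name with a toy_ prefix is hypothetical, and
the real csched_load_balance does considerably more (locking, stealing
work from peer run queues, and so on). The sketch only shows the
decision itself: prefer more important work found on a peer cpu,
otherwise keep running the over-credit vcpu.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priority levels; a higher value means more important. */
enum toy_pri { TOY_PRI_IDLE = 0, TOY_PRI_OVER = 1, TOY_PRI_UNDER = 2 };

struct toy_vcpu {
    int id;
    enum toy_pri pri;
};

/*
 * snext is the local candidate (possibly over its credit).
 * peer_heads[i] is the vcpu at the head of peer cpu i's run queue,
 * or NULL if that queue is empty.
 */
const struct toy_vcpu *toy_load_balance(const struct toy_vcpu *snext,
                                        const struct toy_vcpu *const *peer_heads,
                                        size_t npeers)
{
    assert(snext != NULL);

    for (size_t i = 0; i < npeers; i++) {
        const struct toy_vcpu *cand = peer_heads[i];

        /* Take peer work only if it is strictly more important. */
        if (cand != NULL && cand->pri > snext->pri)
            return cand;
    }

    /* Failed to find more important work elsewhere... */
    return snext;
}
```

With an over-credit snext and peers that are idle or empty, the function
returns snext, so the vcpu keeps running past its credit.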

In light of the paper, with very low allocation targets for vcpus, it
is not surprising that the positive allocation errors can be quite
large. It is also not surprising that the errors (and error
distribution) decrease with larger allocation targets.
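
A small arithmetic sketch of why that is. It assumes the allocation
error is expressed relative to the target, which may not be exactly
how the paper defines it; the function name is hypothetical. The same
few percent of otherwise idle cpu handed to a vcpu looks much larger
against a small target than against a large one.

```c
#include <assert.h>

/*
 * Relative allocation error, in percent of the target.
 * Positive means the vcpu received more cpu than its target.
 * (Assumed definition, for illustration only.)
 */
double toy_alloc_error_pct(double target_pct, double actual_pct)
{
    assert(target_pct > 0.0);
    return (actual_pct - target_pct) / target_pct * 100.0;
}
```

For example, 3 extra points of cpu on a 5% target is an error of about
60%, while the same 3 points on a 50% target is about 6%.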

None of this explains the negative allocation errors, where the vcpus
received less than their pcpu allotments. I speculate that a couple of
circumstances may contribute to negative allocation errors:

Low weights: very low weights attached to domains will cause the
credit scheduler to attempt to pause vcpus almost every accounting
cycle. vcpus may therefore get fewer opportunities to run than they
otherwise would. If the ALERT measurement method differs from, or uses
a different interval than, the credit scheduler's 10ms tick and 30ms
accounting cycle, negative errors may result in the view of ALERT.

I/O activity: if ALERT performs I/O, the test, even though it is
"cpu intensive", may cause domu to block on dom0 frequently, meaning
it will idle more, especially if dom0 has a low credit
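
To make the measurement-interval point concrete, here is a toy model in
C. It assumes, purely for illustration, that a vcpu runs during the
first run_ms of every period_ms accounting cycle and is paused for the
rest; the real scheduler is not this regular, and the function name is
hypothetical. The function counts how many milliseconds of run time
fall inside a measurement window.

```c
#include <assert.h>

/*
 * Milliseconds of run time observed in the window [start, start + len),
 * for a vcpu assumed to run during the first run_ms of each period_ms.
 */
long toy_run_ms_in_window(long start, long len, long period_ms, long run_ms)
{
    long total = 0;

    assert(start >= 0 && len >= 0 && period_ms > 0);

    for (long t = start; t < start + len; t++) {
        if (t % period_ms < run_ms)
            total++;
    }
    return total;
}
```

With a 30ms period and 3ms of run time (a 10% target), a 30ms window
starting at 0 sees 3ms, which is exactly 10%. A 50ms window starting at
0 sees 6ms (12%), while a 50ms window starting at 3 sees only 3ms (6%).
In this model a measurement interval that is not aligned with the
accounting cycle reports both positive and negative errors even though
the vcpu gets its target on average.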

Questions: how does ALERT measure actual cpu allocation? Using Xenmon?
How does ALERT exercise the domain? The paper didn't mention the
actual system calls and hypercalls the domains are making when running


Mike D. Day
Virtualization Architect and Sr. Technical Staff Member, IBM LTC
Cell: 919 412-3900
ST: mdday@xxxxxxxxxx | AIM: ncmikeday | Yahoo IM: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
