
Re: [Xen-devel] Credit scheduler and latencies



Hi Milan,

This is interesting data.

As you noted, the credit scheduler runs 30ms time slices
by default. It will, however, preempt the running VCPU in
favour of a VCPU that is waking up and isn't consuming its
fair share of CPU resources (as calculated by the proportional
weighted method). The idea is to give good performance for
many standard workloads without requiring manual tuning.

I'm quite surprised that you managed to get one of three
CPU hogs to use more than 33.3% of the CPU! This is not
expected behaviour. I'll look into it.

It is however expected behaviour that once a VCPU consumes
its fair share of CPU resources, it will no longer preempt
others and will have to wait its turn for a time slice.
If we didn't do that, the VCPU in question could just hog
the CPU.
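The preemption rule described above can be illustrated with a toy credit model. This is a sketch under assumptions, not Xen's code. The class and function names, the credit amounts, and the rule "a waking VCPU may preempt only while it still has positive credit" are hypothetical simplifications of the proportional weighted idea.

```python
from dataclasses import dataclass


@dataclass
class Vcpu:
    """Toy VCPU (hypothetical model, not Xen code): positive credit means it
    is below its weighted fair share, negative credit means it is above."""
    name: str
    weight: int
    credit: float = 0.0


def replenish(vcpus, credits_per_period):
    """Hand out one accounting period's credits in proportion to weight."""
    total_weight = sum(v.weight for v in vcpus)
    for v in vcpus:
        v.credit += credits_per_period * v.weight / total_weight


def burn(vcpu, amount):
    """Charge a VCPU for CPU time it actually consumed."""
    vcpu.credit -= amount


def preempts_on_wake(vcpu):
    """Assumed rule: only a VCPU still under its fair share may preempt."""
    return vcpu.credit > 0


if __name__ == "__main__":
    hog_a, hog_b, light = Vcpu("hog_a", 256), Vcpu("hog_b", 256), Vcpu("light", 256)
    vcpus = [hog_a, hog_b, light]
    replenish(vcpus, 300)   # each weight-256 VCPU receives 100 credits
    burn(hog_a, 150)        # the two hogs consume more than their share...
    burn(hog_b, 145)
    burn(light, 5)          # ...while the mostly idle VCPU barely runs
    for v in vcpus:
        print(v.name, round(v.credit, 1), preempts_on_wake(v))
```

In this model the two hogs end up with negative credit and lose the ability to preempt, while the lightly loaded VCPU keeps it. That matches the behaviour described above: good wake-up latency only for VCPUs that stay under their share.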

The way to increase the share a VCPU can use while still
preempting others when it wakes up is to raise the fair share
(weight) of the domain in question, so that it is constantly
using less than its fair share of the CPU. But then, this
domain will also be able to actually use that much CPU.
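As a worked example of raising a domain's fair share, assume (as a simplification) that a domain's share is its weight divided by the sum of the weights of all competing domains. The weight w needed for a target share s against competitors with total weight W_others then follows from w / (w + W_others) = s, i.e. w = s * W_others / (1 - s). The function names below are hypothetical.

```python
import math
from fractions import Fraction


def fair_share(weight, other_weights):
    """Share of one PCPU under a simple weight / total-weight model."""
    return Fraction(weight, weight + sum(other_weights))


def weight_for_share(target_share, other_weights):
    """Smallest integer weight w with w / (w + sum(others)) >= target_share.

    Solves w = s * W / (1 - s) exactly with Fractions, then rounds up.
    target_share is passed as a string such as "0.8" to avoid float error.
    """
    s = Fraction(target_share)
    if not 0 < s < 1:
        raise ValueError("target share must be strictly between 0 and 1")
    return math.ceil(s * sum(other_weights) / (1 - s))


if __name__ == "__main__":
    domus = [256, 256, 256]                # three domUs at the default weight
    print(fair_share(256, domus))          # dom0 at the default weight
    print(weight_for_share("0.8", domus))  # weight giving dom0 an 80% share
```

Under this model, the dom0 weight of 3072 used in the capped test quoted below corresponds to an 80% share against three default-weight domUs.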

The credit scheduler doesn't have a good mechanism to
guarantee a sub-millisecond wake-to-run latency for VCPUs
whose CPU usage it must also restrict. The assumption is
that if you require good wake-to-run latencies, then you
are not a CPU hog. This assumption may not be valid for
all workloads.

Short of recompiling the source, there is currently no
way to change the default time slice, I'm afraid. And if
you recompile, you're indeed exploring uncharted territory.
Caps aren't what you're looking for. They limit the total
CPU a domain can actually get ahold of regardless of the
availability of idle resources but VCPUs still run 30ms
time slices.
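To see why a cap does not shorten the wait, consider a simplified model. This is an assumption, not a description of the scheduler's internals: a capped VCPU that always runs a full 30 ms slice must, on average, stay off the CPU long enough for its usage to fall to the cap. With the cap expressed as a percentage of one physical CPU, the average off time after each slice is slice * (100 - cap) / cap. The helper name is hypothetical.

```python
def off_time_after_slice_ms(slice_ms, cap_percent):
    """Average idle gap that keeps a VCPU running full slices at its cap.

    Simplified model: the VCPU alternates one full slice of running with
    one off period, so cap / 100 = slice / (slice + off).
    """
    if not 0 < cap_percent <= 100:
        raise ValueError("cap_percent must be in (0, 100]")
    return slice_ms * (100 - cap_percent) / cap_percent


if __name__ == "__main__":
    for cap in (20, 50, 100):
        off = off_time_after_slice_ms(30, cap)
        print(f"cap {cap:3d}%: off {off:6.1f} ms after each 30 ms slice")
```

In this model a 20% cap means long off periods between 30 ms bursts, so capping limits throughput without making wake-ups any faster.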

Are you trying to guarantee wake-to-run latencies for
one or more domains which also hog CPU resources if left
to run unchecked?

In 3.0.4, you could try SEDF, which basically seems to
run 1ms time slices. I can also add whatever mechanisms
you require to the credit scheduler, but depending on what
is required, that may not happen for a while, and likely
not in 3.0.4.

Cheers,
Emmanuel.

On Thu, Dec 14, 2006 at 06:24:43PM +0100, Milan Holzäpfel wrote:
> Hello,
> 
> I'm currently testing Xen-3.0.4-rc1 and its new credit scheduler to see
> how well it fits my latency needs.  For me, latencies up to 1-5 ms are
> ok.  Latencies < 1 ms would be better; so far I have achieved that with
> the bvt scheduler and a quantum of .2 ms.
> 
> My test setup is as follows: Xen running on a single-core
> Athlon64 3000+ and reachable via 192.168.1.34.  Three domUs on
> 192.168.1.35, .36 and .37.  Two of the domUs are always spinning
> (python -c "while True: pass") and the third is idle.  If not mentioned
> otherwise, all have the default weight of 256 and no cap.
> 
> 
> First of all, I find it interesting that VCPUs are rescheduled after 30
> ms when the PCPU is under full load, but if a domain doesn't use much
> PCPU, then the credit scheduler will happily interrupt the
> currently-running domain almost whenever needed, e.g. at an interval of
> 5 ms:
> 
> | ping -c 500  -i .005 192.168.1.34
> | ...
> | --- 192.168.1.34 ping statistics ---
> | 500 packets transmitted, 500 received, 0% packet loss, time 2495ms
> | rtt min/avg/max/mdev = 0.055/0.062/2.605/0.113 ms
> 
> (dom0 is idle and pinged, as described above, two spinning and one idle
> domUs)
> 
> The average response time is 0.062 ms, with an mdev of 0.113 ms.
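For reference when reading these ping figures: as far as I know, iputils ping reports mdev as sqrt(mean(rtt^2) - mean(rtt)^2). That is a standard deviation rather than a mean absolute deviation, so a single slow reply can push it well above the average. The sketch below reproduces that summary under that assumption. The function name is hypothetical.

```python
import math


def ping_summary(rtts_ms):
    """Return (min, avg, max, mdev) for a list of round-trip times in ms.

    mdev is computed as sqrt(mean(x^2) - mean(x)^2), assumed to match
    iputils ping; max(..., 0.0) guards against tiny negative values
    caused by floating-point rounding.
    """
    if not rtts_ms:
        raise ValueError("need at least one RTT sample")
    n = len(rtts_ms)
    avg = sum(rtts_ms) / n
    mean_sq = sum(x * x for x in rtts_ms) / n
    mdev = math.sqrt(max(mean_sq - avg * avg, 0.0))
    return min(rtts_ms), avg, max(rtts_ms), mdev


if __name__ == "__main__":
    # One slow reply among many fast ones gives mdev > avg, the same
    # pattern as avg 0.062 ms / mdev 0.113 ms / max 2.605 ms above.
    samples = [0.06] * 99 + [2.6]
    print(ping_summary(samples))
```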
> 
> In this light, my current plans to force the scheduler to reschedule
> more often (as formerly with bvt; see below) don't seem that bad to
> me :)
> 
> 
> Next, I checked how ping latencies to dom0 depend on dom0's CPU
> usage.  I used a script which sleeps and then tries to spin for a
> certain amount of time (based on the wall clock).  These are the results:
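The sleep/spin load described above can be approximated with a small script like the following. It is a hypothetical reconstruction, not the script that was actually used. It sleeps for sleep_ms, then busy-waits until spin_ms of elapsed (wall-clock) time has passed, as measured with time.monotonic().

```python
import time


def spin_fraction(sleep_ms, spin_ms):
    """Nominal fraction of each cycle spent spinning (ignores sleep overshoot)."""
    return spin_ms / (sleep_ms + spin_ms)


def duty_cycle(sleep_ms, spin_ms, cycles):
    """Alternate sleeping and busy-waiting; return total elapsed seconds."""
    start = time.monotonic()
    for _ in range(cycles):
        time.sleep(sleep_ms / 1000.0)  # give up the VCPU
        deadline = time.monotonic() + spin_ms / 1000.0
        while time.monotonic() < deadline:  # burn CPU until the deadline
            pass
    return time.monotonic() - start


if __name__ == "__main__":
    # Roughly the "sleep 4 ms, spin 1 ms" setting, run for about half a second.
    print(spin_fraction(4, 1))
    print(duty_cycle(4, 1, 100))
```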
> 
> | dom0 CPU (spin)  sleep (ms)  spin (ms)  ping avg (ms)  ping mdev (ms)
> | idle             -           -          0.099          0.024
> | idle             -           -          0.091          0.029
> | idle             -           -          0.087          0.031
> | 25%   (.2)       4           1          0.084          0.026
> | 25%   (.2)       8           2          0.084          0.026
> | 25%   (.2)       40          10         0.088          0.030
> 
> | 38%   (.3)       1.5         3.5        0.084          0.025
> 
> | 44%   (.35)      1.75        3.75       0.075          0.023
> | 44%   (.35)      3.5         6.5        0.271          1.445
> | 30%   (.35)      17.5        32.5       6.685          14.633
> 
> | 34.5% (.4)       2           3          11.003         17.638
> 
> | 45.6% (.9)       0.2         1.8        0.111          0.238
> 
> | === dom0 with weight 3072, capped at .2 ===
> | 25%   (.2)       4           1          0.101          0.031
> | 20%   (.2)       40          10         10.698         18.643
> | 36%   (.3)       1.5         3.5        0.061          0.713
> 
> The first column shows the CPU usage reported by xentop and, in
> parentheses, the proportion of time the script was spinning.  Next is the
> length of one sleeping and one spinning interval, followed by the latency
> results.  (Measured with ping -i .2 192.168.1.34 -c 120)
> 
> It seems that a domain/VCPU can in some cases use more than its fair
> share of the PCPU and still interrupt other VCPUs, as long as it sleeps
> frequently enough.
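One possible explanation for a VCPU exceeding its share while still preempting others is sampling-based accounting. The sketch below assumes, purely for illustration, that CPU time is charged by checking which VCPU is running at periodic 10 ms ticks. Whether the 3.0.4 credit scheduler accounts exactly this way is an assumption, and the function names are hypothetical. Under that assumption, a VCPU whose sleep phases happen to line up with the ticks is never charged.

```python
def running_at(t_us, spin_us, sleep_us):
    """True if a VCPU that spins, then sleeps, in a fixed cycle is running at t."""
    return t_us % (spin_us + sleep_us) < spin_us


def actual_fraction(spin_us, sleep_us):
    """CPU share the VCPU really uses."""
    return spin_us / (spin_us + sleep_us)


def charged_fraction(spin_us, sleep_us, tick_us, offset_us, n_ticks):
    """CPU share the VCPU is charged if accounting only samples at ticks."""
    hits = sum(
        running_at(offset_us + k * tick_us, spin_us, sleep_us)
        for k in range(n_ticks)
    )
    return hits / n_ticks


if __name__ == "__main__":
    # Spin 3.5 ms / sleep 1.5 ms, with an assumed tick every 10 ms.
    for offset in (1_000, 4_000):
        print(
            f"tick offset {offset / 1000:.0f} ms:",
            f"uses {actual_fraction(3_500, 1_500):.0%},",
            f"charged {charged_fraction(3_500, 1_500, 10_000, offset, 1_000):.0%}",
        )
```

Real ticks and wake-ups are not phase-locked like this, so measured results would fall somewhere in between. The point is only that sampling can under-charge a VCPU that sleeps often.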
> 
> If a domain/VCPU spins for long enough, it does indeed stop
> interrupting other VCPUs, which directly affects the latency I
> measured.
> 
> The results with capping enabled are also interesting (a domain may use
> more than its cap if it sleeps frequently enough), but capping is not a
> solution for my needs.
> 
> 
> Therefore I will try reducing the rescheduling interval from 30 ms to
> 10 ms (should be possible?) and to 1 ms (may break the credit accounting
> code completely?  I haven't fully understood how it depends on the
> timer interrupt).
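A rough way to see what a shorter time slice would buy: in a simplified round-robin model, a waking VCPU that cannot preempt may have to wait for every competing CPU hog to run a full slice. This model is an assumption; it ignores boosting, credit and any queue ordering beyond plain FIFO. The helper name is hypothetical.

```python
def worst_case_wait_ms(competing_hogs, slice_ms):
    """Upper bound on wake-to-run delay when the waker queues behind
    every runnable CPU hog and each hog runs a full time slice."""
    return competing_hogs * slice_ms


if __name__ == "__main__":
    # Two spinning domUs, as in the test setup above.
    for slice_ms in (30, 10, 1):
        print(f"slice {slice_ms:2d} ms -> up to {worst_case_wait_ms(2, slice_ms)} ms")
```

By this crude measure, going from 30 ms to 10 ms or 1 ms slices cuts the bound proportionally, at the cost of more frequent context switches.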
> 
> I'd be happy about any advice :)
> 
> 
> Regards,
> Milan
> 
> PS:  Would it be easily possible to use bvt with 3.0.4-rc1?  I know it
> has been dropped...



> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

