
Re: [Xen-users] Xen VMs and Unixbench: single vs multiple cpu behaviour

[Cc-ing George, which I should have done earlier, sorry! :-/]

On Sat, 2015-11-21 at 11:41 +0100, Marko ÄukiÄ wrote:
> And the results of vcpu pinning:
So, let me see if I can put the numbers together and recap.

With a 4 vCPUs VM, we have:
                                      no pinning   all on 1 pCPU   1-to-1 pin
Dhrystone 2 using register variables      3355.0          3359.4       3385.2
Double-Precision Whetstone                 787.6           785.3        784.2
Execl Throughput                           298.8           193.0        303.7
File Copy 1024 bufsize 2000 maxblocks     3292.7          3303.1       3294.0
File Copy 256 bufsize 500 maxblocks       2078.2          2089.2       2083.3
File Copy 4096 bufsize 8000 maxblocks     5516.9          5559.8       5576.7
Pipe Throughput                           1855.9          1857.8       1856.1
Pipe-based Context Switching               999.9           987.6        999.5
Process Creation                           254.4           826.4        354.1
Shell Scripts (1 concurrent)               818.0           840.1        815.8
Shell Scripts (8 concurrent)              6493.1          1100.4       6497.7
System Call Overhead                      2870.2          2866.0       2847.9
System Benchmarks Index Score             1564.2          1438.5       1611.2
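
For reference, the three configurations above can be set up with xl, something along these lines (the domain name "vm1" is just an example, and so are the pCPU numbers):

```shell
# "all on 1 pCPU": pin every vCPU of the guest to pCPU 0
xl vcpu-pin vm1 all 0

# "1-to-1 pin": pin vCPU n to pCPU n
for v in 0 1 2 3; do
    xl vcpu-pin vm1 $v $v
done

# "no pinning": allow all vCPUs to run on any pCPU again
xl vcpu-pin vm1 all all

# check the current affinities
xl vcpu-list vm1
```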

With a 1 vCPU VM, the _same_ benchmarks behave like this:

                                      no pinning   pinned on 1 pCPU
Dhrystone 2 using register variables      3403.6             3391.0
Double-Precision Whetstone                 785.5              786.4
Execl Throughput                          1853.5             1857.8
File Copy 1024 bufsize 2000 maxblocks     3909.4             3901.2
File Copy 256 bufsize 500 maxblocks       2468.3             2459.8
File Copy 4096 bufsize 8000 maxblocks     6212.0             6191.4
Pipe Throughput                           2079.8             2080.8
Pipe-based Context Switching              1101.3             1100.9
Process Creation                          1811.4             1877.4
Shell Scripts (1 concurrent)              3084.2             3054.2
Shell Scripts (8 concurrent)              2838.6             2816.9
System Call Overhead                      3511.4             3517.4
System Benchmarks Index Score             2407.4             2409.6

It looks to me like the numbers are pretty much the same, regardless of
pinning. Considering that this is a 'sequential workload' (only one
copy of any single benchmark runs at any given time), I think this
makes sense.

The most notable exception to the above is, in the 4 vCPUs case,
"Shell Scripts (8 concurrent)", which is noticeably slower if all the
vCPUs are pinned to 1 pCPU. That also makes sense, though, as it is
the only test that is _not_ really sequential.
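
Just to make it concrete, here is a minimal sketch of what "8 concurrent" means at the shell level (the echo stands in for a whole benchmark script copy):

```shell
#!/bin/sh
# Run 8 "shell script" copies concurrently, as the benchmark's
# "8 concurrent" mode does, then wait for all of them.
tmp=$(mktemp)
for n in 1 2 3 4 5 6 7 8; do
    ( echo "copy $n done" >> "$tmp" ) &   # each copy runs in the background
done
wait                                      # continue only once all 8 finished
done_count=$(wc -l < "$tmp")
echo "$done_count copies completed"
rm -f "$tmp"
```

With 4 vCPUs spread over 4 pCPUs, those background copies really run in parallel; with all vCPUs crammed onto 1 pCPU, they can only time-share it.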

There are other differences, still in the 4 vCPUs case:
 - "Execl Throughput" slows down a bit in the "all vCPUs pinned to 1 pCPU" case;
 - "Process Creation", quite weirdly, is boosted in the "all vCPUs
   pinned to 1 pCPU" case, and behaves worst in the "no pinning" case.

So, it looks to me that the Xen scheduler (Credit1, I assume, is that
correct Marko?) is, *per* *se*, doing ok. Still, things do slow down in
a case where the VM should basically have 3 idle vCPUs. I'd be tempted
to say that it could be one of those Xen-vs-Linux scheduler
(mis)interactions, and it must be a Xen specific one, as Marko reported
that KVM --despite being slightly worse, in general-- is not affected
by this particular glitch.

I'm a bit clueless for now, so I'll keep trying to reproduce this
and, as soon as I manage to, collect some traces.
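
By "traces" I mean something like the following, run on the Xen host (the exact flags are from memory, and the xenalyze --summary mode is an assumption worth double-checking):

```shell
# Capture trace records for all event classes while the benchmark runs;
# -D discards data buffered before the capture starts.
# Stop with Ctrl-C once the benchmark has finished.
xentrace -D -e all /tmp/trace.bin

# Post-process the binary trace, e.g. into a per-vCPU runtime summary.
xenalyze --summary /tmp/trace.bin
```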

I'll let you know...

<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


Xen-users mailing list


