
Re: [Xen-devel] Notes on stubdoms and latency on ARM



Hello Dario,

On 20 June 2017 at 13:11, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
> On Mon, 2017-06-19 at 11:36 -0700, Volodymyr Babchuk wrote:
>> On 19 June 2017 at 10:54, Stefano Stabellini <sstabellini@xxxxxxxxxx>
>> wrote:
>> > True. However, Volodymyr took the time to demonstrate the performance
>> > of EL0 apps vs. stubdoms with a PoC, which is much more than most Xen
>> > contributors do. Nobody provided numbers for a faster ARM context
>> > switch yet. I don't know on whom should fall the burden of proving
>> > that a lighter context switch can match the EL0 app numbers. I am not
>> > sure it would be fair to ask Volodymyr to do it.
>>
>> Thanks. Actually, we discussed this topic internally today. The main
>> concern today is not SMCs and OP-TEE (I will be happy to do this
>> right in XEN), but vcoprocs and GPU virtualization. Because of legal
>> issues, we can't put this in XEN. And because of the nature of the
>> vcpu framework, we will need multiple calls to the vGPU driver per
>> vcpu context switch.
>> I'm going to create a worst-case scenario, where multiple vcpus are
>> active and there are no free pcpus, to see how the credit or credit2
>> scheduler will call my stubdom.
>>
> Well, that would be interesting and useful, thanks for offering to do
> that.

Yeah, so I did that. And I got some puzzling results. I don't know why,
but when I have 4 (or fewer) active vcpus on 4 pcpus, my test takes
about 1 second to execute.
But if there are 5 (or more) active vcpus on 4 pcpus, it takes from
80 to 110 seconds.

Details follow, but first let me remind you of my setup.
I'm testing on an ARM64 machine with 4 Cortex-A57 cores. I wrote a
special test driver for Linux that calls the SMC instruction 100 000 times.
I also hacked MiniOS to act as a monitor for DomU. This means that
XEN traps each SMC invocation and asks MiniOS to handle it.
So, every SMC is handled in this way:

DomU->XEN->MiniOS->XEN->DomU.

Now, let's get back to results.

** Case 1:
- Dom0 has 4 vcpus and is idle
- DomU has 4 vcpus and is idle
- MiniOS has 1 vcpu and is not idle, because its scheduler does
not call WFI.
I ran the test in DomU:

root@salvator-x-h3-xt:~# time -p cat /proc/smc_bench
Will call SMC 100000 time(s)
Done!
real 1.10
user 0.00
sys 1.10


** Case 2:
- Dom0 has 4 vcpus. All of them are executing an endless loop with an sh one-liner:
# while : ; do : ; done &
- DomU has 4 vcpus and is idle
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- In total there are 6 vcpus active (including the DomU vcpu that runs the test)

I ran the test in DomU:
real 113.08
user 0.00
sys 113.04

** Case 3:
- Dom0 has 4 vcpus. Three of them are executing an endless loop with an sh one-liner:
# while : ; do : ; done &
- DomU has 4 vcpus and is idle
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- In total there are 5 vcpus active

I ran the test in DomU:
real 88.55
user 0.00
sys 88.54

** Case 4:
- Dom0 has 4 vcpus. Two of them are executing an endless loop with an sh one-liner:
# while : ; do : ; done &
- DomU has 4 vcpus and is idle
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- In total there are 4 vcpus active

I ran the test in DomU:
real 1.11
user 0.00
sys 1.11

** Case 5:
- Dom0 has 4 vcpus and is idle.
- DomU has 4 vcpus. Three of them are executing an endless loop with an sh one-liner:
# while : ; do : ; done &
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- In total there are 5 vcpus active

I ran the test in DomU:
real 100.96
user 0.00
sys 100.94

** Case 6:
- Dom0 has 4 vcpus and is idle.
- DomU has 4 vcpus. Two of them are executing an endless loop with an sh one-liner:
# while : ; do : ; done &
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- In total there are 4 vcpus active

I ran the test in DomU:
real 1.11
user 0.00
sys 1.10

** Case 7:
- Dom0 has 4 vcpus and is idle.
- DomU has 4 vcpus. Two of them are executing an endless loop with an sh one-liner:
# while : ; do : ; done &
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- *MiniOS is running in a separate cpupool with 1 pcpu*:
Name               CPUs   Sched     Active   Domain count
Pool-0               3    credit       y          2
minios               1    credit       y          1

I ran the test in DomU:
real 1.11
user 0.00
sys 1.10

** Case 8:
- Dom0 has 4 vcpus and is idle.
- DomU has 4 vcpus. Three of them are executing an endless loop with an sh one-liner:
# while : ; do : ; done &
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- MiniOS is running in a separate cpupool with 1 pcpu, as in Case 7.

I ran the test in DomU:
real 100.12
user 0.00
sys 100.11


As you can see, I tried to move MiniOS to a separate cpupool, but it
didn't help much.

Name                                        ID   Mem VCPUs     State    Time(s)   Cpupool
Domain-0                                     0   752     4     r-----    1566.1   Pool-0
DomU                                         1   255     4     -b----    4535.1   Pool-0
mini-os                                      2   128     1     r-----    2395.7   minios


I expected it to be 20% to 50% slower when there are more vCPUs than
pCPUs. But it is about 100 times slower, and I can't explain this.
Perhaps something is badly broken in my XEN, but I used 4.9 with some
hacks to make MiniOS work, and I didn't touch the scheduler at all.

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@xxxxxxxxx

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

