Re: [Xen-devel] Notes on stubdoms and latency on ARM
Hello Dario,

On 20 June 2017 at 13:11, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
> On Mon, 2017-06-19 at 11:36 -0700, Volodymyr Babchuk wrote:
>> On 19 June 2017 at 10:54, Stefano Stabellini <sstabellini@xxxxxxxxxx>
>> wrote:
>> > True. However, Volodymyr took the time to demonstrate the
>> > performance of EL0 apps vs. stubdoms with a PoC, which is much more
>> > than most Xen contributors do. Nobody provided numbers for a faster
>> > ARM context switch yet. I don't know on whom should fall the burden
>> > of proving that a lighter context switch can match the EL0 app
>> > numbers. I am not sure it would be fair to ask Volodymyr to do it.
>>
>> Thanks. Actually, we discussed this topic internally today. Our main
>> concern today is not SMCs and OP-TEE (I will be happy to handle those
>> directly in XEN), but vcoprocs and GPU virtualization. Because of
>> legal issues, we can't put this in XEN. And because of the nature of
>> the vcpu framework, we will need multiple calls to the vgpu driver per
>> vcpu context switch. I'm going to create a worst-case scenario, where
>> multiple vcpus are active and there are no free pcpus, to see how the
>> credit or credit2 scheduler will call my stubdom.
>>
> Well, that would be interesting and useful, thanks for offering to do
> that.

Yeah, so I did that, and I got some puzzling results. When I have 4 (or
fewer) active vcpus on 4 pcpus, my test takes about 1 second to execute.
But if there are 5 (or more) active vcpus on 4 pcpus, it takes from 80 to
110 seconds, and I don't know why.

The details follow, but first let me remind you of my setup. I'm testing
on an ARM64 machine with 4 Cortex-A57 cores. I wrote a special test driver
for Linux that calls the SMC instruction 100 000 times. I also hacked
MiniOS to act as a monitor for DomU. This means that XEN traps each SMC
invocation and asks MiniOS to handle it, so every SMC is handled like this:

DomU->XEN->MiniOS->XEN->DomU

Now, let's get back to the results.
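To compare the timings below as per-call costs, divide each `real` figure by the 100 000 SMCs the driver issues; the result is the average cost of one DomU->XEN->MiniOS->XEN->DomU round trip. The helper below is a hypothetical sketch for that arithmetic (the name `per_smc_us` is made up and is not part of the original test setup); it only assumes a POSIX shell and awk.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original benchmark: convert the
# "real" seconds printed by `time -p` into the average duration of one
# SMC round trip (DomU->XEN->MiniOS->XEN->DomU), in microseconds.
#   $1 = real time in seconds
#   $2 = number of SMC calls (defaults to the 100 000 issued by the driver)
per_smc_us() {
    awk -v t="$1" -v n="${2:-100000}" 'BEGIN { printf "%.1f\n", t * 1000000 / n }'
}

per_smc_us 1.10     # Case 1: every active vcpu has its own pcpu
per_smc_us 113.08   # Case 2: more active vcpus than pcpus
```

Applied to the Case 1 and Case 2 timings, this prints 11.0 and 1130.8: roughly 11 µs per round trip without overcommit versus about 1.1 ms with it, which is the ~100x slowdown in question.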
** Case 1:
- Dom0 has 4 vcpus and is idle.
- DomU has 4 vcpus and is idle.
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.

I ran the test in DomU:

root@salvator-x-h3-xt:~# time -p cat /proc/smc_bench
Will call SMC 100000 time(s)
Done!
real 1.10
user 0.00
sys 1.10

** Case 2:
- Dom0 has 4 vcpus. All of them are executing an endless loop with this sh one-liner:
  # while : ; do : ; done &
- DomU has 4 vcpus and is idle.
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- In total, 6 vcpus are active.

I ran the test in DomU:
real 113.08
user 0.00
sys 113.04

** Case 3:
- Dom0 has 4 vcpus. Three of them are executing an endless loop with this sh one-liner:
  # while : ; do : ; done &
- DomU has 4 vcpus and is idle.
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- In total, 5 vcpus are active.

I ran the test in DomU:
real 88.55
user 0.00
sys 88.54

** Case 4:
- Dom0 has 4 vcpus. Two of them are executing an endless loop with this sh one-liner:
  # while : ; do : ; done &
- DomU has 4 vcpus and is idle.
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- In total, 4 vcpus are active.

I ran the test in DomU:
real 1.11
user 0.00
sys 1.11

** Case 5:
- Dom0 has 4 vcpus and is idle.
- DomU has 4 vcpus. Three of them are executing an endless loop with this sh one-liner:
  # while : ; do : ; done &
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- In total, 5 vcpus are active.

I ran the test in DomU:
real 100.96
user 0.00
sys 100.94

** Case 6:
- Dom0 has 4 vcpus and is idle.
- DomU has 4 vcpus. Two of them are executing an endless loop with this sh one-liner:
  # while : ; do : ; done &
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- In total, 4 vcpus are active.

I ran the test in DomU:
real 1.11
user 0.00
sys 1.10

** Case 7:
- Dom0 has 4 vcpus and is idle.
- DomU has 4 vcpus.
  Two of them are executing an endless loop with this sh one-liner:
  # while : ; do : ; done &
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- *MiniOS is running in a separate cpupool with 1 pcpu*:

Name               CPUs   Sched     Active   Domain count
Pool-0               3    credit       y          2
minios               1    credit       y          1

I ran the test in DomU:
real 1.11
user 0.00
sys 1.10

** Case 8:
- Dom0 has 4 vcpus and is idle.
- DomU has 4 vcpus. Three of them are executing an endless loop with this sh one-liner:
  # while : ; do : ; done &
- MiniOS has 1 vcpu and is not idle, because its scheduler does not call WFI.
- MiniOS is running in a separate cpupool with 1 pcpu.

I ran the test in DomU:
real 100.12
user 0.00
sys 100.11

As you can see, I tried to move MiniOS to a separate cpupool, but it didn't help much:

Name          ID   Mem VCPUs      State   Time(s)   Cpupool
Domain-0       0   752     4     r-----    1566.1   Pool-0
DomU           1   255     4     -b----    4535.1   Pool-0
mini-os        2   128     1     r-----    2395.7   minios

I expected it to be 20% to 50% slower when there are more vCPUs than
pCPUs. But it is 100 times slower, and I can't explain this. Probably
something is very broken in my XEN. I used Xen 4.9 with some hacks to
make MiniOS work, but I didn't touch the scheduler at all.

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@xxxxxxxxx

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel