Re: [Xen-devel] Xen optimization
- To: Andrii Anisov <andrii.anisov@xxxxxxxxx>
- From: Julien Grall <julien.grall@xxxxxxxxx>
- Date: Mon, 10 Dec 2018 12:54:49 +0100
- Cc: nd@xxxxxxx, Stefano Stabellini <sstabellini@xxxxxxxxxx>, andrii_anisov@xxxxxxxx, Milan Boberic <milanboberic94@xxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, Meng Xu <xumengpanda@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>
- Delivery-date: Mon, 10 Dec 2018 11:55:09 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
(sorry for the formatting)
Hello All,
On 27.11.18 23:27, Stefano Stabellini wrote:
> See the following:
>
> https://marc.info/?l=xen-devel&m=148668817704668
So I did port that stuff to the current staging [1].
Also, the corresponding TBM itself is here [2].
Having 4 big cores on my SoC I run XEN with the following command line:
dom0_mem=3G console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 loglvl=all cpufreq=none tbuf_size=8192 loglvl=all/none guest_loglvl=all/none
The TBM's domain configuration file is as follows:
seclabel='system_u:system_r:domU_t'
name = "DomP"
kernel = "/home/root/ctest-bare.bin"
extra = "console=hvc0 rw"
memory = 128
vcpus = 1
cpus = "3"
This gives me a setup where Domain-0 runs solely on cores 0 and 1 and TBM runs exclusively on core 3, so we can rely on it to show the pure IRQ latency of the hypervisor.
My board is a Renesas Salvator-X with an H3 ES3.0 SoC and 8GB RAM. The generic timer runs at 8.333 MHz, which gives me a 120 ns resolution for measurements.
The Xen hypervisor is built without debug, and TBM does wfi in the idle loop for all experiments.
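As a quick check of the resolution figure above, the tick period follows directly from the counter frequency. This is a minimal sketch: the function names are illustrative, and the frequency is the rounded 8.333 MHz value quoted above, not a value read from the hardware.

```python
# Sketch: convert generic-timer ticks to nanoseconds.
# TIMER_FREQ_HZ is the rounded 8.333 MHz value quoted above (assumption:
# the real counter frequency may differ slightly from this figure).
TIMER_FREQ_HZ = 8_333_000


def tick_period_ns(freq_hz: int = TIMER_FREQ_HZ) -> float:
    """Duration of one counter tick in nanoseconds."""
    return 1e9 / freq_hz


def ticks_to_ns(ticks: int, freq_hz: int = TIMER_FREQ_HZ) -> float:
    """Convert a tick delta between two counter reads into nanoseconds."""
    return ticks * tick_period_ns(freq_hz)


if __name__ == "__main__":
    print(f"tick period: {tick_period_ns():.1f} ns")
    print(f"24 ticks:    {ticks_to_ns(24):.0f} ns")
```

Every MIN and MAX value in the tables is a whole multiple of 120 ns, which is consistent with this tick period.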
With that setup IRQ latency numbers are (in ns):
What are the numbers without Xen? Which version of Xen are you using?
Old vgic:
                        AVG   MIN    MAX  WARM MAX
credit, vwfi=trap      7706  7560   9480      8400
credit, vwfi=native    2908  2880   3120      4800
credit2, vwfi=trap     7221  7200   9240      7440
credit2, vwfi=native   2906  2880   3120      5040
New vgic:
                        AVG   MIN    MAX  WARM MAX
credit, vwfi=trap      8481  8040  10200      8880
credit, vwfi=native    4115  3960   4800      4200
credit2, vwfi=trap     8425  8400   9600      9000
credit2, vwfi=native   4227  3960   5040      4680
Here we can see that the new vgic underperforms the old one in a trivial use-case modeled with TBM.
The vwfi=trap case does not look so bad (10%), but the vwfi=native case indeed adds a bigger overhead. This also tells you that in the trap case the vGIC is not the biggest source of overhead.
I am pretty sure that this can be optimized because we mostly focused on reliability and specification compliance for the first draft.
So yes the old vGIC performs better but at the price of unreliability and non-compliance.
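To put numbers on the comparison above, the overhead of the new vgic relative to the old one can be computed from the AVG columns of the two tables. This is a small sketch: the dictionary layout and the function name are illustrative, and only the latency figures come from the tables.

```python
# AVG IRQ latency in ns, copied from the "Old vgic" and "New vgic" tables.
# Keys are (scheduler, vwfi mode); the layout itself is just an illustration.
OLD_VGIC_AVG = {
    ("credit", "trap"): 7706,
    ("credit", "native"): 2908,
    ("credit2", "trap"): 7221,
    ("credit2", "native"): 2906,
}
NEW_VGIC_AVG = {
    ("credit", "trap"): 8481,
    ("credit", "native"): 4115,
    ("credit2", "trap"): 8425,
    ("credit2", "native"): 4227,
}


def overhead(old_ns: int, new_ns: int) -> tuple[int, float]:
    """Return (absolute delta in ns, relative delta in percent)."""
    delta = new_ns - old_ns
    return delta, 100.0 * delta / old_ns


if __name__ == "__main__":
    for (sched, vwfi), old_ns in OLD_VGIC_AVG.items():
        delta, pct = overhead(old_ns, NEW_VGIC_AVG[(sched, vwfi)])
        print(f"{sched:8s} vwfi={vwfi:7s} +{delta:5d} ns  (+{pct:.1f}%)")
```

For credit with vwfi=trap this gives roughly 10%, matching the figure above. The absolute difference stays between about 0.8 and 1.3 us in all four configurations, so it weighs more heavily on the shorter vwfi=native path.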
Old vgic with optimizations [3] (without [4], because it breaks the setup):
                        AVG   MIN    MAX  WARM MAX
credit, vwfi=trap      7309  7080   8760      7680
credit, vwfi=native    3007  3000   4320      3120
credit2, vwfi=trap     6877  6720   8880      7200
credit2, vwfi=native   2680  2640   4440      2880
This is with all your series applied except [4], correct? Did you try to see the performance improvement patch by patch?
Cheers
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel