Re: [Xen-devel] [PATCH v2 0/2] Introduce runstate area registration with phys address
Hi,

On 23/04/2019 09:10, Andrii Anisov wrote:
> From: Andrii Anisov <andrii_anisov@xxxxxxxx>
>
> Following discussion [1] it is introduced and implemented a runstate
> registration interface which uses guest's phys address instead of a virtual
> one. The new hypercall employs the same data structures as a predecessor,
> but expects the vcpu_runstate_info structure to not cross a page boundary.
> The interface is implemented in a way vcpu_runstate_info structure is mapped
> to the hypervisor on the hypercall processing and is directly accessed
> during its updates. This runstate area mapping follows vcpu_info structure
> registration.
>
> Permanent mapping of runstate area would consume vmap area on arm32 what is
> limited to 1G. Though it is assumed that ARM32 does not target the server
> market and the rest of possible applications will not host a huge number of
> VCPUs to render the limitation into the issue.

I am afraid I can't possibly back this assumption. As I pointed out in the
previous version, I would be OK with the always-map solution on Arm32
(pending performance) because it would be possible to increase the virtual
address area by reworking the address space.

> The series is tested for ARM64. Build tested for x86. I'd appreciate if
> someone could check it with x86.
> The Linux kernel patch is here [2]. Though it is for 4.14.

The patch looks wrong to me. You are using virt_to_phys() on a percpu area.
What actually promises you that the physical address will always be the same?

> Changes in:
>
> v2: It was reconsidered the new runstate interface implementation. The new
> interface is made independent of the old one. Do not share runstate_area
> field, and consequently avoid excessive concurrency with the old runstate
> interface usage.
> Introduced locks in order to resolve possible concurrency between runstate
> area registration and usage.
> Addressed comments from Jan Beulich [3][4] about coding style nits.
> Though some of them become obsolete with refactoring and few are picked into
> this thread for further discussion.
>
> There were made performance measurements of approaches (runstate mapped on
> access vs mapped on registration). The test setups are as following:
>
> Thin Dom0 (Linux with initramfs) with DomD running rich Yocto Linux. In DomD
> 3d benchmark numbers are compared. The benchmark is GLMark2. GLMark2 is run
> with different resolutions in order to emit different irq load, where
> 320x240 emits high IRQ load, but 1920x1080 emits low irq load.
> Separately tested baking DomD benchmark run with primitive Dom0 CPU burn
> (dd), in order to stimulate VCPU(dX)->VCPU(dY) switches rather than

Are you saying that the command dd is the CPUBurn? I am not sure how this
could be considered as a CPUBurn. IMHO, this is more IO related.

> VCPU(dX)->idle->VCPU(dX).
> with following results:
>
>                         mapped       mapped
>                         on access    on init
> GLMark2 320x240         2852         2877       +0.8%
>     +Dom0 CPUBurn       2088         2094       +0.2%
> GLMark2 800x600         2368         2375       +0.3%
>     +Dom0 CPUBurn       1868         1921       +2.8%
> GLMark2 1920x1080       931          931        0%
>     +Dom0 CPUBurn       892          894        +0.2%
>
> Please note that "mapped on access" means using the old runstate
> registering interface. And runstate update in this case still often fails
> to map runstate area like [5], despite the fact that our Linux kernel
> does not have KPTI enabled. So runstate area update, in this case, is
> really shortened.

We know that the old interface is broken, so telling us the new interface is
faster is not entirely useful. What I am more interested in is how it performs
if you use a guest physical address with the "mapped on access" version.

> Also it was checked IRQ latency difference using TBM in a setup similar to
> [5]. Please note that the IRQ rate is one in 30 seconds, and only
> VCPU->idle->VCPU use-case is considered. With following results (in ns,
> the timer granularity 120ns):

How long did you run the benchmark?
> mapped on access: max=9960 warm_max=8640 min=7200 avg=7626
> mapped on init:   max=9480 warm_max=8400 min=7080 avg=7341
>
> Unfortunately there are no consistent results yet from profiling using
> Lauterbach PowerTrace. Still in communication with the tracer vendor in
> order to setup the proper configuration.
>
> [1] https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00416.html
> [2] https://github.com/aanisov/linux/commit/ba34d2780f57ea43f81810cd695aace7b55c0f29
> [3] https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00936.html
> [4] https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00934.html
> [5] https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg02369.html
> [6] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg02297.html
>
> Andrii Anisov (2):
>   xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
>   xen: implement VCPUOP_register_runstate_phys_memory_area
>
>  xen/arch/arm/domain.c        |  62 +++++++++++++++++--------
>  xen/arch/x86/domain.c        | 105 +++++++++++++++++++++++++++++++------------
>  xen/common/domain.c          |  81 +++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/domain.h |   2 +
>  xen/include/public/vcpu.h    |  15 +++++++
>  xen/include/xen/domain.h     |   2 +
>  xen/include/xen/sched.h      |   8 ++++
>  7 files changed, 227 insertions(+), 48 deletions(-)

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
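
The cover letter's requirement that vcpu_runstate_info must not cross a page boundary can be illustrated with a small C sketch. It is not taken from the series: the 4 KiB PAGE_SIZE, the helper name crosses_page_boundary() and the struct runstate_sketch stand-in (modelled on vcpu_runstate_info) are assumptions made here for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE 4096u

/* Hypothetical stand-in for vcpu_runstate_info, for illustration only. */
struct runstate_sketch {
    int      state;            /* current run state */
    uint64_t state_entry_time; /* when the current state was entered */
    uint64_t time[4];          /* time accumulated in each state */
};

/* The check below only makes sense for objects that fit in one page. */
_Static_assert(sizeof(struct runstate_sketch) <= PAGE_SIZE,
               "runstate sketch must fit in a single page");

/*
 * Hypothetical helper: returns true when an object of `size` bytes placed
 * at guest physical address `gpa` would extend past the end of the page
 * that contains its first byte.
 */
static bool crosses_page_boundary(uint64_t gpa, size_t size)
{
    uint64_t offset = gpa & (PAGE_SIZE - 1); /* offset within the page */

    return offset + size > PAGE_SIZE;
}
```

With a check of this shape, a registration request whose area starts too close to the end of a page could be rejected up front, so that a single page mapping is enough to cover the whole structure.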