Xen project Mailing List

Re: IRQ latency measurements in hypervisor

From: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>

Date: Fri, 15 Jan 2021 15:45:43 +0000

Cc: Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Julien Grall <jgrall@xxxxxxxxxx>, Dario Faggioli <dario.faggioli@xxxxxxxx>, "Bertrand.Marquis@xxxxxxx" <Bertrand.Marquis@xxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>

Delivery-date: Fri, 15 Jan 2021 15:45:56 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Julien,

Julien Grall writes:

> Hi Volodymyr, Stefano,
>
> On 14/01/2021 23:33, Stefano Stabellini wrote:
>> + Bertrand, Andrew (see comment on alloc_heap_pages())
>
> Long running hypercalls are usually considered security issues.
>
> In this case, only the control domain can issue large memory
> allocation (2GB at a time). Guest, would only be able to allocate 2MB
> at the time, so from the numbers below, it would only take 1ms max.
>
> So I think we are fine here. Next time, you find a large loop, please
> provide an explanation why they are not security issues (e.g. cannot
> be used by guests) or send an email to the Security Team in doubt.

Sure. In this case I took into account that only the control domain can
issue this call, I just didn't state this explicitly. Next time will
do.

>>> ARMv8 platform. Namely Renesas Rcar H3 SoC on Salvator board.
>
> Which core is it?

Cortex A57

[...]

>>> 2. RTDS scheduler. With console disabled, things like "hexdump -v
>>> /dev/zero" didn't affect the latency so badly, but anyways,
>>> sometimes I got ~600us spikes. This is not a surprise, because of
>>> default RTDS configuration. I changed period for DomU from default
>>> 10ms to 100us and things got better: with Dom0 burning CPU I am
>>> rarely getting max latency of about ~30us with mean latency of ~9us
>>> and deviation of ~0.5us. On other hand, when I tried to set period
>>> to 30us, max latency rose up to ~60us.
>
> In a related topic, I am not entirely sure that all the hypercalls
> would be able to fit in the 100us slice. In particular, the one which
> are touching the P2M and do memory allocation.

I agree with you. In my experiments I didn't find a case with a long
running hypercall (apart from the mentioned populate_physmap), but of
course there should be cases with such calls.

>
>> This is very interesting too. Did you get any spikes with the period
>> set to 100us? It would be fantastic if there were none.
>>
>>> 3. Huge latency spike during domain creation. I conducted some
>>> additional tests, including use of PV drivers, but this didn't
>>> affect the latency in my "real time" domain. But attempt to
>>> create another domain with relatively large memory size of 2GB led
>>> to huge spike in latency. Debugging led to this call path:
>>>
>>> XENMEM_populate_physmap -> populate_physmap() ->
>>> alloc_domheap_pages() -> alloc_heap_pages()-> huge
>>> "for ( i = 0; i < (1 << order); i++ )" loop.
>
> There are two for loops in alloc_heap_pages() using this syntax. Which
> one are you referring to?

I did some tracing with Lauterbach. It pointed to the first loop and
especially to the flush_page_to_ram() call, if I remember correctly.

>>>
>>> This loop handles struct page* for every one of 262144 pages that
>>> was allocated by calling populate_physmap().
>
> Looking at the domain creation code, 2GB will be split in two extents
> of 1GB. This means, there will be at least a preemption point between
> the allocation of the two extents.

Yes. 1GB is exactly 262144 4KB pages.

[...]

>>> I managed to overcome the issue #3 by commenting out all calls to
>>> populate_one_size() except the populate_one_size(PFN_4K_SHIFT) in
>>> xg_dom_arm.c. This lengthened domain construction, but my "RT" domain
>>> didn't experience so big latency issues. Apparently all other
>>> hypercalls which are used during domain creation are either fast or
>>> preemptible. No doubts that my hack led to page tables inflation and
>>> overall performance drop.
>> I think we need to follow this up and fix this. Maybe just by adding
>> a hypercall continuation to the loop.
>
> When I read "hypercall continuation", I read we will return to the
> guest context so it can process interrupts and potentially switch to
> another task.
>
> This means that the guest could issue a second populate_physmap() from
> the vCPU. Therefore any restart information should be part of the
> hypercall parameters. So far, I don't see how this would be possible.
>
> Even if we overcome that part, this can be easily abused by a guest as
> the memory is not yet accounted to the domain. Imagine a guest that
> never requests the continuation of the populate_physmap(). So we would
> need to block the vCPU until the allocation is finished.

Moreover, most of alloc_heap_pages() sits under a spinlock, so the
first step would be to split this function into smaller atomic parts.

> I think the first step is we need to figure out which part of the
> allocation is slow (see my question above). From there, we can figure
> out if there is a way to reduce the impact.

I'll do more tracing and will return with more accurate numbers. But as
far as I can see, any loop on 262144 pages will take some time...

-- 
Volodymyr Babchuk at EPAM
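
For reference, the page counts quoted in the thread can be checked with a few lines of C. This is only an illustrative sketch, not code from the Xen tree: the helper names pages_in_extent() and extent_order() and the PAGE_SHIFT_4K macro are made up for this example, and a 4KB base page is assumed, as in the discussion above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4KB base pages, as in the setup discussed in this thread. */
#define PAGE_SHIFT_4K 12u

/*
 * Hypothetical helper: allocation order of an extent of 2^extent_shift
 * bytes. The loop quoted above runs (1 << order) iterations.
 */
static inline unsigned int extent_order(unsigned int extent_shift)
{
    assert(extent_shift >= PAGE_SHIFT_4K && extent_shift < 64u);
    return extent_shift - PAGE_SHIFT_4K;
}

/* Hypothetical helper: number of 4KB pages in such an extent. */
static inline uint64_t pages_in_extent(unsigned int extent_shift)
{
    return UINT64_C(1) << extent_order(extent_shift);
}
```

With these helpers, pages_in_extent(30) gives 262144 for a 1GB extent (order 18), and pages_in_extent(21) gives 512 for a 2MB extent (order 9), so a per-page loop over a 1GB extent runs 512 times as many iterations as one over a 2MB extent.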
