Xen project Mailing List

Re: [PATCH v2] xen/arm: implement GICD_I[S/C]ACTIVER reads

From: George Dunlap <George.Dunlap@xxxxxxxxxx>

Date: Tue, 7 Apr 2020 16:16:24 +0000

Cc: Peng Fan <peng.fan@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "maz@xxxxxxxxxx" <maz@xxxxxxxxxx>, Wei Xu <xuwei5@xxxxxxxxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>, Julien Grall <julien.grall.oss@xxxxxxxxx>

Delivery-date: Tue, 07 Apr 2020 16:16:34 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

> On Apr 6, 2020, at 7:47 PM, Julien Grall <julien@xxxxxxx> wrote:
>
> On 06/04/2020 18:58, George Dunlap wrote:
>>> On Apr 3, 2020, at 9:27 PM, Julien Grall <julien.grall.oss@xxxxxxxxx> wrote:
>>>
>>> On Fri, 3 Apr 2020 at 20:41, Stefano Stabellini <sstabellini@xxxxxxxxxx>
>>> wrote:
>>>>
>>>> On Thu, 2 Apr 2020, Julien Grall wrote:
>>>>> As we discussed on Tuesday, the cost for other vCPUs is only going to be a
>>>>> trap to the hypervisor and then back again. The cost is likely smaller than
>>>>> receiving and forwarding an interrupt.
>>>>>
>>>>> You actually agreed on this analysis. So can you enlighten me as to why
>>>>> receiving an interrupt is a not problem for latency but this is?
>>>>
>>>> My answer was that the difference is that an operating system can
>>>> disable interrupts, but it cannot disable receiving this special IPI.
>>>
>>> An OS can *only* disable its own interrupts, yet interrupts will still
>>> be received by Xen even if the interrupts are masked at the processor
>>> (e.g local_irq_disable()).
>>>
>>> You would need to disable interrupts one by one as the GIC level (use
>>> ICENABLER) in order to not receive any interrupts. Yet, Xen may still
>>> receive interrupts for operational purposes (e.g serial, console,
>>> maintainance IRQ...). So trap will happen.
>> I think Stefano’s assertion is that the users he has in mind will be
>> configuring the system such that RT workloads get a minimum number of
>> hypervisor-related interrupts possible. On a 4-core system, you could have
>> non-RT workloads running on cores 0-1, and RT workloads running with the
>> NULL scheduler on cores 2-3. In such a system, you’d obviously arrange that
>> serial and maintenance IRQs are delivered to cores 0-1.
> Well maintenance IRQs are per pCPU so you can't route to another one...
>
> But, I think you missed my point that local_irq_disable() from the guest will
> not prevent the hypervisor to receive interrupts *even* the one routed to the
> vCPU itself. They will just not be delivered to the guest context until
> local_irq_enable() is called.

My understanding from Stefano was that what his customers are concerned about is the interval between a physical IRQ being delivered to the guest and the guest OS being able to respond appropriately. The key thing here isn’t necessarily speed, but predictability: system designers need to know that, with a high probability, their interrupt routines will complete within X cycles.

Further interrupts delivered to a guest are not a problem in this scenario, if the guest can disable them until the critical IRQ has been handled.

Xen-related IPIs, however, could potentially cause a problem if not mitigated. Consider a guest where vcpu 1 loops over the register, while vcpu 2 is handling a latency-critical IRQ. A naive implementation might send an IPI every time vcpu 1 does a read, spamming vcpu 2 with dozens of IPIs. Then an IRQ routine which normally finishes reliably well within the required time suddenly overruns and causes an issue.

I don’t know what maintenance IRQs are, but if they only happen intermittently, it’s possible that you’d never get more than a single one in a latency-critical IRQ routine; and as such, the variability in execution time (jitter) wouldn’t be an issue in practice. But every time you add a new unblockable IPI, you increase this jitter; particularly if this unblockable IPI might be repeated an arbitrary number of times.

(Stefano, let me know if I’ve misunderstood something.)

So stepping back a moment, here are all the possible ideas that I think have been discussed (or are there implicitly) so far.

1. [Default] Do nothing; guests using this register continue crashing

2. Make the I?ACTIVER registers RZWI.

3. Make I?ACTIVER return the most recent known value; i.e. KVM’s current behavior (as far as we understand it)

4. Use a simple IPI with do_noop to update I?ACTIVER

4a. Use an IPI, but come up with clever tricks to avoid interrupting guests handling IRQs.

5. Trap to Xen on guest EOI, so that we know when the

6. Some clever paravirtualized option

Obviously nobody wants #1, and #3 is clearly not really an option now either.

#2 is not great, but it’s simple and quick to implement for now. Julien, I’m not sure of your position on this one: you rejected the idea back in v1 of this patch series, but seemed to refer to it again earlier in this thread.

A “dumb” version of #4 is relatively quick to implement, but such a version has a high risk of causing unacceptable jitter (or at least, so Stefano believes).

#4a or #6 are further potential lines to explore, but would require a lot of additional design to get working right.

If I understand Stefano’s PoV, then #5 would actually be acceptable: the overall amount of time spent in the hypervisor would probably be greater, but it would be bounded and predictable. Once someone got their IRQ handler working reliably, it would likely continue to work.

It sounds like #5 might be pretty quick to implement; and then at some point in the future if someone wants to improve performance, they can work on 4a or 6.

Any thoughts? Anything I’m missing?

 -George
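
To make the jitter argument above concrete, here is a rough back-of-the-envelope model in C. It is only a sketch: the function names are hypothetical, none of this is Xen code, and any cycle counts plugged into it would be illustrative rather than measured. The assumption is that a "dumb" #4 costs the handler one IPI per I?ACTIVER read made by another vCPU, so the count is controlled by the guest and unbounded, while #5 costs one trap per EOI issued by the handler itself, so the count is fixed by the handler's own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cycles an IRQ handler takes when it is interrupted `events` times,
 * each interruption costing `cost_per_event` cycles in the hypervisor.
 * All inputs are hypothetical illustration values. */
static uint64_t handler_cycles(uint64_t base, uint64_t events,
                               uint64_t cost_per_event)
{
    /* Guard the arithmetic below against overflow. */
    assert(cost_per_event == 0 ||
           events <= (UINT64_MAX - base) / cost_per_event);
    return base + events * cost_per_event;
}

/* "Dumb" #4: every I?ACTIVER read by another vCPU sends an IPI, so the
 * number of interruptions is whatever the other vCPU decides to do. */
uint64_t naive_ipi_cycles(uint64_t base, uint64_t reads_by_other_vcpu,
                          uint64_t ipi_cost)
{
    return handler_cycles(base, reads_by_other_vcpu, ipi_cost);
}

/* #5: trap on guest EOI. The number of traps is fixed by how many EOIs
 * the handler itself issues, so the overhead is bounded. */
uint64_t eoi_trap_cycles(uint64_t base, uint64_t eois_in_handler,
                         uint64_t trap_cost)
{
    return handler_cycles(base, eois_in_handler, trap_cost);
}

/* True if the handler overruns its cycle budget. */
bool misses_deadline(uint64_t cycles, uint64_t budget)
{
    return cycles > budget;
}
```

With made-up numbers (a 1000-cycle handler, a 2000-cycle budget, 200 cycles per IPI, 300 cycles per trap), one EOI trap gives 1300 cycles and stays within budget no matter what the other vCPU does, whereas 50 reads from the other vCPU give 11000 cycles and overrun it. The per-event cost of #5 can be higher and it still wins on predictability, because its event count does not depend on another vCPU's behaviour.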
