Xen project Mailing List

Re: [PATCH v3 18/23] xen/riscv: implement IRQ routing for device passthrough

To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>

Date: Fri, 26 Jun 2026 08:10:02 +0200

Cc: Romain Caritey <Romain.Caritey@xxxxxxxxxxxxx>, Alistair Francis <alistair.francis@xxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Fri, 26 Jun 2026 06:10:41 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 25.06.2026 17:54, Oleksii Kurochko wrote:
>
>
> On 6/25/26 1:14 PM, Jan Beulich wrote:
>> On 25.06.2026 11:48, Oleksii Kurochko wrote:
>>> On 6/25/26 8:08 AM, Jan Beulich wrote:
>>>> On 24.06.2026 17:21, Oleksii Kurochko wrote:
>>>>> On 6/22/26 5:57 PM, Jan Beulich wrote:
>>>>>> On 17.06.2026 13:17, Oleksii Kurochko wrote:
>>>>>>> --- a/xen/arch/riscv/include/asm/intc.h
>>>>>>> +++ b/xen/arch/riscv/include/asm/intc.h
>>>>>>> @@ -13,6 +13,7 @@ enum intc_version {
>>>>>>>  };
>>>>>>>
>>>>>>>  struct cpu_user_regs;
>>>>>>> +struct domain;
>>>>>>>  struct irq_desc;
>>>>>>>  struct kernel_info;
>>>>>>>  struct vcpu;
>>>>>>> @@ -32,6 +33,9 @@ struct intc_hw_operations {
>>>>>>>      /* hw_irq_controller to enable/disable/eoi host irq */
>>>>>>>      const struct hw_interrupt_type *host_irq_type;
>>>>>>>
>>>>>>> +    /* hw_irq_controller to enable/disable/eoi guest irq */
>>>>>>> +    const struct hw_interrupt_type *guest_irq_type;
>>>>>>
>>>>>> It's likely my limited RISC-V knowledge that I find this extremely odd:
>>>>>> Separate struct hw_interrupt_type-s for host and guest?
>>>>>
>>>>> The guest and host interrupt controllers may handle some
>>>>> hw_irq_controller operations differently, even though the operations
>>>>> themselves are conceptually the same. The hw_irq_controller interface
>>>>> provides fairly abstract interrupt controller operations, but the
>>>>> underlying implementation may differ depending on whether the controller
>>>>> is used by the host or a guest.
>>>>>
>>>>> As an example, the Arm code already follows this approach:
>>>>>
>>>>> /* XXX different for level vs edge */
>>>>> static hw_irq_controller gicv2_host_irq_type = {
>>>>>     .typename = "gic-v2",
>>>>>     .startup = gicv2_irq_startup,
>>>>>     .shutdown = gicv2_irq_shutdown,
>>>>>     .enable = gicv2_irq_enable,
>>>>>     .disable = gicv2_irq_disable,
>>>>>     .ack = gicv2_irq_ack,
>>>>>     .end = gicv2_host_irq_end,
>>>>>     .set_affinity = gicv2_irq_set_affinity,
>>>>> };
>>>>>
>>>>> static hw_irq_controller gicv2_guest_irq_type = {
>>>>>     .typename = "gic-v2",
>>>>>     .startup = gicv2_irq_startup,
>>>>>     .shutdown = gicv2_irq_shutdown,
>>>>>     .enable = gicv2_irq_enable,
>>>>>     .disable = gicv2_irq_disable,
>>>>>     .ack = gicv2_irq_ack,
>>>>>     .end = gicv2_guest_irq_end,
>>>>>     .set_affinity = gicv2_irq_set_affinity,
>>>>> };
>>>>>
>>>>> These implementations reuse almost all interrupt controller operations,
>>>>> differing only in the .end callback.
>>>>
>>>> Which I'm having trouble with as well. Interrupts are handled by Xen. What
>>>> guests get to see are virtualized interrupts (no matter how much HW
>>>> acceleration may be in use). Hence I'm having difficulty to see such a
>>>> split justified.
>>>
>>> I think that I don't fully understand what is wrong with splitting. If
>>> there are cases exist when I need such separation for virtual interrupt
>>> controller operations then it looks fine to introduce such separation,
>>> right?
>>>
>>> Lets take an example of PLIC.
>>>
>>> For each source the PLIC has a "gateway":
>>> 1. Claim (read CONTEXT_CLAIM): returns the pending IRQ id and closes the
>>> gateway for that source, it will not forward that source to any context
>>> again until completed.
>>> 2. Complete (write the id back to CONTEXT_CLAIM): reopens the gateway.
>>> If the device line is still asserted (level high), the PLIC immediately
>>> re-marks it pending and delivers it again.
>>>
>>> The "closed gateway" between claim and complete is effectively the
>>> hardware masking the source while it's being serviced.
>>>
>>> Then if we will handle guest interrupt in the following way:
>>> 1. Passthrough device asserts its line (level stays high).
>>> 2. Xen takes the physical IRQ, claims (gateway closes), completes
>>> (gateway reopens), injects a virtual IRQ into the guest's vPLIC.
>>> 3. The guest hasn't run yet, it hasn't touched the device's registers,
>>> so the device line is still high.
>>> 4. The PLIC sees the source still asserted with an open gateway -> marks
>>> pending -> fires another physical interrupt into Xen -> ... -> repeat.
>>>
>>> So we get a storm of physical interrupts for a device the guest hasn't
>>> even begun servicing. The device line only drops when the guest driver
>>> writes the device's own registers, which happens long after, and on the
>>> guest's schedule.
>>>
>>> So the solution is that the physical complete must wait until the guest
>>> has actually quiesced the device. The only signal Xen gets for "guest is
>>> done" is the guest writing its virtual complete to the emulated vPLIC. So:
>>> 1. guest_irq->ack: the claim already happened (the readl(CONTEXT_CLAIM)
>>> in plic_handle_interrupt); ack just records which context claimed it.
>>> The gateway stays closed - good, the source is masked while the guest works.
>>> 2. inject vIRQ → guest services the device (line drops) -> guest writes
>>> vPLIC complete.
>>> 3. guest_irq->end: now do the physical complete, reopening the gateway.
>>> Device is quiet -> no spurious re-trigger; if it's a new legitimate
>>> assertion, it fires once, correctly.
>>>
>>> Is it clear enough now?
>>
>> Well, yes and no. On x86 we have to deal with the situation you describe as
>> problematic anyway, as IRQs have priorities associated with them, and higher
>> prio ones block equal/lower prio ones until they are "completed" (in the
>> terminology you use).
>
> Just for my understand what is the problem here that until "completed"
> isn't done for this high priority interrupt all other will just wait so
> basically responsiveness of the system in general will be bad?

Yes, one guest can affect other guests or the host.

>> If you don't have anything similar in RISC-V, then
>> you may indeed get somewhat simpler code overall with such a split.
>
> IIUC, if the word "block" above is used correctly I would say that
> behavior on RISC-V is different, at least, for PLIC as basically, if we
> have three IRQs and let's say `irq1` has the highest priority.
>
> `irq2` and `irq3` may become pending in the PLIC core, but they will not
> be visible to the CPU until `irq1` is CLAIMed, even if `irq1` is never
> completed (i.e., if you fail to write back to the CLAIM/COMPLETE register).
>
> When the hart reads the CLAIM/COMPLETE register, the PLIC core
> atomically retrieves the ID of the highest-priority pending interrupt
> (`irq1`) and clears its Interrupt Pending (IP) bit in the PLIC core.
>
> Once the IP bit for `irq1` is cleared, the PLIC core immediately
> re-evaluates all remaining pending interrupts. If `irq2` and `irq3` are
> pending, `irq2` (the next-highest-priority interrupt) becomes the
> highest-priority pending interrupt.
>
> The PLIC core will continue to signal the hart (by asserting the `MEIP`
> or `SEIP` bits) as long as there is any pending and enabled interrupt
> whose priority exceeds the hart's threshold.
>
> So the IRQ handler can run for irq2 and irq3 before irq1 is COMPLETED.
>
> So irqs are blocked only until they are claimed.
>
>> Yet if
>> there's nothing like that in RISC-V, you can get (almost) arbitrarily deeply
>> nested interrupts, which in turn would be a problem you need to deal with.
>> IOW I suspect the architecture has something to limit nesting depth.
>
> The trap handler, where the IRQ handler is called, starts with
> interrupts disabled, so nested interrupts cannot really occur at that point.

Same on x86. Yet then in do_IRQ(), around invoking the handler, we re-enable
interrupts. There have been discussions about whether this is a good idea, but
fundamentally the thought behind this is to prevent higher priority IRQs from
remaining blocked for overly long periods of time. I.e. again a responsiveness
concern, all the more so since some of the IPIs are hi-prio ones in order for
them to be serviced quickly, to prevent blocking the CPU issuing the IPI (plus
perhaps further CPUs).

Jan
