[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 06/39] ARM: evtchn: Handle level triggered IRQs correctly



Hi,

On 28/03/18 18:46, Stefano Stabellini wrote:
> On Wed, 28 Mar 2018, Andre Przywara wrote:
>> On 28/03/18 01:01, Stefano Stabellini wrote:
>>> On Wed, 21 Mar 2018, Andre Przywara wrote:
>>>> The event channel IRQ has level triggered semantics, however the current
>>>> VGIC treats everything as edge triggered.
>>>> To correctly process those IRQs, we have to lower the (virtual) IRQ line
>>>> at some point in time, depending on whether ther interrupt condition
>>>> still prevails.
>>>> Check the per-VCPU evtchn_upcall_pending variable to make the interrupt
>>>> line match its status, and call this function upon every hypervisor
>>>> entry.
>>>>
>>>> Signed-off-by: Andre Przywara <andre.przywara@xxxxxxxxxx>
>>>> Reviewed-by: Julien Grall <julien.grall@xxxxxxx>
>>>> ---
>>>>  xen/arch/arm/domain.c       | 7 +++++++
>>>>  xen/arch/arm/traps.c        | 1 +
>>>>  xen/include/asm-arm/event.h | 1 +
>>>>  3 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>>>> index ff97f2bc76..9688e62f78 100644
>>>> --- a/xen/arch/arm/domain.c
>>>> +++ b/xen/arch/arm/domain.c
>>>> @@ -953,6 +953,13 @@ void vcpu_mark_events_pending(struct vcpu *v)
>>>>      vgic_inject_irq(v->domain, v, v->domain->arch.evtchn_irq, true);
>>>>  }
>>>>  
>>>> +void vcpu_update_evtchn_irq(struct vcpu *v)
>>>> +{
>>>> +    bool pending = vcpu_info(v, evtchn_upcall_pending);
>>>> +
>>>> +    vgic_inject_irq(v->domain, v, v->domain->arch.evtchn_irq, pending);
>>>> +}
>>>> +
>>>>  /* The ARM spec declares that even if local irqs are masked in
>>>>   * the CPSR register, an irq should wake up a cpu from WFI anyway.
>>>>   * For this reason we need to check for irqs that need delivery,
>>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>>> index 2638446693..5c18e918b0 100644
>>>> --- a/xen/arch/arm/traps.c
>>>> +++ b/xen/arch/arm/traps.c
>>>> @@ -2033,6 +2033,7 @@ static void enter_hypervisor_head(struct 
>>>> cpu_user_regs *regs)
>>>>           * trap and how it can be optimised.
>>>>           */
>>>>          vtimer_update_irqs(current);
>>>> +        vcpu_update_evtchn_irq(current);
>>>>  #endif
>>>
>>> I am replying to this patch, even though I have already committed it, to
>>> point out a problem with the way we currently handle the evtchn_irq in
>>> this series.
>>>
>>> The short version is that I think we should configure the PPI
>>> corresponding to the evtchn_irq as EDGE instead of LEVEL.
>>
>> Well, that's really a separate problem, then. We can't just configure
>> the PPI at will, it has to match the device semantic.
>> When writing this patch, I checked how the the evtchn "device" is
>> implemented, and it screams "level IRQ" to me:
>> - We have a flag (evtchn_upcall_pending), which stores the current
>> interrupt state.
>> - This flag gets set by the producer when the interrupt condition is true.
>> - It gets cleared by the *consumer* once it has handled the request.
>>
>> So if the event channel mechanism should be edge (which would be fair
>> enough), we need to change the code to implement this: the interrupt
>> condition should be cleared once we *injected* the IRQ - and not only
>> when the consumer has signalled completion.
>>
>> Another thing to consider: by the spec the *configurability* of PPIs is
>> implementation defined. The KVM implementation chose to fix all of them
>> to "level", which we need for the arch timer. So setting the evtchn PPI
>> to edge would be ignored. We could deviate from that, but I need to
>> check what the side effects are.
>>
>>> The long explanation follows, please correct me if I am wrong.
>>>
>>> 1) vcpuA/cpuA is running, it has already handled the event, cleared
>>> evtchn_upcall_pending and EOIed the event_irq but hasn't trapped into
>>> Xen yet. It is still in guest mode.
>>>
>>> 2) Xen on cpuB calls vcpu_mark_events_pending(vcpuA), then calls
>>> vgic_inject_irq. However, because irq->line_level is high, it is not
>>> injected.

I was just wondering if we could do something similar to the SBSA UART
change, where we amend vcpu_mark_events_pending() to update the line
level in the VGIC via vgic_inject_irq().

>> So this is a case where we fail to sync in time on the actual emulated
>> line level.

So to explain this:
The virtual line level is not in sync all of the time, as we don't get
signalled when the flag is cleared by the domain. We need to sync this
whenever needed, which, for once, is when the domain returns to the
hypervisor. Another point is this vcpu_mark_events_pending() here,
because we need to know the state. And fortunately we just queried it
also. So I will include a change which updates this into the new VGIC.

 KVM recently gained some nice code to solve this: We can
>> register per-IRQ functions that return the line level. For hardware
>> mapped IRQs this queries the distributor, but for the arch timer for
>> instance it just uses a shortcut to read CNTV_CTL_EL0.
>> The evtchn IRQ could just check evtchn_upcall_pending.
>>
>> I can take a look at a follow up patch to implement this.
> 
> I agree that the evtchn_upcall_pending mechanism is very similar to a
> level interrupt, however the mechanism to bring the notification to the
> guest is edge, even on x86.

How so?
In xen/arch/arm/domain.c I find:
> void vcpu_mark_events_pending(struct vcpu *v)
> {
>     int already_pending = test_and_set_bit(
>         0, (unsigned long *)&vcpu_info(v, evtchn_upcall_pending));
>
>     if ( already_pending )
>         return;

That smells very much like level to me.

>
>     vgic_inject_irq(v->domain, v, v->domain->arch.evtchn_irq, true);
> }


> Even with the new vgic implementation it
> falls very naturally in the edge pattern of behaviors. This is one of
> those cases where I would be happy to deviate from the KVM
> implementation, because it makes sense and would be easier to maintain
> going forward. We can even get rid of vcpu_update_evtchn_irq.

So I tried this:
- remove vcpu_update_evtchn_irq()
- change type to edge in the hypervisor node of the DT
- allow changing the edge/level configuration of PPIs
- (optionally: ) remove the special vcpu_mark_events_pending() handling
(the one quoted above)

This boots Dom0, and I see the event IRQ being edge now.
But it hangs at various places when booting DomUs, both with and without
the last point applied.

So from all I can see the event channel IRQ is really a level IRQ:
- We have  a flag holding the *state*: edge IRQs are actually *events*.
- We set that flag in the hypervisor, but clear it in the domains.
- We don't inject new IRQs if that flag is still set.

A true edge IRQ mechanism would hand that event over to the interrupt
controller and then forget about it.

So I would like to *not* change this away from level this at this point:
- I don't have a working implementation of using edge IRQs.
- The existing level mechanism has shown no problems in weeks of testing.
- It's somewhat straining the architecture to allow configurable PPIs.

> I am OK with a not-ideal short term fix by the end of this week, as long
> as we come up with a proper fix later, by the release date. But in this
> instance, wouldn't be enough to change the PPI type to EDGE, remove
> vcpu_update_evtchn_irq, and be done with it?

Apparently not, sorry.

Cheers,
Andre.

>>> 3) vcpuA has to wait until trapping into Xen, calling
>>> vcpu_update_evtchn_irq, and going back to guest mode before receiving
>>> the event. This is theoretically a very long time.
>>>
>>>
>>> Instead what should happen is:
>>>
>>> 1) vcpuA/cpuA is running, it has already handled the event, cleared
>>> evtchn_upcall_pending and EOIed the event_irq but hasn't trapped into
>>> Xen yet. It is still in guest mode.
>>>
>>> 2) Xen on cpuB calls vcpu_mark_events_pending(vcpuA), then calls
>>> vgic_inject_irq, which calls vgic_queue_irq_unlock that
>>> vcpu_kick(vcpuA), forcing it to take the event immediately.
>>>
>>> Am I right? Wouldn't it be safer to continue configuring the evtchn_irq
>>> as edge even in the new vgic?
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.