[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] ARM: new VGIC: evtchn: fix potential race in vcpu_mark_events_pending()



On Tue, 3 Apr 2018, Julien Grall wrote:
> On 29/03/18 18:35, Stefano Stabellini wrote:
> > On Thu, 29 Mar 2018, Andre Przywara wrote:
> > > Stefano pointed out the following situation:
> > > ----------------------
> > > 1) vcpuA/cpuA is running, it has already handled the event, cleared
> > > evtchn_upcall_pending and EOIed the event_irq but hasn't trapped into
> > > Xen yet. It is still in guest mode.
> > > 
> > > 2) Xen on cpuB calls vcpu_mark_events_pending(vcpuA), then calls
> > > vgic_inject_irq. However, because irq->line_level is high, it is not
> > > injected.
> > > 
> > > 3) vcpuA has to wait until trapping into Xen, calling
> > > vcpu_update_evtchn_irq, and going back to guest mode before receiving
> > > the event. This is theoretically a very long time.
> > > ----------------------
> > > 
> > > Fix this by updating the state of our emulated IRQ line level inside
> > > vcpu_mark_events_pending(), before trying to inject the new interrupt.
> > > 
> > > Despite having two calls to vgic_inject_irq(), only one will actually do
> > > something:
> > > - If the emulated line level was already in sync with the actual flag,
> > >    the VGIC ignores the first call, due to vgic_validate_injection().
> > > - If the emulated line level was high, but the flag says it should have
> > >    been low, vgic_inject_irq() will just update the line_level state.
> > > - If the emulated line level was low, but the flags says it should have
> > >    been high, we will inject the interrupt. The second call is then a
> > >    NOP.
> > > 
> > > Signed-off-by: Andre Przywara <andre.przywara@xxxxxxxxxx>
> > > ---
> > > Hi,
> > > 
> > > this would ideally have been part of a former patch:
> > > "[PATCH v3 06/39] ARM: evtchn: Handle level triggered IRQs correctly",
> > > but this has been merged already, so this has to be a follow-up.
> > > Ideally this would be merged before the final patch that introduces the
> > > CONFIG_NEW_VGIC Kconfig symbol, so that the old code gets never compiled.
> > > 
> > > Thanks,
> > > Andre
> > > 
> > >   xen/arch/arm/domain.c | 11 +++++++++--
> > >   1 file changed, 9 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > > index 9688e62f78..11fa9002dc 100644
> > > --- a/xen/arch/arm/domain.c
> > > +++ b/xen/arch/arm/domain.c
> > > @@ -947,10 +947,17 @@ void vcpu_mark_events_pending(struct vcpu *v)
> > >       int already_pending = test_and_set_bit(
> > >           0, (unsigned long *)&vcpu_info(v, evtchn_upcall_pending));
> > >   -    if ( already_pending )
> > > -        return;
> > > +#ifdef CONFIG_NEW_VGIC
> > > +    /* Update the state of the current interrupt line. */
> > > +    vgic_inject_irq(v->domain, v, v->domain->arch.evtchn_irq,
> > > already_pending);
> > >   +    /* Make the level IRQ pending. That's a NOP if it was already. */
> > >       vgic_inject_irq(v->domain, v, v->domain->arch.evtchn_irq, true);
> > > +#else
> > > +    /* Only signal the VGIC if it wasn't already pending. */
> > > +    if ( !already_pending )
> > > +        vgic_inject_irq(v->domain, v, v->domain->arch.evtchn_irq, true);
> > > +#endif
> > >   }
> > 
> > The issue with this is that it is potentially racy, against vcpuA
> > trapping into Xen and executing vcpu_update_evtchn_irq, that also reads
> > evtchn_upcall_pending, then calls vgic_inject_irq. The last
> > vgic_inject_irq executed could be the one passing level = false, losing
> > the notification.
> > 
> > I might have a better idea: what if we just vcpu_kick(v) and do nothing
> > else? There is no need to call vgic_inject_irq from here because the
> > other vcpu will take care of doing it for us after trapping into Xen
> > (vcpu_update_evtchn_irq). It also needs to trap into Xen anyway to be
> > able to receive the new event as soon as possible.
> > 
> > The code would become:
> > 
> >    if ( already_pending )
> >        return;
> > 
> >    vgic_inject_irq(v->domain, v, v->domain->arch.evtchn_irq, true);
> > 
> > #ifdef CONFIG_NEW_VGIC
> >    vcpu_kick(v);
> > #endif
> > 
> > 
> > Note that vgic_inject_irq already does a vcpu_kick but only after
> > passing the vgic_validate_injection check, which would fail in this case
> > because irq->line_level is not up-to-date.
> > 
> > What do you think?
> 
> At the moment, the issue you describe is a non-issue because we set the EOI
> bit in the LR (see vgic_v2_populate_lr). So there are no need to kick the
> other vCPU as it would enter to Xen as soon as the event channel interrupt is
> been eoied by the guest.
> 
> Therefore, I believe we don't need to have a different version of
> vcpu_mark_events_pending for the new vGIC.

Ah! It makes sense.

Andre, why did you choose to set the EOI bit in the LR for non-hardware
interrupts? If it came up in previous email exchanges, my apologies.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.