[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 2/6] xen/arm: track the state of guest IRQs



On Wed, 11 Dec 2013, Ian Campbell wrote:
> On Tue, 2013-12-10 at 18:50 +0000, Stefano Stabellini wrote:
> > Introduce a status field in struct pending_irq. Valid states are
> > GUEST_PENDING, GUEST_VISIBLE and GUEST_ENABLED and they are not mutually
> > exclusive.  See the in-code comment for an explanation of the states and
> > how they are used.
> > Use atomic operations to set and clear the status bits. Note that
> > setting GIC_IRQ_GUEST_VISIBLE and clearing GIC_IRQ_GUEST_PENDING can be
> > done in two separate operations as the underlying pending status is
> > actually only cleared on the LR after the guest ACKs the interrupts.
> > Until that happens it's not possible to receive another interrupt.
> 
> Is the ordering of the set/clear important? It seems like it ought to
> be, one of the orderings risks losing information and the other risks
> spurious interrupts (the latter being obviously preferred).

The only case for which it might matter is physical edge-triggered IRQs
(I have none on the platform I am testing this series on).
If we clear GUEST_PENDING before setting GUEST_VISIBLE we slightly
reduce the chance of loosing an interrupt. But keep in mind that the
chance still exists: after we read GICC_IAR and before we clear
GUEST_PENDING, if we receive another hardware interrupt it would be
lost.
So I don't think that the relative order of GUEST_PENDING and
GUEST_VISIBLE makes any meaningful difference.


> > One exception is evtchn_irq: in that case we don't want to
> > set the GIC_IRQ_GUEST_PENDING bit if it is already GUEST_VISIBLE,
> > because as part of the event handling loop, the guest would realize that
> > new events are present even without a new notification.
> 
> Is the race between checking the event bit mask and unmasking and/or
> acking the interrupt closed somehow?
 
The window when the guest can loose evtchn notifications happens
after existing the event handling loop and before EOIing the interrupt.
We fixed it by noticing that evtchn_upcall_pending is set and trying an
evtchn_irq re-injection right before returning to guest.

Of course we don't want to re-inject the evtchn_irq twice if it is
already there so that's why I added the GIC_IRQ_GUEST_VISIBLE check in
vgic_vcpu_inject_irq. The behavior in this regard should be exactly the
same as before.

We don't want to rely on GIC_IRQ_GUEST_PENDING to inject a second
evtchn_irq because if the guest is in the event handling loop it doesn't
actually need a new notification.


> (We had an issue like this with PVHVM on x86 didn't we?)

It is different because in that case the source of notifications (the
vector callback) doesn't need an EOI.


> > Also we already have a way to figure out exactly when we do need to
> > inject a second notification if vgic_vcpu_inject_irq is called after the
> > end of the guest event handling loop and before the guest EOIs the
> > interrupt (see db453468d92369e7182663fb13e14d83ec4ce456 "arm: vgic: fix
> > race between evtchn upcall and evtchnop_send").
> > 
> > In maintenance_interrupt only stops injecting interrupts if no new
> 
> stops => stop.
> 
> > interrupts have been added to the LRs.
> > 
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> > 
> > Changes in v2:
> > - use atomic operations to modify the pending_irq status bits, remove
> > the now unnecessary locks;
> > - make status unsigned long;
> > - in maintenance_interrupt only stops injecting interrupts if no new
> > interrupts have been added to the LRs;
> > - add a state to keep track whether the guest irq is enabled at the
> > vgicd level;
> > 
> > Changes in v3:
> > - do not set the GUEST_PENDING bit for evtchn_irq if the irq is already
> > guest visible.
> > ---
> >  xen/arch/arm/gic.c           |   52 
> > +++++++++++++++++++++++++++++++-----------
> >  xen/arch/arm/vgic.c          |   17 +++++++++++---
> >  xen/include/asm-arm/domain.h |   40 ++++++++++++++++++++++++++++++++
> >  3 files changed, 93 insertions(+), 16 deletions(-)
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 5f7a548..d736a30 100644
> > --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> > @@ -635,17 +635,20 @@ void gic_set_guest_irq(struct vcpu *v, unsigned int 
> > virtual_irq,
> >  
> >      spin_lock_irqsave(&gic.lock, flags);
> >  
> > +    n = irq_to_pending(v, virtual_irq);
> > +
> >      if ( v == current && list_empty(&v->arch.vgic.lr_pending) )
> >      {
> >          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> >          if (i < nr_lrs) {
> >              set_bit(i, &this_cpu(lr_mask));
> >              gic_set_lr(i, virtual_irq, state, priority);
> > +            set_bit(_GIC_IRQ_GUEST_VISIBLE, &n->status);
> > +            clear_bit(_GIC_IRQ_GUEST_PENDING, &n->status);
> 
> This gic_set_lr+set lr_mask+set VISIBLE+clear PENDING happens a lot and
> always together (apart from the maint interrupt which skips setting
> visible, but that would be harmless there) -- why not push the bit
> manipulations down into gic_set_lr?

OK


> > @@ -884,10 +889,11 @@ static void maintenance_interrupt(int irq, void 
> > *dev_id, struct cpu_user_regs *r
> >      uint32_t lr;
> >      struct vcpu *v = current;
> >      uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 
> > 32);
> > +    unsigned int set_int = 0;
> 
> This doesn't need to be unsigned and doesn't need to actually count the
> number, it's just a boolean, treating it as such avoids
> read/modify/write cycles.

OK


> >  
> >      while ((i = find_next_bit((const long unsigned int *) &eisr,
> >                                64, i)) < 64) {
> > -        struct pending_irq *p;
> > +        struct pending_irq *p, *p2;
> >          int cpu;
> >  
> >          cpu = -1;
> > @@ -898,17 +904,6 @@ static void maintenance_interrupt(int irq, void 
> > *dev_id, struct cpu_user_regs *r
> >          GICH[GICH_LR + i] = 0;
> >          clear_bit(i, &this_cpu(lr_mask));
> >  
> > -        if ( !list_empty(&v->arch.vgic.lr_pending) ) {
> > -            p = list_entry(v->arch.vgic.lr_pending.next, typeof(*p), 
> > lr_queue);
> > -            gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
> > -            list_del_init(&p->lr_queue);
> > -            set_bit(i, &this_cpu(lr_mask));
> > -        } else {
> > -            gic_inject_irq_stop();
> > -        }
> > -        spin_unlock_irq(&gic.lock);
> > -
> > -        spin_lock_irq(&v->arch.vgic.lock);
> >          p = irq_to_pending(v, virq);
> >          if ( p->desc != NULL ) {
> >              p->desc->status &= ~IRQ_INPROGRESS;
> > @@ -916,6 +911,31 @@ static void maintenance_interrupt(int irq, void 
> > *dev_id, struct cpu_user_regs *r
> >              cpu = p->desc->arch.eoi_cpu;
> >              pirq = p->desc->irq;
> >          }
> > +        if ( test_bit(_GIC_IRQ_GUEST_PENDING, &p->status) &&
> > +             test_bit(_GIC_IRQ_GUEST_ENABLED, &p->status))
> > +        {
> > +            gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
> > +            clear_bit(_GIC_IRQ_GUEST_PENDING, &p->status);
> > +            i++;
> > +            spin_unlock_irq(&gic.lock);
> > +            set_int++;
> > +            continue;
> > +        }
> > +
> > +        clear_bit(_GIC_IRQ_GUEST_VISIBLE, &p->status);
> > +
> > +        if ( !list_empty(&v->arch.vgic.lr_pending) ) {
> > +            p2 = list_entry(v->arch.vgic.lr_pending.next, typeof(*p2), 
> > lr_queue);
> > +            gic_set_lr(i, p2->irq, GICH_LR_PENDING, p2->priority);
> > +            set_bit(_GIC_IRQ_GUEST_VISIBLE, &p2->status);
> > +            clear_bit(_GIC_IRQ_GUEST_PENDING, &p2->status);
> > +            list_del_init(&p2->lr_queue);
> > +            set_bit(i, &this_cpu(lr_mask));
> > +            set_int++;
> > +        }
> > +        spin_unlock_irq(&gic.lock);
> 
> This ordering means that a low priority but frequently occurring
> interrupt can starve out a high priority one, because it will keep
> jumping the queue.
> 
> Our priority handling is pretty rudimentary right now, but we should at
> least consciously decide we are happy with this.
> 
> The solution would be to put the interrupt back on the lr_pending queue
> in its proper order and rely on it getting reinjected that way, I think?

You are right, I'll do that.


> > +
> > +        spin_lock_irq(&v->arch.vgic.lock);
> >          list_del_init(&p->inflight);
> >          spin_unlock_irq(&v->arch.vgic.lock);
> >  
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 2e4b11f..a16bdf1 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -369,6 +369,7 @@ static void vgic_enable_irqs(struct vcpu *v, uint32_t 
> > r, int n)
> >      while ( (i = find_next_bit((const long unsigned int *) &r, 32, i)) < 
> > 32 ) {
> >          irq = i + (32 * n);
> >          p = irq_to_pending(v, irq);
> > +        set_bit(_GIC_IRQ_GUEST_ENABLED, &p->status);
> >          if ( !list_empty(&p->inflight) )
> >              gic_set_guest_irq(v, irq, GICH_LR_PENDING, p->priority);
> >          i++;
> > @@ -672,8 +673,17 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int 
> > irq, int virtual)
> >  
> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >  
> > -    /* vcpu offline or irq already pending */
> > -    if (test_bit(_VPF_down, &v->pause_flags) || !list_empty(&n->inflight))
> > +    if ( !list_empty(&n->inflight) )
> > +    {
> > +        if ( (irq != current->domain->arch.evtchn_irq) ||
> > +             (!test_bit(_GIC_IRQ_GUEST_VISIBLE, &n->status)) )
> > +            set_bit(_GIC_IRQ_GUEST_PENDING, &n->status);
> 
> I worry about a race here. I think the consequence is a spurious
> interrupt i.e. the safe option. Right? If yes then a comment to that
> affect would be useful.

There is no race: it would only be possible for evtchn injections but
those are completely taken care of by the in-guest evtchn loop and
db453468d92369e7182663fb13e14d83ec4ce456.

Or maybe you mean something else that I am not seeing?


> > +        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> > +        return;
> > +    }
> > +
> > +    /* vcpu offline */
> > +    if ( test_bit(_VPF_down, &v->pause_flags) )
> >      {
> >          spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> >          return;
> > @@ -682,6 +692,7 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int 
> > irq, int virtual)
> >      priority = byte_read(rank->ipriority[REG_RANK_INDEX(8, idx)], 0, byte);
> >  
> >      n->irq = irq;
> > +    set_bit(_GIC_IRQ_GUEST_PENDING, &n->status);
> >      n->priority = priority;
> >      if (!virtual)
> >          n->desc = irq_to_desc(irq);
> > @@ -689,7 +700,7 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int 
> > irq, int virtual)
> >          n->desc = NULL;
> >  
> >      /* the irq is enabled */
> > -    if ( rank->ienable & (1 << (irq % 32)) )
> > +    if ( test_bit(_GIC_IRQ_GUEST_ENABLED, &n->status) )
> >          gic_set_guest_irq(v, irq, GICH_LR_PENDING, priority);
> >  
> >      list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )
> > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> > index 8ebee3e..da34337 100644
> > --- a/xen/include/asm-arm/domain.h
> > +++ b/xen/include/asm-arm/domain.h
> > @@ -22,6 +22,46 @@ struct vgic_irq_rank {
> >  struct pending_irq
> >  {
> >      int irq;
> > +    /*
> > +     * The following two states track the lifecycle of the guest irq.
> > +     * However because we are not sure and we don't want to track
> > +     * whether an irq added to an LR register is PENDING or ACTIVE, the
> > +     * following states are just an approximation.
> > +     *
> > +     * GIC_IRQ_GUEST_PENDING: the irq is asserted
> > +     *
> > +     * GIC_IRQ_GUEST_VISIBLE: the irq has been added to an LR register,
> > +     * therefore the guest is aware of it. From the guest point of view
> > +     * the irq can be pending (if the guest has not acked the irq yet)
> > +     * or active (after acking the irq).
> > +     *
> > +     * In order for the state machine to be fully accurate, for level
> > +     * interrupts, we should keep the GIC_IRQ_GUEST_PENDING state until
> > +     * the guest deactivates the irq. However because we are not sure
> > +     * when that happens, we simply remove the GIC_IRQ_GUEST_PENDING
> > +     * state when we add the irq to an LR register. We add it back when
> > +     * we receive another interrupt notification.
> > +     * Therefore it is possible to set GIC_IRQ_GUEST_PENDING while the
> > +     * irq is GIC_IRQ_GUEST_VISIBLE. We could also change the state of
> > +     * the guest irq in the LR register from active to active and
> > +     * pending, but for simplicity we simply inject a second irq after
> > +     * the guest EOIs the first one.
> > +     *
> > +     *
> > +     * An additional state is used to keep track of whether the guest
> > +     * irq is enabled at the vgicd level:
> > +     *
> > +     * GIC_IRQ_GUEST_ENABLED: the guest IRQ is enabled at the VGICD
> > +     * level (GICD_ICENABLER/GICD_ISENABLER).
> > +     *
> > +     */
> > +#define _GIC_IRQ_GUEST_PENDING  0
> > +#define GIC_IRQ_GUEST_PENDING   (1UL<<0)
> > +#define _GIC_IRQ_GUEST_VISIBLE  1
> > +#define GIC_IRQ_GUEST_VISIBLE   (1UL<<1)
> > +#define _GIC_IRQ_GUEST_ENABLED  2
> > +#define GIC_IRQ_GUEST_ENABLED   (1UL<<2)
> 
> You don't need the raw values now, only the bit offsets, I think you can
> drop GIC_IRQ* and rename _GIC_IRQ* without the underscore. Doing this
> will help prevent people thinking they can manipulate the status field
> without bitops.

OK

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.