[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 2/6] xen/arm: track the state of guest IRQs



On Tue, 2013-12-10 at 18:50 +0000, Stefano Stabellini wrote:
> Introduce a status field in struct pending_irq. Valid states are
> GUEST_PENDING, GUEST_VISIBLE and GUEST_ENABLED and they are not mutually
> exclusive.  See the in-code comment for an explanation of the states and
> how they are used.
> Use atomic operations to set and clear the status bits. Note that
> setting GIC_IRQ_GUEST_VISIBLE and clearing GIC_IRQ_GUEST_PENDING can be
> done in two separate operations as the underlying pending status is
> actually only cleared on the LR after the guest ACKs the interrupts.
> Until that happens it's not possible to receive another interrupt.

Is the ordering of the set/clear important? It seems like it ought to
be, one of the orderings risks losing information and the other risks
spurious interrupts (the latter being obviously preferred).

> One exception is evtchn_irq: in that case we don't want to
> set the GIC_IRQ_GUEST_PENDING bit if it is already GUEST_VISIBLE,
> because as part of the event handling loop, the guest would realize that
> new events are present even without a new notification.

Is the race between checking the event bit mask and unmasking and/or
acking the interrupt closed somehow?

(We had an issue like this with PVHVM on x86 didn't we?)

> Also we already have a way to figure out exactly when we do need to
> inject a second notification if vgic_vcpu_inject_irq is called after the
> end of the guest event handling loop and before the guest EOIs the
> interrupt (see db453468d92369e7182663fb13e14d83ec4ce456 "arm: vgic: fix
> race between evtchn upcall and evtchnop_send").
> 
> In maintenance_interrupt only stops injecting interrupts if no new

stops => stop.

> interrupts have been added to the LRs.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> 
> Changes in v2:
> - use atomic operations to modify the pending_irq status bits, remove
> the now unnecessary locks;
> - make status unsigned long;
> - in maintenance_interrupt only stops injecting interrupts if no new
> interrupts have been added to the LRs;
> - add a state to keep track whether the guest irq is enabled at the
> vgicd level;
> 
> Changes in v3:
> - do not set the GUEST_PENDING bit for evtchn_irq if the irq is already
> guest visible.
> ---
>  xen/arch/arm/gic.c           |   52 
> +++++++++++++++++++++++++++++++-----------
>  xen/arch/arm/vgic.c          |   17 +++++++++++---
>  xen/include/asm-arm/domain.h |   40 ++++++++++++++++++++++++++++++++
>  3 files changed, 93 insertions(+), 16 deletions(-)
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 5f7a548..d736a30 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -635,17 +635,20 @@ void gic_set_guest_irq(struct vcpu *v, unsigned int 
> virtual_irq,
>  
>      spin_lock_irqsave(&gic.lock, flags);
>  
> +    n = irq_to_pending(v, virtual_irq);
> +
>      if ( v == current && list_empty(&v->arch.vgic.lr_pending) )
>      {
>          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
>          if (i < nr_lrs) {
>              set_bit(i, &this_cpu(lr_mask));
>              gic_set_lr(i, virtual_irq, state, priority);
> +            set_bit(_GIC_IRQ_GUEST_VISIBLE, &n->status);
> +            clear_bit(_GIC_IRQ_GUEST_PENDING, &n->status);

This gic_set_lr+set lr_mask+set VISIBLE+clear PENDING happens a lot and
always together (apart from the maint interrupt which skips setting
visible, but that would be harmless there) -- why not push the bit
manipulations down into gic_set_lr?

> @@ -884,10 +889,11 @@ static void maintenance_interrupt(int irq, void 
> *dev_id, struct cpu_user_regs *r
>      uint32_t lr;
>      struct vcpu *v = current;
>      uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32);
> +    unsigned int set_int = 0;

This doesn't need to be unsigned and doesn't need to actually count the
number, it's just a boolean, treating it as such avoids
read/modify/write cycles.

>  
>      while ((i = find_next_bit((const long unsigned int *) &eisr,
>                                64, i)) < 64) {
> -        struct pending_irq *p;
> +        struct pending_irq *p, *p2;
>          int cpu;
>  
>          cpu = -1;
> @@ -898,17 +904,6 @@ static void maintenance_interrupt(int irq, void *dev_id, 
> struct cpu_user_regs *r
>          GICH[GICH_LR + i] = 0;
>          clear_bit(i, &this_cpu(lr_mask));
>  
> -        if ( !list_empty(&v->arch.vgic.lr_pending) ) {
> -            p = list_entry(v->arch.vgic.lr_pending.next, typeof(*p), 
> lr_queue);
> -            gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
> -            list_del_init(&p->lr_queue);
> -            set_bit(i, &this_cpu(lr_mask));
> -        } else {
> -            gic_inject_irq_stop();
> -        }
> -        spin_unlock_irq(&gic.lock);
> -
> -        spin_lock_irq(&v->arch.vgic.lock);
>          p = irq_to_pending(v, virq);
>          if ( p->desc != NULL ) {
>              p->desc->status &= ~IRQ_INPROGRESS;
> @@ -916,6 +911,31 @@ static void maintenance_interrupt(int irq, void *dev_id, 
> struct cpu_user_regs *r
>              cpu = p->desc->arch.eoi_cpu;
>              pirq = p->desc->irq;
>          }
> +        if ( test_bit(_GIC_IRQ_GUEST_PENDING, &p->status) &&
> +             test_bit(_GIC_IRQ_GUEST_ENABLED, &p->status))
> +        {
> +            gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
> +            clear_bit(_GIC_IRQ_GUEST_PENDING, &p->status);
> +            i++;
> +            spin_unlock_irq(&gic.lock);
> +            set_int++;
> +            continue;
> +        }
> +
> +        clear_bit(_GIC_IRQ_GUEST_VISIBLE, &p->status);
> +
> +        if ( !list_empty(&v->arch.vgic.lr_pending) ) {
> +            p2 = list_entry(v->arch.vgic.lr_pending.next, typeof(*p2), 
> lr_queue);
> +            gic_set_lr(i, p2->irq, GICH_LR_PENDING, p2->priority);
> +            set_bit(_GIC_IRQ_GUEST_VISIBLE, &p2->status);
> +            clear_bit(_GIC_IRQ_GUEST_PENDING, &p2->status);
> +            list_del_init(&p2->lr_queue);
> +            set_bit(i, &this_cpu(lr_mask));
> +            set_int++;
> +        }
> +        spin_unlock_irq(&gic.lock);

This ordering means that a low priority but frequently occurring
interrupt can starve out a high priority one, because it will keep
jumping the queue.

Our priority handling is pretty rudimentary right now, but we should at
least consciously decide we are happy with this.

The solution would be to put the interrupt back on the lr_pending queue
in its proper order and rely on it getting reinjected that way, I think?

> +
> +        spin_lock_irq(&v->arch.vgic.lock);
>          list_del_init(&p->inflight);
>          spin_unlock_irq(&v->arch.vgic.lock);
>  
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 2e4b11f..a16bdf1 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -369,6 +369,7 @@ static void vgic_enable_irqs(struct vcpu *v, uint32_t r, 
> int n)
>      while ( (i = find_next_bit((const long unsigned int *) &r, 32, i)) < 32 
> ) {
>          irq = i + (32 * n);
>          p = irq_to_pending(v, irq);
> +        set_bit(_GIC_IRQ_GUEST_ENABLED, &p->status);
>          if ( !list_empty(&p->inflight) )
>              gic_set_guest_irq(v, irq, GICH_LR_PENDING, p->priority);
>          i++;
> @@ -672,8 +673,17 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int 
> irq, int virtual)
>  
>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>  
> -    /* vcpu offline or irq already pending */
> -    if (test_bit(_VPF_down, &v->pause_flags) || !list_empty(&n->inflight))
> +    if ( !list_empty(&n->inflight) )
> +    {
> +        if ( (irq != current->domain->arch.evtchn_irq) ||
> +             (!test_bit(_GIC_IRQ_GUEST_VISIBLE, &n->status)) )
> +            set_bit(_GIC_IRQ_GUEST_PENDING, &n->status);

I worry about a race here. I think the consequence is a spurious
interrupt i.e. the safe option. Right? If yes then a comment to that
affect would be useful.

> +        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> +        return;
> +    }
> +
> +    /* vcpu offline */
> +    if ( test_bit(_VPF_down, &v->pause_flags) )
>      {
>          spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
>          return;
> @@ -682,6 +692,7 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int 
> irq, int virtual)
>      priority = byte_read(rank->ipriority[REG_RANK_INDEX(8, idx)], 0, byte);
>  
>      n->irq = irq;
> +    set_bit(_GIC_IRQ_GUEST_PENDING, &n->status);
>      n->priority = priority;
>      if (!virtual)
>          n->desc = irq_to_desc(irq);
> @@ -689,7 +700,7 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int 
> irq, int virtual)
>          n->desc = NULL;
>  
>      /* the irq is enabled */
> -    if ( rank->ienable & (1 << (irq % 32)) )
> +    if ( test_bit(_GIC_IRQ_GUEST_ENABLED, &n->status) )
>          gic_set_guest_irq(v, irq, GICH_LR_PENDING, priority);
>  
>      list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 8ebee3e..da34337 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -22,6 +22,46 @@ struct vgic_irq_rank {
>  struct pending_irq
>  {
>      int irq;
> +    /*
> +     * The following two states track the lifecycle of the guest irq.
> +     * However because we are not sure and we don't want to track
> +     * whether an irq added to an LR register is PENDING or ACTIVE, the
> +     * following states are just an approximation.
> +     *
> +     * GIC_IRQ_GUEST_PENDING: the irq is asserted
> +     *
> +     * GIC_IRQ_GUEST_VISIBLE: the irq has been added to an LR register,
> +     * therefore the guest is aware of it. From the guest point of view
> +     * the irq can be pending (if the guest has not acked the irq yet)
> +     * or active (after acking the irq).
> +     *
> +     * In order for the state machine to be fully accurate, for level
> +     * interrupts, we should keep the GIC_IRQ_GUEST_PENDING state until
> +     * the guest deactivates the irq. However because we are not sure
> +     * when that happens, we simply remove the GIC_IRQ_GUEST_PENDING
> +     * state when we add the irq to an LR register. We add it back when
> +     * we receive another interrupt notification.
> +     * Therefore it is possible to set GIC_IRQ_GUEST_PENDING while the
> +     * irq is GIC_IRQ_GUEST_VISIBLE. We could also change the state of
> +     * the guest irq in the LR register from active to active and
> +     * pending, but for simplicity we simply inject a second irq after
> +     * the guest EOIs the first one.
> +     *
> +     *
> +     * An additional state is used to keep track of whether the guest
> +     * irq is enabled at the vgicd level:
> +     *
> +     * GIC_IRQ_GUEST_ENABLED: the guest IRQ is enabled at the VGICD
> +     * level (GICD_ICENABLER/GICD_ISENABLER).
> +     *
> +     */
> +#define _GIC_IRQ_GUEST_PENDING  0
> +#define GIC_IRQ_GUEST_PENDING   (1UL<<0)
> +#define _GIC_IRQ_GUEST_VISIBLE  1
> +#define GIC_IRQ_GUEST_VISIBLE   (1UL<<1)
> +#define _GIC_IRQ_GUEST_ENABLED  2
> +#define GIC_IRQ_GUEST_ENABLED   (1UL<<2)

You don't need the raw values now, only the bit offsets, I think you can
drop GIC_IRQ* and rename _GIC_IRQ* without the underscore. Doing this
will help prevent people thinking they can manipulate the status field
without bitops.

> +    unsigned long status;
>      struct irq_desc *desc; /* only set it the irq corresponds to a physical 
> irq */
>      uint8_t priority;
>      /* inflight is used to append instances of pending_irq to



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.