Re: [Xen-devel] [PATCH] xen/arm: fix rank/vgic locks inversion bug



On Fri, 16 Dec 2016, Stefano Stabellini wrote:
> On Fri, 16 Dec 2016, Julien Grall wrote:
> > Hi Stefano,
> > 
> > On 16/12/16 00:08, Stefano Stabellini wrote:
> > > On Thu, 15 Dec 2016, Julien Grall wrote:
> > > > On 15/12/2016 01:04, Stefano Stabellini wrote:
> > > > > The locking order is: first rank lock, then vgic lock. The order is
> > > > > respected everywhere, except for gic_update_one_lr.
> > > > > 
> > > > > gic_update_one_lr is called with the vgic lock held, but it calls
> > > > > vgic_get_target_vcpu, which tries to obtain the rank lock. This can
> > > > > cause deadlocks.
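> > > > >
> > > > > To illustrate, the two code paths take the two locks in opposite
> > > > > order (a simplified sketch of the pattern, not the literal code):
> > > > >
> > > > >     /* path 1: MMIO emulation, e.g. vgic_store_itargetsr */
> > > > >     vgic_lock_rank(v, rank, flags);     /* rank lock first */
> > > > >     vgic_migrate_irq(...);              /* takes the per-vcpu vgic lock */
> > > > >     vgic_unlock_rank(v, rank, flags);
> > > > >
> > > > >     /* path 2: gic_update_one_lr, entered with the vgic lock held */
> > > > >     spin_lock_irqsave(&v->arch.vgic.lock, flags); /* vgic lock first */
> > > > >     vgic_get_target_vcpu(v, irq);       /* takes the rank lock: inversion */
> > > > >     spin_unlock_irqrestore(&v->arch.vgic.lock, flags);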
> > > > > 
> > > > > We already have a version of vgic_get_target_vcpu that doesn't take
> > > > > the rank lock: __vgic_get_target_vcpu. Also, the only routines that
> > > > > modify the target vcpu are vgic_store_itargetsr and vgic_store_irouter.
> > > > > They call vgic_migrate_irq, which already takes the vgic lock.
> > > > > 
> > > > > Solve the lock inversion problem by not taking the rank lock in
> > > > > gic_update_one_lr (calling __vgic_get_target_vcpu instead of
> > > > > vgic_get_target_vcpu).
> > > > 
> > > > If I look at the callers of gic_update_one_lr, the function
> > > > gic_clear_lrs will not take the rank lock. So, from my understanding,
> > > > nobody will take the rank lock here.
> > > > 
> > > > However, __vgic_get_target_vcpu has an ASSERT to check whether the
> > > > rank lock has been taken. So who is taking the lock for
> > > > gic_update_one_lr now?
> > > 
> > > I should have removed the ASSERT - nobody is taking the rank lock now on
> > > the gic_update_one_lr code path.
> > 
> > Please either keep this ASSERT or update it, but do not drop it: we need
> > to prevent anyone from calling this function without any lock taken, at
> > least in debug builds.
> > 
> > The current callers (see vgic_{disable,enable}_irqs) are calling the
> > function with the rank lock taken.
> > 
> > So we need to keep this behavior at least for them.
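> >
> > For instance, the ASSERT could be updated to something like this (a
> > sketch; the exact condition depends on which locks we settle on):
> >
> >     /* the caller must hold either the rank lock or the vgic lock */
> >     ASSERT(spin_is_locked(&rank->lock) ||
> >            spin_is_locked(&v->arch.vgic.lock));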
> > 
> > > > > We make this safe by placing modifications to
> > > > > rank->vcpu within regions protected by the vgic lock.
> > > > 
> > > > This looks unsafe to me. The vgic lock is per-vcpu, but rank->vcpu
> > > > could be written/read by any vCPU. So you will never protect
> > > > rank->vcpu with this lock. Did I miss anything?
> > > 
> > > The vgic lock is per-vcpu, but it is always the vgic lock of the old
> > > vcpu (old in the sense of: before the interrupt is migrated) that is
> > > taken.
> > > 
> > > On one hand, any vcpu can read/write to itargetsr. Those operations are
> > > protected by the rank lock. However, vgic_migrate_irq and writes to
> > > rank->vcpu are also protected by the vgic lock of the old vcpu (first
> > > the rank lock is taken, then the vgic lock of the old vcpu).
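> > >
> > > In code terms, the intended nesting is (a sketch of the pattern, not
> > > the literal code):
> > >
> > >     vgic_lock_rank(v, rank, flags);    /* rank lock first */
> > >     spin_lock(&old->arch.vgic.lock);   /* then the old vcpu's vgic lock */
> > >     rank->vcpu[offset] = new->vcpu_id; /* write under both locks */
> > >     spin_unlock(&old->arch.vgic.lock);
> > >     vgic_unlock_rank(v, rank, flags);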
> > 
> > I don't think this is true. Any vCPU read/write to itargetsr will be
> > protected by the vgic lock of the vCPU in rank->vcpu[offset].
> > 
> > For an inflight IRQ, the migration will happen when the guest has EOIed
> > the IRQ. Until that happens, the guest may have written multiple times to
> > ITARGETSR.
> > 
> > For instance, let's say the guest is requesting to migrate the IRQ from
> > vCPU A to vCPU B. The vgic lock of A will be taken in
> > vgic_store_itargetsr; if the interrupt is inflight, vgic_migrate_irq will
> > set a flag. Now the guest could decide to migrate the interrupt to vCPU C
> > before the interrupt has been EOIed. In this case, the vgic lock of B
> > will be taken.
> > 
> > So we have a mismatch between the lock taken in gic_update_one_lr and
> > vgic_migrate_irq.
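> >
> > In other words, one possible interleaving (a sketch, with vCPUs A/B/C as
> > above):
> >
> >     guest writes ITARGETSR: A -> B
> >         vgic_store_itargetsr takes the vgic lock of A;
> >         the IRQ is inflight, so vgic_migrate_irq sets
> >         GIC_IRQ_GUEST_MIGRATING
> >     guest writes ITARGETSR: B -> C   (IRQ still not EOIed)
> >         vgic_store_itargetsr takes the vgic lock of B
> >     guest EOIs the IRQ
> >         gic_update_one_lr runs under the vgic lock of A,
> >         not the lock (B) that vgic_migrate_irq last took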
> > 
> > I think the only safe way to fully protect rank->vcpu is to take the
> > rank lock. It never changes and is accessible from anywhere.
> 
> When I submitted this series, I considered this scenario, and I thought
> that the existing GIC_IRQ_GUEST_MIGRATING flag should be enough to
> protect it, because when GIC_IRQ_GUEST_MIGRATING is set,
> vgic_migrate_irq actually returns immediately, without doing anything.
> Either GIC_IRQ_GUEST_MIGRATING is not set, and it's the first irq
> migration, or GIC_IRQ_GUEST_MIGRATING is set, and vgic_migrate_irq just
> returns.
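>
> The flag check I am referring to looks roughly like this (a sketch of the
> vgic_migrate_irq flag logic, not the exact code):
>
>     if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>         return;    /* a migration is already pending: do nothing */
>     if ( !list_empty(&p->inflight) )
>         set_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);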
> 
> However, as I was writing this explanation to you in more detail, I
> realized that it is not actually safe. Concurrent accesses to rank->vcpu,
> and concurrent calls to irq_set_affinity, can result in the physical
> affinity actually being set to the intermediate pcpu rather than the final
> one. As a matter of fact, one of these races affects the code today, even
> without this patch series.
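>
> For example, one possible interleaving (a sketch; the exact code paths may
> differ):
>
>     pCPU 0: gic_update_one_lr on EOI     pCPU 1: vgic_store_itargetsr
>     --------------------------------     ----------------------------
>     clear GIC_IRQ_GUEST_MIGRATING
>     read rank->vcpu   /* sees B */
>                                          write rank->vcpu = C
>                                          irq_set_affinity(desc, pcpu of C)
>     irq_set_affinity(desc, pcpu of B)
>     /* physical affinity is left on the intermediate vcpu B, not on C */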
> 
> I considered various possible ways to solve this problem and the lock
> inversion problem at once. These are the potential solutions I came up
> with:
> 
> 1) We release the vgic lock at the end of gic_update_one_lr, before
> test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status). Then we take
> the rank lock. We don't need the per-vcpu vgic lock to execute the
> following code safely; the rank lock is enough:
> 
>             if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>             {
>                 struct vcpu *v_target = __vgic_get_target_vcpu(v, irq);
>                 irq_set_affinity(p->desc, cpumask_of(v_target->processor));
>             }
> 
> This sounds simple in theory, but in practice it is very ugly. We would
> end up with a function (gic_update_one_lr) that is called with the vgic
> lock held but actually releases the lock before returning.
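>
> Concretely, the tail of gic_update_one_lr would become something like this
> (a sketch; it assumes vgic_rank_irq is used to look up the rank):
>
>     spin_unlock(&v->arch.vgic.lock);    /* release the vgic lock early */
>     rank = vgic_rank_irq(v, irq);
>     vgic_lock_rank(v, rank, flags);     /* now safe: no lock inversion */
>     if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>     {
>         struct vcpu *v_target = __vgic_get_target_vcpu(v, irq);
>         irq_set_affinity(p->desc, cpumask_of(v_target->processor));
>     }
>     vgic_unlock_rank(v, rank, flags);
>     /* returns with the vgic lock it was entered with already released */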
> 
> 2) We run gic_update_one_lr and vgic_store_itargetsr in parallel safely
> and locklessly. There might be a way to do it, but it is not easy and I
> haven't found it yet.
> 
> 3) Use the vgic lock of the old vcpu to protect read accesses to
> rank->vcpu. This is the same strategy as the one implemented in this
> series. However, for it to actually work correctly, we also have to store
> the original vcpu id in struct pending_irq. That way, we know for sure
> which vgic lock we need to take.
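>
> Something along these lines (a sketch; the field name is made up):
>
>     struct pending_irq
>     {
>         ...
>         /* id of the vcpu whose vgic lock protects this irq while
>          * GIC_IRQ_GUEST_MIGRATING is set */
>         uint8_t migrate_from;
>     };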
> 
> 
> Let me premise that I don't like any of these approaches very much. But
> I think that 1) is the least bad. If that's OK for you, I'll do that.

Sorry, I meant that 3) is the least bad :-)
