
Re: [Xen-devel] [PATCH] xen/arm: fix rank/vgic locks inversion bug



On Fri, 16 Dec 2016, Julien Grall wrote:
> Hi Stefano,
> 
> On 16/12/16 00:08, Stefano Stabellini wrote:
> > On Thu, 15 Dec 2016, Julien Grall wrote:
> > > On 15/12/2016 01:04, Stefano Stabellini wrote:
> > > > The locking order is: first rank lock, then vgic lock. The order is
> > > > respected everywhere, except for gic_update_one_lr.
> > > > 
> > > > gic_update_one_lr is called with the vgic lock held, but it calls
> > > > vgic_get_target_vcpu, which tries to obtain the rank lock. This can
> > > > cause deadlocks.
> > > > 
> > > > We already have a version of vgic_get_target_vcpu that doesn't take the
> > > > rank lock: __vgic_get_target_vcpu. Also, the only routines that modify
> > > > the target vcpu are vgic_store_itargetsr and vgic_store_irouter. They
> > > > call vgic_migrate_irq, which already takes the vgic lock.
> > > > 
> > > > Solve the lock inversion problem by not taking the rank lock in
> > > > gic_update_one_lr (calling __vgic_get_target_vcpu instead of
> > > > vgic_get_target_vcpu).
> > > 
> > > If I look at the callers of gic_update_one_lr, the function gic_clear_lrs
> > > will not take the rank lock. So from my understanding nobody will take the
> > > rank lock here.
> > > 
> > > However, __vgic_get_target_vcpu has an ASSERT to check whether the rank
> > > lock has been taken. So who is taking the lock for gic_update_one_lr now?
> > 
> > I should have removed the ASSERT - nobody is taking the rank lock now on
> > the gic_update_one_lr code path.
> 
> Please either keep this ASSERT or update it, but do not drop it; we need to
> prevent anyone from calling this function without any lock taken, at least in
> debug builds.
> 
> The current callers (see vgic_{disable,enable}_irqs) are calling the function
> with the rank lock taken.
> 
> So we need to keep this behavior at least for them.
> 
> > > > We make this safe by placing modifications to
> > > > rank->vcpu within regions protected by the vgic lock.
> > > 
> > > This looks unsafe to me. The vgic lock is per-vcpu, but rank->vcpu could be
> > > written/read by any vCPU. So you will never protect rank->vcpu with this
> > > lock.
> > > Did I miss anything?
> > 
> > The vgic lock is per-vcpu, but it is always the vgic lock of the old vcpu
> > (old in the sense of: before the interrupt is migrated) that is taken.
> > 
> > On one hand, any vcpu can read/write itargetsr. Those operations are
> > protected by the rank lock. However, vgic_migrate_irq and writes to
> > rank->vcpu are also protected by the vgic lock of the old vcpu (first
> > the rank lock is taken, then the vgic lock of the old vcpu).
> 
> I don't think this is true. Any vCPU read/write to itargetsr will be protected
> by the vgic lock of the vCPU in rank->vcpu[offset].
> 
> For an inflight IRQ, the migration will happen when the guest has EOIed the
> IRQ. Until that happens, the guest may have written multiple times to
> ITARGETSR.
> 
> For instance, let's say the guest is requesting to migrate the IRQ from vCPU A
> to vCPU B. The vgic lock of vCPU A will be taken in vgic_store_itargetsr; if
> the interrupt is inflight, vgic_migrate_irq will set a flag. Now the guest
> could decide to migrate the interrupt to vCPU C before the interrupt has been
> EOIed. In this case, the vgic lock of vCPU B will be taken.
> 
> So we have a mismatch between the lock taken in gic_update_one_lr and
> vgic_migrate_irq.
> 
> I think the only safe way to fully protect rank->vcpu is to take the rank
> lock. It never changes and is accessible from anywhere.

When I submitted this series, I considered this scenario, and I thought
that the existing GIC_IRQ_GUEST_MIGRATING flag should be enough to
protect it, because when GIC_IRQ_GUEST_MIGRATING is set,
vgic_migrate_irq actually returns immediately, without doing anything.
Either GIC_IRQ_GUEST_MIGRATING is not set, and it's the first irq
migration, or GIC_IRQ_GUEST_MIGRATING is set, and vgic_migrate_irq just
returns.
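
For reference, the property I was relying on looks roughly like this (a
simplified sketch of vgic_migrate_irq, not the exact source):

    /*
     * Simplified sketch, not the exact Xen code: the point is the early
     * return when a migration is already pending.
     */
    void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
    {
        struct pending_irq *p = irq_to_pending(old, irq);

        /* Nothing to do for purely virtual interrupts (no physical desc). */
        if ( p->desc == NULL )
            return;

        /* A migration is already in progress: return without doing anything. */
        if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
            return;

        /* ... otherwise take the old vcpu's vgic lock and start the migration ... */
    }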

However, as I was writing this explanation to you in more detail, I
realized that it is not actually safe. Concurrent accesses to
rank->vcpu, and concurrent calls to irq_set_affinity, can result in the
physical affinity actually being set to the intermediate pcpu rather
than the final one. As a matter of fact, one of these races affects the
code today, even without this patch series.
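
To make the race concrete, here is one possible interleaving, reusing the
vCPU A/B/C example from above (an illustration of the problem, not a trace
from a reproducer):

    vCPU writing ITARGETSR                 vCPU A handling the EOI
    (vgic_store_itargetsr)                 (gic_update_one_lr)
    -------------------------------        -------------------------------------
    A -> B: rank->vcpu[offset] = B,
            IRQ inflight, set MIGRATING
                                           test_and_clear_bit(MIGRATING)
                                           v_target = rank->vcpu[offset]  /* B */
    B -> C: rank->vcpu[offset] = C,
            MIGRATING now clear, so
            irq_set_affinity(desc, C)
                                           irq_set_affinity(desc, B->processor)

    Final state: rank->vcpu says vCPU C, but the physical affinity has been
    overwritten with the pcpu of the intermediate vCPU B.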

I considered various possible ways to solve this problem and the lock
inversion problem at once. These are the potential solutions I came up
with:

1) We release the vgic lock at the end of gic_update_one_lr, before
test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status). Then we take
the rank lock. We don't need the per-vcpu vgic lock to execute the
following code safely; the rank lock is enough:

            if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
            {
                struct vcpu *v_target = __vgic_get_target_vcpu(v, irq);
                irq_set_affinity(p->desc, cpumask_of(v_target->processor));
            }
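
Concretely, the tail of gic_update_one_lr would become something like the
following (a rough sketch only, using the usual vgic_rank_irq/vgic_lock_rank
helpers and glossing over the irqsave/irqrestore handling of the vgic lock):

    /* Sketch of option 1 only, not a tested patch. */
    struct vgic_irq_rank *rank = vgic_rank_irq(v, irq);

    spin_unlock(&v->arch.vgic.lock);    /* the caller no longer holds it */

    vgic_lock_rank(v, rank, flags);
    if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
    {
        struct vcpu *v_target = __vgic_get_target_vcpu(v, irq);
        irq_set_affinity(p->desc, cpumask_of(v_target->processor));
    }
    vgic_unlock_rank(v, rank, flags);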

This sounds simple in theory, but in practice it is very ugly. We would
end up with a function (gic_update_one_lr) that is called with the vgic
lock held but actually releases it before returning.

2) We run gic_update_one_lr and vgic_store_itargetsr in parallel safely
and locklessly. There might be a way to do it, but it is not easy and I
haven't found it yet.

3) Use the vgic lock on the old vcpu to protect read accesses to
rank->vcpu. This is the same strategy as the one implemented in this
series. However, for this to actually work correctly, we would also
have to store the original vcpu id in struct pending_irq. That way, we
know for sure which vgic lock we need to take.
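
To illustrate 3), the idea would be along these lines (the field name is
made up for illustration, it is not in the posted series):

    /* Sketch of option 3; struct pending_irq would record the "old" vcpu. */
    uint8_t vcpu_migrate_from;  /* vcpu id whose vgic lock protects this IRQ
                                   while GIC_IRQ_GUEST_MIGRATING is set */

    /* In vgic_migrate_irq(), under the old vcpu's vgic lock: */
    p->vcpu_migrate_from = old->vcpu_id;
    set_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);

    /* gic_update_one_lr() then knows which per-vcpu vgic lock it has to be
       holding (the one of d->vcpu[p->vcpu_migrate_from]) before reading
       rank->vcpu for this interrupt. */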


Let me preface this by saying that I don't like any of these approaches
very much, but I think that 1) is the least bad. If that's OK with you,
I'll do that.


 

