[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] xen/arm: fix rank/vgic locks inversion bug
On Fri, 16 Dec 2016, Stefano Stabellini wrote: > On Fri, 16 Dec 2016, Julien Grall wrote: > > Hi Stefano, > > > > On 16/12/16 00:08, Stefano Stabellini wrote: > > > On Thu, 15 Dec 2016, Julien Grall wrote: > > > > On 15/12/2016 01:04, Stefano Stabellini wrote: > > > > > The locking order is: first rank lock, then vgic lock. The order is > > > > > respected everywhere, except for gic_update_one_lr. > > > > > > > > > > gic_update_one_lr is called with the vgic lock held, but it calls > > > > > vgic_get_target_vcpu, which tries to obtain the rank lock. This can > > > > > cause deadlocks. > > > > > > > > > > We already have a version of vgic_get_target_vcpu that doesn't take > > > > > the > > > > > rank lock: __vgic_get_target_vcpu. Also the only routine that modify > > > > > the target vcpu are vgic_store_itargetsr and vgic_store_irouter. They > > > > > call vgic_migrate_irq, which already take the vgic lock. > > > > > > > > > > Solve the lock inversion problem, by not taking the rank lock in > > > > > gic_update_one_lr (calling __vgic_get_target_vcpu instead of > > > > > vgic_get_target_vcpu). > > > > > > > > If I look at the callers of gic_update_one_lr, the function > > > > gic_clear_lrs > > > > will > > > > not take the rank lock. So from my understanding nobody will take the > > > > rank > > > > here. > > > > > > > > However __vgic_get_target_vcpu has an ASSERT to check whether the rank > > > > lock > > > > has been taken. So who is taking lock for gic_update_one_lr now? > > > > > > I should have removed the ASSERT - nobody is taking the rank lock now on > > > the gic_update_one_lr code path. > > > > Please either keep this ASSERT or update it. But not dropping it, we need to > > prevent anyone calling this function without any lock taken at least in > > debug > > build. > > > > The current callers (see vgic_{disable,enable}_irqs) are calling the > > function > > with rank lock taken. > > > > So we need to keep this behavior at least for them. > > > > > > We make this safe, by placing modifications to > > > > > rank->vcpu within regions protected by the vgic lock. > > > > > > > > This look unsafe to me. The vgic lock is per-vcpu, but rank->vcpu could > > > > be > > > > written/read by any vCPU. So you will never protect rank->vcpu with this > > > > lock. > > > > Did I miss anything? > > > > > > The vgic lock is per-vcpu, but it is always the vgic lock of the old (old > > > in the sense, before the interrupt is migrated) vcpu that is taken. > > > > > > On one hand any vcpu can read/write to itargetsr. Those operations are > > > protected by the rank lock. However vgic_migrate_irq and writes to > > > rank->vcpu are protected also by the vgic lock of the old vcpu (first > > > the rank lock is taken, then the vgic lock of the old vcpu). > > > > I don't think this is true. Any vCPU read/write to itargetsr will be > > protected > > by the vgic lock of the vCPU in rank->vcpu[offset]. > > > > For inflight IRQ, the migration will happen when the guest has EOIed the > > IRQ. > > Until this is happening, the guest may have written multiple time in > > ITARGETSR. > > > > For instance, let say the guest is requesting to migrate the IRQ from vCPU A > > to vCPU B. The vgic lock A will be taken in vgic_store_itargetsr, if the > > interrupt is inflight vgic_migrate_irq will set a flag. Now the guest could > > decide to migrate the interrupt to vCPU C before the interrupt has been > > EOIed. > > In this case, vgic lock B will be taken. > > > > So we have a mismatch between the lock taken in gic_update_one_lr and > > vgic_migrate_irq. > > > > I think the only safe way to fully protect rank->vcpu is taking the rank > > lock. > > It will never change and be accessible from anywhere. > > When I submitted this series, I considered this scenario, and I thought > that the existing GIC_IRQ_GUEST_MIGRATING flag should be enough to > protect it, because when GIC_IRQ_GUEST_MIGRATING is set, > vgic_migrate_irq actually returns immediately, without doing anything. > Either GIC_IRQ_GUEST_MIGRATING is not set, and it's the first irq > migration, or GIC_IRQ_GUEST_MIGRATING is set, and vgic_migrate_irq just > returns. > > However, as I was writing this explanation to you in more details, I > realized that it is not actually safe. Concurrent accesses to > rank->vcpu, and concurrent calls to irq_set_affinity, can result in the > physical affinity actually set to the intermediate pcpu rather than the > final. As a matter of fact, one of these races affect the code today, > even without this patch series. > > I considered various possible ways to solve this problem, and the lock > inversion problem at once. These are the potential solutions I came up > with: > > 1) We release the vgic lock at the end of gic_update_one_lr, before > test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status). Then we take > the rank lock. We don't need the per-vcpu vgic lock to execute the > following code safely, the rank lock is enough: > > if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) ) > { > struct vcpu *v_target = __vgic_get_target_vcpu(v, irq); > irq_set_affinity(p->desc, cpumask_of(v_target->processor)); > } > > This sounds simple in theory, but in practice it is very ugly. We would > end up with a function (gic_update_one_lr) that is called with the vgic > lock but actually releases the lock before returning. > > 2) We run gic_update_one_lr and vgic_store_itargetsr in parallel safely > and locklessly. There might be a way to do it, but it is not easy I > haven't found it yet. > > 3) Use the vgic lock on the old vcpu to protect read accesses to > rank->vcpu. This is the same strategy as implemented in this series. > However, to actually work correctly, we also have to store the original > vcpu id in struct pending_irq. That way, we know for sure which vgic > lock we need to take. > > > Let me premise that I don't like any of these approaches very much. But > I think that 1) is the least bad. If that's OK for you, I'll do that. Sorry I meant that 3) is the least bad :-) _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |