
Re: [Xen-devel] [PATCH] xen/arm: fix rank/vgic locks inversion bug



Hi Stefano,

On 16/12/16 00:08, Stefano Stabellini wrote:
On Thu, 15 Dec 2016, Julien Grall wrote:
On 15/12/2016 01:04, Stefano Stabellini wrote:
The locking order is: first rank lock, then vgic lock. The order is
respected everywhere, except for gic_update_one_lr.

gic_update_one_lr is called with the vgic lock held, but it calls
vgic_get_target_vcpu, which tries to obtain the rank lock. This can
cause deadlocks.

We already have a version of vgic_get_target_vcpu that doesn't take the
rank lock: __vgic_get_target_vcpu. Also, the only routines that modify
the target vcpu are vgic_store_itargetsr and vgic_store_irouter. They
call vgic_migrate_irq, which already takes the vgic lock.

Solve the lock inversion problem by not taking the rank lock in
gic_update_one_lr (calling __vgic_get_target_vcpu instead of
vgic_get_target_vcpu).
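
For illustration, a simplified sketch of the two paths that take the locks in opposite orders (the helper names follow the Xen vGIC code, but the bodies here are approximate, not the actual source):

    /* Path 1, e.g. a guest write to ITARGETSR (approximate shape):     */
    vgic_lock_rank(v, rank, flags);            /* 1st: rank lock        */
    /* ... vgic_migrate_irq() is called and takes the old vCPU's        */
    /* per-vcpu lock, v_old->arch.vgic.lock     2nd: vgic lock          */
    vgic_unlock_rank(v, rank, flags);

    /* Path 2, gic_update_one_lr() before this patch:                   */
    /* the caller already holds v->arch.vgic.lock   1st: vgic lock      */
    v_target = vgic_get_target_vcpu(v, irq);   /* 2nd: rank lock taken
                                                  inside                */

    /* Two CPUs running these paths concurrently can each hold one lock
       while waiting for the other: a classic AB/BA deadlock.           */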

If I look at the callers of gic_update_one_lr, the function gic_clear_lrs will
not take the rank lock. So from my understanding nobody will take the rank
lock here.

However __vgic_get_target_vcpu has an ASSERT to check whether the rank lock
has been taken. So who is taking the lock for gic_update_one_lr now?

I should have removed the ASSERT - nobody is taking the rank lock now on
the gic_update_one_lr code path.

Please either keep this ASSERT or update it, but do not drop it: we need to prevent anyone from calling this function without any lock taken, at least in debug builds.

The current callers (see vgic_{disable,enable}_irqs) are calling the function with the rank lock taken.

So we need to keep this behavior at least for them.
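
For context, the helper and its ASSERT currently look roughly like this (simplified from xen/arch/arm/vgic.c; the exact body and the index into rank->vcpu are approximations):

    static struct vcpu *__vgic_get_target_vcpu(struct vcpu *v,
                                               unsigned int virq)
    {
        struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);

        /* Documents the current contract: callers such as
         * vgic_enable_irqs/vgic_disable_irqs hold the rank lock. */
        ASSERT(spin_is_locked(&rank->lock));

        /* Index into rank->vcpu simplified for illustration. */
        return v->domain->vcpu[rank->vcpu[virq & 31]];
    }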



We make this safe by placing modifications to
rank->vcpu within regions protected by the vgic lock.
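
A sketch of what that writer-side protection means on the migration path (approximate shape only; the real vgic_migrate_irq body differs):

    void vgic_migrate_irq(struct vcpu *old, struct vcpu *new,
                          unsigned int irq)
    {
        unsigned long flags;

        /* The caller holds the rank lock; additionally take the old
         * vCPU's per-vcpu vgic lock so that updates of rank->vcpu are
         * serialised against gic_update_one_lr() for that vCPU. */
        spin_lock_irqsave(&old->arch.vgic.lock, flags);

        /* ... per the commit message, modifications to rank->vcpu happen
         * inside this vgic-lock-protected region; an inflight IRQ is
         * instead marked GIC_IRQ_GUEST_MIGRATING ... */

        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
    }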

This looks unsafe to me. The vgic lock is per-vcpu, but rank->vcpu could be
written/read by any vCPU. So you will never protect rank->vcpu with this lock.
Did I miss anything?

The vgic lock is per-vcpu, but it is always the vgic lock of the old vcpu
(old in the sense of before the interrupt is migrated) that is taken.

On one hand, any vcpu can read/write itargetsr. Those operations are
protected by the rank lock. However, vgic_migrate_irq and writes to
rank->vcpu are also protected by the vgic lock of the old vcpu (first
the rank lock is taken, then the vgic lock of the old vcpu).

I don't think this is true. Any vCPU's read/write of itargetsr will be protected by the vgic lock of the vCPU in rank->vcpu[offset].

For an inflight IRQ, the migration will happen when the guest has EOIed the IRQ. Until that happens, the guest may have written multiple times to ITARGETSR.

For instance, let's say the guest requests to migrate the IRQ from vCPU A to vCPU B. The vgic lock of A will be taken in vgic_store_itargetsr; if the interrupt is inflight, vgic_migrate_irq will set a flag. Now the guest could decide to migrate the interrupt to vCPU C before the interrupt has been EOIed. In this case, the vgic lock of B will be taken.

So we have a mismatch between the lock taken in gic_update_one_lr and vgic_migrate_irq.

I think the only safe way to fully protect rank->vcpu is to take the rank lock. It never changes and is accessible from anywhere.
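
To spell out the mismatch in the scenario above, assuming the proposed patch where rank->vcpu is updated under the old vCPU's vgic lock (the flag name and call shapes are approximate):

    /*
     * Guest writes ITARGETSR: target A -> B
     *   vgic_store_itargetsr()
     *     vgic_lock_rank()                rank lock
     *     vgic_migrate_irq(A, B, irq)     takes A's vgic lock; the IRQ is
     *                                     inflight, so only
     *                                     GIC_IRQ_GUEST_MIGRATING is set
     *     rank->vcpu[...] = B             under A's vgic lock
     *
     * Guest writes ITARGETSR again: B -> C (IRQ still not EOIed)
     *   vgic_store_itargetsr()
     *     vgic_lock_rank()                rank lock
     *     vgic_migrate_irq(B, C, irq)     takes B's vgic lock
     *     rank->vcpu[...] = C             under B's vgic lock
     *
     * Guest on vCPU A EOIs the IRQ
     *   gic_update_one_lr(A, lr)          runs under A's vgic lock and reads
     *                                     rank->vcpu[...], but the last
     *                                     writer held B's vgic lock
     */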


gic_update_one_lr doesn't take the rank lock, but it takes the vgic lock
of the old vcpu. We know that it is the old vcpu because it is the one
with the interrupt in question in an LR register. gic_update_one_lr's
accesses to rank->vcpu are safe because they are protected by the same
vgic lock.
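
A simplified sketch of the scheme being described (not the full function; the signature and the ASSERT follow the existing code, the rest is approximate):

    static void gic_update_one_lr(struct vcpu *v, int i)
    {
        unsigned int irq;
        struct vcpu *v_target;

        /* v is the "old" vCPU: the interrupt sits in one of its LRs,
         * and the caller already holds its per-vcpu vgic lock. */
        ASSERT(spin_is_locked(&v->arch.vgic.lock));

        /* ... read LR i, extract the vIRQ into 'irq', clear the LR ... */

        /* Proposed change: no rank lock taken here; the target is read
         * under the same vgic lock that protects writes to rank->vcpu
         * on the migration path. */
        v_target = __vgic_get_target_vcpu(v, irq);

        /* ... re-inject or migrate the interrupt to v_target ... */
    }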

Does it make sense?

This needs more documentation in the code. Regardless of the documentation, I think the code is becoming really complex to understand because with your changes rank->vcpu is protected by both the rank lock and the old vgic lock. And depending on the caller, only one of the locks will be taken.

Note that it is not possible to add an ASSERT(spin_is_locked(&v->arch.vgic.lock)) in __vgic_get_target_vcpu, because v may not be the old vCPU (see the example I mentioned yesterday).


Signed-off-by: Stefano Stabellini <sstabellini@xxxxxxxxxx>

---
There are supposed to be 2 Coverity IDs associated with this, but the
system is currently down.

I think the 2 Coverity IDs are 1381855 and 1381853.



Regards,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

