
Re: [Xen-devel] [PATCH] xen/arm: fix rank/vgic locks inversion bug



Hi Stefano,

On 20/12/16 19:38, Stefano Stabellini wrote:
On Tue, 20 Dec 2016, Julien Grall wrote:
On 20/12/2016 00:22, Stefano Stabellini wrote:
On Mon, 19 Dec 2016, Julien Grall wrote:
On 19/12/2016 23:30, Stefano Stabellini wrote:
On Mon, 19 Dec 2016, Julien Grall wrote:
2) We run gic_update_one_lr and vgic_store_itargetsr in parallel safely and locklessly. There might be a way to do it, but it is not easy; I haven't found it yet.

Correct me if I am wrong. There is no restriction on writing to IROUTER/ITARGETSR while an IRQ is pending. So irq_set_affinity could be called once at the beginning of vgic_irq_migrate.

We may receive the interrupt on the wrong physical CPU at the beginning. But it would be the same when writing into IROUTER/ITARGETSR.

This would remove the need to get the rank lock in gic_update_one_lr.

Did I miss anything?
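
Roughly, what I have in mind would look something like this (an illustrative sketch only, not actual Xen code; helper names such as irq_to_pending, irq_set_affinity and cpumask_of are assumed to match the existing ones):

/* Illustrative sketch: set the physical affinity as soon as the guest
 * requests the migration, rather than deferring it to gic_update_one_lr(). */
void vgic_irq_migrate(struct vcpu *old, struct vcpu *new, unsigned int virq)
{
    struct pending_irq *p = irq_to_pending(old, virq);

    /* Route the hardware interrupt to the new pCPU right away. The next
     * interrupt may still land on the old pCPU because the GIC takes time
     * to observe the change, but the same is already true when the guest
     * writes ITARGETSR/IROUTER directly. */
    if ( p->desc != NULL )
        irq_set_affinity(p->desc, cpumask_of(new->processor));

    /* Remember that the virtual interrupt has to be moved to the new vCPU
     * once it is no longer inflight on the old one. */
    set_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
}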

I am not sure if we can do that: the virtual interrupt might not be
EOI'd yet at that time. The guest EOI is used to deactivate the
corresponding physical interrupt. Migrating the physical IRQ at that
time could have unintended consequences? I am not sure what the spec
says about this, but even if it is correct on paper, I would prefer not
to put it to the test: I bet that not many interrupt controllers have
been heavily tested in this scenario. What do you think?

I don't think this is an issue. An interrupt could have its priority drop happening on pCPU A and be deactivated on pCPU B if vCPU A has been migrated while the interrupt is inflight. So if it is fine here, why would it not be when the guest is specifically requesting the routing?

That is true, but this is not exactly the same.

My example was to show you that an IRQ can have its priority dropped on pCPU A and be deactivated on pCPU B. Another example is when only the IRQ is being migrated. The spec does not promise that you will receive the next interrupt on the CPU you asked for, because it may take time to update the GIC state. So the priority drop and deactivation could be done on separate physical CPUs here too.

This is changing the
physical irq affinity while both physical and virtual irqs are still
active.

Physical IRQ state and virtual IRQ state are completely dissociated in the GIC. The only possible interaction is the virtual interface sending a deactivate request to the distributor when the virtual interrupt has been deactivated and corresponds to a hardware interrupt.

As I wrote, usually operating systems only change affinity after
deactivating an irq, so I thought it would be wise in Xen to wait at
least for the EOI.

I looked at the Linux code and did not see such a requirement when setting the affinity (see irq_set_affinity) of an IRQ.

If we tried to inject the same virtual interrupt on a
different vcpu, while the interrupt is still inflight, we could get into
trouble. But we could avoid that by testing for GIC_IRQ_GUEST_MIGRATING
in vgic_vcpu_inject_irq. Maybe I am worrying about nothing.
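
Something along these lines, just to sketch the check I mean (not actual code, and the exact shape of vgic_vcpu_inject_irq may well differ):

/* Sketch only: bail out early if the interrupt is being migrated, so that
 * we don't try to queue it on a second vcpu while it is still inflight on
 * the first one. */
void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
{
    struct pending_irq *n = irq_to_pending(v, virq);

    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
    {
        /* The old vcpu still owns the interrupt; it will be re-injected
         * on the new vcpu once the migration has completed. */
        return;
    }

    /* ... normal injection path ... */
}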

The only interrupts that can be routed to a guest in Xen are SPIs, which are shared between all CPUs. The active bit is handled by the distributor. It is not possible to receive the same SPI again until it has been deactivated.

True, but keep in mind that we don't get any interruptions when the vcpu
issues an EOI. We lazily clean the data structures the first time we get
back to Xen. So there is a window, where the interrupt has already been
EOI'ed on the first vcpu, but struct pending_irq still shows the
interrupt as inflight and would mess up today's checks in
vgic_vcpu_inject_irq on other vcpus.

I am aware that LRs are cleaned lazily. However, you continue to assume that the GIC will fire the next IRQ on the right pCPU as soon as ITARGETSR/IROUTER is written.

In the case of GICv2, there is no system barrier (dsb sy) in gicv2_irq_set_affinity. So nothing ensures that the GIC has observed the write by the time the function returns, because it takes time for the information to propagate.

So no matter which lock you try to take, you will have a window where the IRQ is received on the wrong physical CPU.
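
For reference, the affinity update is essentially a single byte write to the distributor; something like the following simplified sketch (the function name here is made up, it is not the exact gicv2_irq_set_affinity, and the writeb_gicd/GICD_ITARGETSR names are assumed):

/* Simplified sketch of a GICv2 affinity change: one byte written to the
 * GICD_ITARGETSRn register of the interrupt. Nothing here waits for the
 * distributor to observe the new target, so the function can return
 * before the change has taken effect. */
static void set_spi_target(struct irq_desc *desc, uint8_t cpu_mask)
{
    writeb_gicd(cpu_mask, GICD_ITARGETSR + desc->irq);
}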

That's why the status field in pending_irq is accessible without any lock. gic_update_one_lr works only on p->status, so it is fine here.

The only place where it might be an issue is the list_empty(&n->inflight) check in vgic_vcpu_inject_irq. But I think it is perfectly fine, because if the IRQ has not been received on the correct vCPU, it will be taken care of later, for instance when the vCPU is kicked (see the end of vgic_vcpu_inject_irq).
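
The shape I am referring to is roughly the following (a rough sketch of the relevant fragment only, with field names such as v->arch.vgic.inflight_irqs assumed):

/* Rough sketch of the relevant part of vgic_vcpu_inject_irq(): if the
 * interrupt is already inflight we only mark it as queued; otherwise we
 * queue it and kick the target vCPU. So even if the hardware IRQ fired
 * on the "wrong" pCPU, the target vCPU gets woken and the interrupt is
 * delivered. */
if ( !list_empty(&n->inflight) )
{
    /* Already inflight somewhere: just mark it queued and stop here. */
    set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
    goto out;
}

list_add_tail(&n->inflight, &v->arch.vgic.inflight_irqs);
out:
    /* Wake the target vCPU so the interrupt is handled promptly. */
    vcpu_kick(v);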

Also they wouldn't be protected by
the right vgic lock either. Maybe this is the real reason why I didn't
go for this route originally. Sorry I didn't bring this up earlier; the
irq migration stuff is extremely difficult to get right.

I don't think it is difficult to get the core right. The problem is that there is no documentation of this code, so we first have to figure out why it was written the way it is.

All this code was meant to be lockless, thanks to p->status and the *_bit helpers. The vgic lock taken in vgic_vcpu_inject_irq protects inflight_irqs, not the rest.
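
To illustrate the pattern outside of Xen, here is a self-contained toy example of lockless per-bit updates, using compiler atomics instead of our set_bit/test_bit helpers (the flag names are reused only for illustration):

/* Stand-alone illustration of the lockless status-bit pattern. Each flag
 * is updated atomically, so different paths (e.g. the inject path and the
 * LR clean-up path) can manipulate independent bits of "status" without
 * holding a common lock. */
#include <stdbool.h>
#include <stdio.h>

#define GIC_IRQ_GUEST_QUEUED     0
#define GIC_IRQ_GUEST_VISIBLE    1
#define GIC_IRQ_GUEST_MIGRATING  2

struct pending_irq_demo {
    unsigned long status;
};

static void set_status_bit(struct pending_irq_demo *p, int bit)
{
    __atomic_fetch_or(&p->status, 1UL << bit, __ATOMIC_SEQ_CST);
}

static bool test_and_clear_status_bit(struct pending_irq_demo *p, int bit)
{
    unsigned long old = __atomic_fetch_and(&p->status, ~(1UL << bit),
                                           __ATOMIC_SEQ_CST);
    return old & (1UL << bit);
}

int main(void)
{
    struct pending_irq_demo p = { 0 };

    set_status_bit(&p, GIC_IRQ_GUEST_MIGRATING);   /* e.g. migration path */
    set_status_bit(&p, GIC_IRQ_GUEST_QUEUED);      /* e.g. inject path */

    if ( test_and_clear_status_bit(&p, GIC_IRQ_GUEST_MIGRATING) )
        printf("migration was pending, status is now %#lx\n", p.status);

    return 0;
}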

Regards,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

