
Re: [Xen-devel] [PATCH] xen/arm: introduce platform_need_explicit_eoi



On Sat, 21 Jun 2014, Julien Grall wrote:
> On 21/06/14 16:26, Stefano Stabellini wrote:
> > On Sat, 21 Jun 2014, Julien Grall wrote:
> > > On 21/06/14 15:43, Stefano Stabellini wrote:
> > > > On Fri, 20 Jun 2014, Julien Grall wrote:
> > > > > On 20 Jun 2014 18:41, "Stefano Stabellini"
> > > > > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > > > > > 
> > > > > > On Fri, 20 Jun 2014, Julien Grall wrote:
> > > > > > > On 20/06/14 17:48, Stefano Stabellini wrote:
> > > > > > > > 
> > > > > > > > > Your patch "physical irq follow virtual irq" won't even solve
> > > > > > > > > this problem if the maintenance interrupt is requested to EOI
> > > > > > > > > the IRQ.
> > > > > > > > 
> > > > > > > > I don't understand this statement.
> > > > > > > > The maintenance interrupt is requested to make sure that the
> > > > > > > > vcpu is
> > > > > > > > interrupted as soon as it EOIs the virtual irq, so that
> > > > > > > > gic_update_one_lr can run.
> > > > > > > 
> > > > > > > I agree with that but ... the following steps can happen
> > > > > > > 
> > > > > > > 1) Xen receives the interrupt
> > > > > > > 2) Xen writes EOIR
> > > > > > > 3) Xen injects the IRQ into the guest
> > > > > > > 4) The guest receives the IRQ (i.e. reads IAR)
> > > > > > > 5) Xen migrates the VCPU to another physical CPU
> > > > > > > 6) The guest writes to GICC_DIR and a maintenance IRQ is fired.
> > > > > > > 
> > > > > > > In such a case, the IRQ will be EOIed on another physical CPU.
> > > > > > > If we assume that we should always write to the GICC_DIR of the
> > > > > > > physical CPU that received the interrupt, then even your patch
> > > > > > > "physical irq follow virtual irq" won't solve the problem.
> > > > > > 
> > > > > > That's true.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > > > The VGIC lock is per-vcpu.
> > > > > > > > > 
> > > > > > > > > If this is happening while we migrate, nothing protects
> > > > > > > > > p->lr and the different bits anymore.
> > > > > > > > 
> > > > > > > > The patch series
> > > > > > > > alpine.DEB.2.02.1406111607450.13771@xxxxxxxxxxxxxxxxxxxxxxx
> > > > > > > > takes care of vcpu migration and locking.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > Indeed, vgic_inject_irq will use the new VCPU. This can
> > > > > > > > > happen as soon as we EOI the IRQ.
> > > > > > > > 
> > > > > > > > Yes, but at that point we have also already migrated the
> > > > > > > > corresponding
> > > > > > > > physical irq to the new pcpu.
> > > > > > > 
> > > > > > > Even though the migration has been done, the two bit clears are
> > > > > > > not atomic. Let's imagine that the IRQ fires between the two
> > > > > > > clears. In this case we may clear the ACTIVE bit by mistake.
> > > > > > > 
> > > > > > > I don't see anything in this code which prevents such a
> > > > > > > configuration.
> > > > > > 
> > > > > > Which 2 clear bits?
> > > > > 
> > > > > GUEST_VISIBLE and GUEST_ACTIVE; see the code after eoi_irq.
> > > > 
> > > > The code is protected by the vgic lock.
> > > 
> > > If the IRQ has been migrated while it's being injected, the maintenance
> > > interrupt will use the previous VCPU and vgic_inject_irq will use the
> > > new one: two different locks protecting the same thing, but not mutually
> > > exclusive.
> > > 
> > > As soon as we write to GICC_DIR, this case can happen.
> > 
> > I don't think this case can happen: the maintenance interrupt is
> > synchronous, so it can happen either before the migration, or after the
> > vcpu has been fully migrated and its execution has been resumed.
> > GICC_DIR is written in response to a maintenance interrupt being fired,
> > so GICC_DIR will be written either before the vcpu has been migrated or
> > after.
> 
> I agree that the maintenance interrupt is synchronous... but for this part
> I was talking about VIRQ migration (i.e. the guest is writing to ITARGETSR).
> 
> 1) The guest receives the IRQ
> 2) The guest writes to ITARGETSR to move the IRQ to another VCPU
> 3) The guest EOIs the IRQ and Xen receives a maintenance interrupt.
> 
> You set the affinity of the physical IRQ directly in the VGIC ITARGETSR
> emulation. So as soon as we write to GICC_DIR, this IRQ may fire again on
> another physical CPU (assuming the new VCPU is currently running). In this
> case, vgic_inject_irq will use the new VGIC lock and nothing will prevent
> concurrent accesses that clear/set the GUEST_VISIBLE and GUEST_ACTIVE bits.

This is a valid concern.
I thought about this concurrency problem and I think I managed to
solve it correctly. The solution I used was introducing a new flag
called "GIC_IRQ_GUEST_MIGRATING". See:

1402504032-13267-4-git-send-email-stefano.stabellini@xxxxxxxxxxxxx
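
Very roughly: while the virtual IRQ is still inflight in an LR, the
ITARGETSR write only records the migration request, and the physical
affinity is changed later, once the LR has been cleaned up after the guest's
EOI. A simplified stand-alone sketch of that idea (not the actual patch; all
names below are placeholders, not Xen's):

#include <stdio.h>

enum {
    IRQ_GUEST_VISIBLE   = 1 << 0,  /* virtual IRQ currently in an LR      */
    IRQ_GUEST_ACTIVE    = 1 << 1,  /* guest has not deactivated it yet    */
    IRQ_GUEST_MIGRATING = 1 << 2,  /* affinity change requested, deferred */
};

struct pending_irq_sketch {
    unsigned int status;
    int target_vcpu;
};

static void set_physical_affinity(struct pending_irq_sketch *p, int vcpu)
{
    /* Stands in for reprogramming the physical GICD_ITARGETSR. */
    p->target_vcpu = vcpu;
    printf("physical IRQ now follows vCPU%d\n", vcpu);
}

/* ITARGETSR emulation path. */
static void vgic_migrate_irq_sketch(struct pending_irq_sketch *p, int new_vcpu)
{
    if ( p->status & IRQ_GUEST_VISIBLE )
    {
        /* Still in an LR: only record the request, do not touch the
         * physical distributor yet. */
        p->status |= IRQ_GUEST_MIGRATING;
        p->target_vcpu = new_vcpu;
        return;
    }
    set_physical_affinity(p, new_vcpu);
}

/* LR clean-up path, run once the guest has EOIed the virtual IRQ. */
static void lr_cleanup_sketch(struct pending_irq_sketch *p)
{
    p->status &= ~(IRQ_GUEST_VISIBLE | IRQ_GUEST_ACTIVE);
    if ( p->status & IRQ_GUEST_MIGRATING )
    {
        p->status &= ~IRQ_GUEST_MIGRATING;
        set_physical_affinity(p, p->target_vcpu);  /* safe to move it now */
    }
}

int main(void)
{
    struct pending_irq_sketch p =
        { .status = IRQ_GUEST_VISIBLE | IRQ_GUEST_ACTIVE };

    vgic_migrate_irq_sketch(&p, 1);  /* guest writes ITARGETSR mid-flight */
    lr_cleanup_sketch(&p);           /* guest EOIs; affinity applied here */
    return 0;
}

Because the physical IRQ is only re-routed after the old LR has been cleaned
up, it cannot fire on the new physical CPU while the old vCPU's maintenance
path is still updating the pending_irq state.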
