
Re: [Xen-devel] [RFC PATCH 21/24] ARM: vITS: handle INVALL command



On Fri, 9 Dec 2016, Julien Grall wrote:
> Hi Stefano,
> 
> On 07/12/16 20:20, Stefano Stabellini wrote:
> > On Tue, 6 Dec 2016, Julien Grall wrote:
> > > On 06/12/2016 22:01, Stefano Stabellini wrote:
> > > > On Tue, 6 Dec 2016, Stefano Stabellini wrote:
> > > > > moving a vCPU with interrupts assigned to it is slower than moving a
> > > > > vCPU without interrupts assigned to it. You could say that the
> > > > > slowness is directly proportional to the number of interrupts
> > > > > assigned to the vCPU.
> > > > 
> > > > To be pedantic, by "assigned" I mean that a physical interrupt is routed
> > > > to a given pCPU and is set to be forwarded to a guest vCPU running on it
> > > > by the _IRQ_GUEST flag. The guest could be dom0. Upon receiving one of
> > > > these physical interrupts, a corresponding virtual interrupt (could be a
> > > > different irq) will be injected into the guest vCPU.
> > > > 
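
To make this concrete, here is a minimal sketch of that routing
decision. The struct layout, bit position and helper names below are
simplified assumptions for illustration, not the actual Xen definitions:

    /* Sketch of the routing decision described above.  The real Xen
     * irq_desc and vGIC entry points are more involved. */
    #define _IRQ_GUEST   4                     /* assumed bit position */
    #define IRQ_GUEST    (1u << _IRQ_GUEST)

    struct vcpu;                               /* opaque here */

    struct irq_desc {
        unsigned int status;                   /* _IRQ_* status bits */
        unsigned int virq;                     /* virtual IRQ to inject; may
                                                * differ from the physical one */
        struct vcpu *target;                   /* vCPU it is forwarded to */
    };

    void vgic_inject_irq(struct vcpu *v, unsigned int virq); /* hypothetical */
    void handle_host_irq(struct irq_desc *desc);             /* hypothetical */

    /* Runs on the pCPU that received the physical interrupt. */
    static void do_irq(struct irq_desc *desc)
    {
        if ( desc->status & IRQ_GUEST )
            vgic_inject_irq(desc->target, desc->virq);
        else
            handle_host_irq(desc);
    }
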
> > > > When the vCPU is migrated to a new pCPU, the physical interrupts that
> > > > are configured to be injected as virtual interrupts into the vCPU are
> > > > migrated with it. The physical interrupt migration has a cost. However,
> > > > receiving physical interrupts on the wrong pCPU has a higher cost.
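
And a rough sketch of the migration step itself, reusing the simplified
irq_desc above. irq_to_desc(), irq_set_affinity(), cpumask_of() and
nr_irqs are the usual Xen names, but the loop is only meant to
illustrate the cost, not the actual code:

    /* Walk every IRQ in the system and re-target the ones forwarded
     * to this vCPU.  The O(nr_irqs) scan plus one hardware retarget
     * per routed interrupt is the cost being discussed here. */
    void migrate_vcpu_irqs(struct vcpu *v, unsigned int new_pcpu)
    {
        unsigned int irq;

        for ( irq = 0; irq < nr_irqs; irq++ )
        {
            struct irq_desc *desc = irq_to_desc(irq);

            if ( (desc->status & IRQ_GUEST) && desc->target == v )
                irq_set_affinity(desc, cpumask_of(new_pcpu));
        }
    }
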
> > > 
> > > I don't understand why it is a problem for you to receive the first
> > > interrupt on the wrong pCPU and then move it if necessary.
> > > 
> > > While this may have a higher cost (I don't believe so) on the first
> > > received interrupt, migrating thousands of interrupts at the same time
> > > is very expensive and will likely get Xen stuck for a while (think
> > > about an ITS with a single command queue).
> > > 
> > > Furthermore, the current approach will move every single interrupt
> > > routed to the vCPU, even those that are disabled. That's pointless and
> > > a waste of resources. You may argue that we could skip the disabled
> > > ones, but in that case what would be the benefit of migrating the IRQs
> > > while migrating the vCPUs?
> > > 
> > > So I would suggest spreading it over time. This also means less
> > > headache for the scheduler developers.
> > 
> > The most important aspect of interrupt handling in Xen is latency,
> > measured as the time between Xen receiving a physical interrupt and the
> > guest receiving it. This latency should be both small and deterministic.
> > 
> > We all agree so far, right?
> > 
> > 
> > The issue with spreading interrupt migrations over time is that it
> > makes interrupt latency less deterministic. It is OK, in the uncommon
> > case of vCPU migration with interrupts, to take a hit for a short
> > time. This "hit" can be measured. It can be known. If your workload
> > cannot tolerate it, vCPUs can be pinned. It should be a rare event
> > anyway. On the other hand, by spreading interrupt migrations, we make
> > it harder to predict latency. Aside from determinism, another problem
> > with this approach is that it ensures that every interrupt assigned to
> > a vCPU will first hit the wrong pCPU, then be moved. It guarantees the
> > worst-case scenario for interrupt latency for the vCPU that has been
> > moved. If we migrated all interrupts as soon as possible, we would
> > minimize the number of interrupts delivered to the wrong pCPU. Most
> > interrupts would be delivered to the new pCPU right away, reducing
> > interrupt latency.
> 
> Migrating all the interrupts can be really expensive, because in the current
> state we have to go through every single interrupt and check whether it has
> been routed to this vCPU. We would also migrate disabled interrupts, which
> seems really pointless. This may need some optimization.

Indeed, that should be fixed.
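
One possible shape for that fix, purely as a sketch: keep a per-vCPU
list of the interrupts routed to it, so migration only touches those,
and skip the disabled ones. The routed_irqs list, the routed_list
member and the IRQ_DISABLED bit value below are made up for
illustration; Xen does not have them in this form today:

    /* Sketch: each vCPU keeps a list of the irq_descs routed to it,
     * so migration walks only that list instead of every IRQ in the
     * system, and disabled interrupts are skipped outright. */
    #define IRQ_DISABLED (1u << 1)        /* assumed bit, for illustration */

    void migrate_vcpu_irqs(struct vcpu *v, unsigned int new_pcpu)
    {
        struct irq_desc *desc;

        list_for_each_entry ( desc, &v->routed_irqs, routed_list )
        {
            if ( desc->status & IRQ_DISABLED )
                continue;                 /* pointless to move these */
            irq_set_affinity(desc, cpumask_of(new_pcpu));
        }
    }
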


> With the ITS, we may have thousands of interrupts routed to a vCPU. This
> means that for every interrupt we have to issue a command in the host ITS
> queue. You will likely fill up the command queue and add much more latency.
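
For scale: each such move is one 32-byte MOVI command pushed through
the single host command queue, roughly as below. The field layout
follows my reading of the GICv3 ITS spec; the queueing helper is
hypothetical:

    /* One MOVI per LPI, all funnelled through the shared command
     * queue; thousands of routed LPIs mean thousands of these plus a
     * SYNC, which is where the latency goes. */
    #include <stdint.h>

    #define ITS_CMD_MOVI 0x01
    #define ITS_CMD_SYNC 0x05

    struct its_cmd {
        uint64_t dw[4];                   /* 32-byte ITS command */
    };

    static void its_encode_movi(struct its_cmd *cmd, uint32_t devid,
                                uint32_t eventid, uint16_t icid)
    {
        cmd->dw[0] = ITS_CMD_MOVI | ((uint64_t)devid << 32);
        cmd->dw[1] = eventid;
        cmd->dw[2] = icid;                /* target collection/re-distributor */
        cmd->dw[3] = 0;
    }

    /* its_queue_cmd(cmd) would then copy this into the command queue
     * and advance GITS_CWRITER -- one queue for the whole host. */
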
> 
> Even if you consider vCPU migration to be a rare case, you could still get
> the pCPU stuck for tens of milliseconds, the time it takes to migrate
> everything. And I don't think that is acceptable.
[...]
> If the number increases, you may end up with the scheduler deciding not to
> migrate the vCPU because it would be too expensive. But you may have a
> situation where migrating a vCPU with many interrupts is the only possible
> choice, and then you will slow down the platform.

A vCPU with thousands of interrupts routed to it is the case where I
would push back to the scheduler. It should know that moving the vCPU
would be very costly.
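
If we go that way, the interface could be as small as a per-vCPU
migration-cost hint for the scheduler to weigh. Entirely hypothetical,
including the counter and the constants:

    /* Hypothetical hint: make the interrupt-migration cost visible to
     * the scheduler so that moving an interrupt-heavy vCPU is
     * penalised in its placement decision. */
    #define BASE_MIGRATE_COST 100         /* made-up units */
    #define PER_IRQ_COST      10

    static inline unsigned int vcpu_migrate_cost(const struct vcpu *v)
    {
        /* nr_routed_irqs would be kept in sync as interrupts are
         * routed to / removed from the vCPU (assumed field). */
        return BASE_MIGRATE_COST + v->nr_routed_irqs * PER_IRQ_COST;
    }
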

Regardless, we need to figure out a way to move the interrupts without
"blocking" the platform for long. In practice, we might find a
threshold: a number of active interrupts above which we can no longer
move them all at once. Something like: we move the first 500 active
interrupts immediately and delay the rest. We can find this threshold
only with practical measurements.
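
A sketch of that threshold idea, reusing the per-vCPU list from the
earlier sketch. The 500 is just the placeholder from above, and
deferring the tail to a tasklet is an assumption that those
measurements would need to validate:

    #define IRQ_MIGRATE_BATCH 500         /* placeholder, to be measured */

    void migrate_vcpu_irqs(struct vcpu *v, unsigned int new_pcpu)
    {
        unsigned int moved = 0;
        struct irq_desc *desc;

        list_for_each_entry ( desc, &v->routed_irqs, routed_list )
        {
            if ( desc->status & IRQ_DISABLED )
                continue;
            if ( ++moved > IRQ_MIGRATE_BATCH )
            {
                /* Defer the tail so we neither hold the pCPU nor
                 * flood the ITS command queue for too long. */
                tasklet_schedule(&v->irq_migrate_tasklet);  /* assumed field */
                return;
            }
            irq_set_affinity(desc, cpumask_of(new_pcpu));
        }
    }
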


> Anyway, I would like to see measurements in both situations before deciding
> when LPIs will be migrated.

Yes, let's be scientific about this.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

