
Re: [Xen-devel] [PATCH 1/4] xen/arm: gic: Ensure we have an ISB between ack and do_IRQ()



On Thu, 22 Nov 2018 18:51:13 +0200
Andrii Anisov <andrii.anisov@xxxxxxxxx> wrote:

Hi,

> On 20.11.18 20:47, Julien Grall wrote:
> > 
> > 
> > On 20/11/2018 18:10, Andrii Anisov wrote:  
> >> Hello Julien,
> >>
> >>
> >> On 19.11.18 18:42, Julien Grall wrote:  
> >>> There is no issue with processing IRQs before the syncs. It is
> >>> the same as if an IRQ was raised from a different pCPU.
> >>> So why do you need that?  
> >>
> >> From my understanding of the gic-vgic code (old vgic), for IRQs
> >> targeting the `current` vcpu it leads to faster processing under
> >> interrupt storm conditions. Even if all LRs were set on the
> >> previous switch to the guest, the IRQ will have a chance to go
> >> directly into an LR instead of being put on the lr_pending queue.
> >> The inflight_irqs queue also has a chance to be shorter to insert
> >> into.  
> > 
> > Do you have actual numbers?  
> Unfortunately, my numbers are pretty indirect. I'm referring to glmark2
> benchmark results. With this and the rest of my changes (not yet
> published), I can cut out another percent or two of the performance
> drop due to Xen's presence in the system. BTW, that's why I recently
> asked Stefano about his approach to interrupt latency measurement.
> 
> On my board, running that benchmark causes at least 4 different HW
> interrupts to fire, each with a different frequency.

Is that benchmark chosen to put some interrupt load on the system? Or
is that what the customer actually uses and she is really suffering
from Xen's interrupt handling and emulation?
If you chose this benchmark because it tends to trigger a lot of
interrupts and you hope to estimate some other interrupt property with
this (for instance interrupt latency or typical LR utilisation), then
you might be disappointed. This seems to go in the direction of an
interrupt storm, where we really don't care about performance, but just
want to make sure we keep running and ideally don't penalise other
guests.

> Adding the reschedule
> IRQ makes the system tend not to fit all IRQs into the 4 LRs available
> in my GIC. Moreover, the benchmark does not generate any network
> traffic or disk usage during the run, so real-life cases will add more
> concurrent IRQs.

That's rather odd. Are you sure you actually have all LRs used? What is
your guest system? Linux? Can you check whether you use EOI mode 1 in
the guest ("GIC: Using split EOI/Deactivate mode" in dmesg, right after
"NR_IRQS: xx, nr_irqs: xx, preallocated irqs: 0")?

Typically Linux EOIs an IRQ very quickly, and with EOI mode 0 you just
have *one* active IRQ. So the other interrupts would be pending, which
means your guest is busy just handling interrupts, which points to
other problems and might mean this is not a valid benchmark.
Also: what is the exact problem with exceeding the number of hardware
LRs? If you have a high interrupt load (more than three or four
interrupts pending at the same time), chances are you exit anyway quite
often, in which case the LRs get updated. I don't see the huge
advantage of being able to present 8 pending IRQs to the guest.
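
(For readers following the LR discussion: the behaviour described above
for the old vGIC is, very roughly, "use a free List Register if one
exists, otherwise queue the interrupt on lr_pending". The toy model
below only illustrates that idea; it is not the actual code from
xen/arch/arm/gic-vgic.c, and all names in it are made up for the
example.)

    #include <stdio.h>

    #define NR_LRS 4            /* e.g. a GIC-400 exposes 4 List Registers */

    static int lr[NR_LRS];      /* 0 = free, otherwise the vIRQ held there */
    static int lr_pending[64];  /* overflow queue; the real one is a
                                 * priority-sorted list of struct pending_irq */
    static int n_pending;

    static void inject(int virq)
    {
        for (int i = 0; i < NR_LRS; i++) {
            if (lr[i] == 0) {
                lr[i] = virq;                  /* goes straight into an LR */
                printf("vIRQ %d -> LR%d\n", virq, i);
                return;
            }
        }
        lr_pending[n_pending++] = virq;        /* all LRs busy: queue it */
        printf("vIRQ %d -> lr_pending (depth %d)\n", virq, n_pending);
    }

    int main(void)
    {
        /* Five interrupts arriving before the guest EOIs anything: the
         * fifth one can only be queued, which is the case discussed here. */
        for (int virq = 27; virq < 32; virq++)
            inject(virq);
        return 0;
    }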

> > Also to be on the same page, what is your 
> > definition of an interrupt storm?  
> I mean the system takes different interrupts (more IRQ sources than
> LRs available) at a relatively high rate. Let's say more than 7000
> interrupts per second. It's not a very big number, but close to what
> I see on my desk.

The frequency of the interrupt (n per second) should be unrelated to
the number of IRQs being presented simultaneously to the guest.
Typically I would assume you have zero interrupts normally, because
your guest is doing actual work (running the OS or userland program).
Then you handle the occasional interrupt (1 LR in active state), and
the timer IRQ fires during this. This makes the guest exit, and the
second LR gets used for the injected pending timer IRQ. Now every now
and then an IPI might also be injected from another VCPU at the same
time, which brings the count up to 3. But all of this time the guest is
still handling the first interrupt, and chances are the interrupt
handler sooner or later triggers an (MMIO) exit, at which point the
number of LRs becomes irrelevant. A high number of LRs would only be
interesting if you were able to process all those interrupts without a
single exit; typically that is only true for the virtual arch timer IRQ.
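
(A rough, purely illustrative calculation on that point, assuming the
~7000 interrupts/s mentioned earlier and guessing a guest handler
runtime on the order of 10 us: 7000/s means one interrupt every ~140 us
on average, so on average well under one interrupt is being handled at
any instant, and keeping 4 or more IRQs pending at the same time would
require either heavy bursting or much slower handlers.)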

Also keep in mind that some instructions here and there in the
interrupt handling path in Xen might not be relevant if you exit the
guest frequently (due to interrupts, for instance). The cost of an exit
will probably dwarf the cost of adding a struct pending_irq to a linked
list or so.

Cheers,
Andre.

> > Bear in mind that the old vGIC will be phased out soon.  
> As I remember, the new vgic is still experimental and does not
> support GICv3 yet.
> 
> > If you are 
> > worried about performance, then I would recommend trying the new
> > vGIC and see whether it improves.  
> You know, we are based on Xen 4.10. Initially, when a customer
> expressed their dissatisfaction with the benchmark performance drop
> due to Xen's presence, I tried 4.12-unstable with both the old and
> the new vGIC. Performance with 4.12-unstable and the old vGIC was
> worse than with 4.10, and the new vGIC made things even worse. I
> can't remember the exact numbers or proportions, but that is the
> reason we do not offer upgrading Xen yet.
> 
> > Well, if you re-enable the interrupts you give a chance for higher 
> > priority interrupts to come up. This will not happen if you have 
> > interrupts disabled.  
> I understand the theory, but cannot match it with the current Xen
> code. Guest interrupt prioritization within do_IRQ is pretty
> meaningless: the interrupts all go through the same path, and any
> effect would not be seen before exiting to the guest.
> The PPI interrupts are reflected into soft IRQ processing or into
> injecting an IRQ into the queues. So it does not matter much when
> exactly we read the IRQ from the IAR in the gic_interrupt loop. I
> suppose it should be faster to loop through gic_interrupt in one go,
> collecting all interrupts without going through the exception path
> again, and then switch to soft IRQ processing in
> leave_hypervisor_tail.
> The only thing which might see a noticeable effect here is serving
> GIC_SGI_CALL_FUNCTION, which is executed right away from
> `gic_interrupt`.
> 
> > But you seem to base your assumption on an interrupt storm (yet to
> > be defined). If you have an interrupt storm, then you are already
> > doomed as your guest/Xen will not have time to do any other work.  
> 
> > 
> > In any case, you need to provide numbers to support your
> > optimization.
> I'm moving all my patches to current staging and will send them as
> RFC with a description of why it is done and how I measured the
> results.
> 
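
(For reference, the gic_interrupt() loop discussed in the quoted text
looks, in outline, like the sketch below. It is paraphrased rather than
copied from xen/arch/arm/gic.c, so treat the exact names and conditions
as approximate: each iteration acks one interrupt by reading the IAR
and then either calls do_IRQ(), which only queues guest IRQs or raises
softirqs, or handles an SGI such as GIC_SGI_CALL_FUNCTION on the spot;
the softirq work itself runs later, in leave_hypervisor_tail(), on the
way back to the guest.)

    /* Paraphrased outline of gic_interrupt(), not verbatim Xen code. */
    static void gic_interrupt_outline(struct cpu_user_regs *regs, int is_fiq)
    {
        do {
            /* Reading the IAR acks the highest-priority pending IRQ. */
            unsigned int irq = gic_hw_ops->read_irq();

            if ( irq >= 16 && irq < 1020 )
            {
                isb();                       /* the barrier this patch adds */
                do_IRQ(regs, irq, is_fiq);   /* queue guest IRQ / raise softirq */
            }
            else if ( irq < 16 )
            {
                do_sgi(regs, irq);           /* e.g. GIC_SGI_CALL_FUNCTION,
                                              * handled right here */
            }
            else
            {
                /* Spurious: nothing pending any more, leave the loop.
                 * Softirqs raised above are processed later, in
                 * leave_hypervisor_tail(), before returning to the guest. */
                local_irq_disable();
                break;
            }
        } while ( 1 );
    }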

