[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC 00/16] Old GIC (gic-vgic) optimizations for GICV2


On 29/11/2018 07:40, Andrii Anisov wrote:


Again, I sent this cover letter only to myself. So, here it is, hope it does 
not break the thread. Sorry for the mess.

It is seems to be part of the thread. Thank you!

From: Andrii Anisov <andrii.anisov@xxxxxxxxx>
Sent: Wednesday, November 28, 2018 11:31 PM
Cc: Andrii Anisov
Subject: [RFC 00/16] Old GIC (gic-vgic) optimizations for GICV2
From: Andrii Anisov <andrii_anisov@xxxxxxxx>

This patch series is an attempt to reduce IRQ latency with the
old GIC implementation (gic-vgic). These patches originally based
on XEN 4.10 release. The motivation was to improve benchmark
results of a system given to a customer for evaluation.

I will repeat what I said on patch #15 and expanding it a bit. This series
is touching the old vGIC we plan to decommissioned soon. Amongst may issues
with the old vGIC the majors one are the locking is fragile
and edge/level interrupts are not handled correctly.

Furthermore, a lot of those patches seem to borrow the ideas from the new vGIC.
So I think it would be more beneficial to focus on the new vGIC that will be
the only vGIC supported soon.

This patch series is tailored for GICv2 on RCAR H3. > Several
of the most controversial patches (i.e. LRs shadowing) were
not shared to the customer, and here are for comments and discussion.

We need benchmark and detailed explanation to understand where the latency
comes from and how every patches reduce the latency. Ideally I would like to
see the performance improvement done patch by patch so we can decide
which one are worth it.

I hope several patches from here could be upstreamed. Some as is,
others with modifications.

There are several simple ideas behind these changes:
     - reduce an excessive code (condition checks)

This makes sense.

     - drop an excessive peripheral register accesses

This needs discussion. I will have a look at the patches.

     - if not reduce, then move an excessive code out of spinlocks
     - if not drop, then move an excessive register
       accesses out of spinlocks
In general spinlocks should not be contended otherwise you have other
issues. Moving code of spinlock is only worth it if that code may
return in most of the cases.

With porting existing patches, I've got another idea that accessing
PER_CPU variables like `current` and `lr_mask` is not really cheap.
How come? The resulting code for lr_mask is pretty trivial.

C function:

uint64_t foo(void)
    return this_cpu(lr_mask);

Assembly code:

44:   90000000        adrp    x0, 0 <maintenance_interrupt>
48:   91000000        add     x0, x0, #0x0
4c:   d53cd041        mrs     x1, tpidr_el2
50:   f8616800        ldr     x0, [x0, x1]
54:   d65f03c0        ret

So this is one memory load. Leaving aside nested virt, TPDIR_EL2
should be fast to access.

So it should produce faster code keeping `lr_mask` solely in
`struct vcpu` and pass `current` as a function parameter instead
of calculation it each time in called functions.

So you are going to save 3 instructions... But that's just a drop in the bucket compare to the rest of code. Actually, I would be surprised if you actually notice it on your benchmark.

Overall, I think that there are other places to look at where
the benefits will be much bigger. We can discuss about smaller one if the performance are still low.


Andrii Anisov (15):
   hack: drop GIC v3 support
   vgic: move pause_flags check out of vgic spinlock
   vgic: move irq_to_pending out of lock
   gic-vgic: Drop an excessive clear_lrs
   gic: drop interrupts enabling on interrupts processing
   gic-vgic:vgic: do not keep disabled IRQs in any of queues
   gic: separate ppi processing
   gic-vgic:vgic: avoid excessive conversions
   gic:vgic:gic-vgic: introduce non-atomic bitops
   irq: skip action avalability check for guest's IRQ
   gic-v2: Write HCR only on change
   gic-vgic: skip irqs locking
   hack: arm/domain: simplify context restore from idle vcpu
   hack: move gicv2 LRs reads and writes out of spinlocks
   gic: vgic: align frequently accessed data by cache line size

Julien Grall (1):
   xen/arm: Re-enable interrupt later in the trap path

  xen/arch/arm/arm64/entry.S           | 11 ++---
  xen/arch/arm/configs/arm64_defconfig |  1 +
  xen/arch/arm/domain.c                | 25 ++++++----
  xen/arch/arm/gic-v2.c                | 63 ++++++++++++++++--------
  xen/arch/arm/gic-v3-lpi.c            |  2 +
  xen/arch/arm/gic-v3.c                |  2 +
  xen/arch/arm/gic-vgic.c              | 84 +++++++++++++++++++-------------
  xen/arch/arm/gic.c                   | 32 +++++++++++--
  xen/arch/arm/irq.c                   | 46 +++++++++++++++---
  xen/arch/arm/traps.c                 |  6 +++
  xen/arch/arm/vgic-v3-its.c           |  3 +-
  xen/arch/arm/vgic.c                  | 93 ++++++++++++++++++++++++++++++------
  xen/arch/arm/vgic/vgic.c             |  2 +
  xen/include/asm-arm/config.h         |  2 +-
  xen/include/asm-arm/gic.h            | 10 ++--
  xen/include/asm-arm/irq.h            |  3 ++
  xen/include/asm-arm/vgic.h           | 24 ++++++----
  xen/include/xen/sched.h              |  1 +
  18 files changed, 299 insertions(+), 111 deletions(-)

Julien Grall

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.