
Re: [Xen-devel] [RFC PATCH 08/24] ARM: GICv3: introduce separate pending_irq structs for LPIs



On Mon, 16 Jan 2017, André Przywara wrote:
> On 13/01/17 19:37, Stefano Stabellini wrote:
> > On Thu, 12 Jan 2017, Andre Przywara wrote:
> 
> Hi Stefano,
> 
> ...
> 
> >>>> +    list_for_each_entry(lpi_irq, &v->arch.vgic.pending_lpi_list, entry)
> >>>> +    {
> >>>> +        if ( lpi_irq->pirq.irq == lpi )
> >>>> +            return &lpi_irq->pirq;
> >>>> +
> >>>> +        if ( lpi_irq->pirq.irq == 0 && !empty )
> >>>> +            empty = lpi_irq;
> >>>> +    }
> >>>
> >>> This is another one of those cases where a list is too slow for the hot
> >>> path. The idea of allocating pending_irq structs on demand is good, but
> >>> storing them in a linked list would kill performance. Probably the best
> >>> thing we could do is a hash table and we should preallocate the initial
> >>> array of elements. I don't know what the size of the initial array
> >>> should be, but we can start around 50, and change it in the future once
> >>> we do tests with real workloads. Of course the other key parameter is
> >>> the hash function, not sure which one is the right one, but ideally we
> >>> would never have to allocate new pending_irq structs for LPIs because the
> >>> preallocated set would suffice.
> >>
> >> As I mentioned in the last post, I expect this number to be really low
> >> (less than 5). 
> > 
> > Are you able to check this assumption in a real scenario? If not you,
> > somebody else?
> > 
> > 
> >> Let's face it: If you have multiple interrupts pending
> >> for a significant amount of time you won't make any actual progress in
> >> the guest, because it's busy with handling interrupts.
> >> So my picture of LPI handling is:
> >> 1) A device triggers an MSI, so the host receives the LPI. Ideally this
> >> will be handled by the pCPU where the right VCPU is running atm, so it
> >> will exit to EL2. Xen will handle the LPI by assigning one struct
> >> pending_irq to it and will inject it into the guest.
> >> 2) The VCPU gets to run again and calls the interrupt handler, because
> >> the (virtual) LPI is pending.
> >> 3) The (Linux) IRQ handler reads the ICC_IAR register to learn the IRQ
> >> number, and will get the virtual LPI number.
> >> => At this point the LPI is done when it comes to the VGIC. The LR state
> >> will be set to 0 (neither pending nor active). This is independent of the
> >> EOI the handler will execute soon (or later).
> >> 4) On the next exit the VGIC code will discover that the IRQ is done
> >> (LR.state == 0) and will discard the struct pending_irq (set the LPI
> >> number to 0 to make it available to the next LPI).
> > 
> > I am following
> > 
> > 
> >> Even if there were multiple LPIs pending at the same time (because
> >> the guest had interrupts disabled, for instance), I believe they can
> >> all be handled without exiting. Upon EOIing (priority-dropping, really) the
> >> first LPI, the next virtual LPI would fire, calling the interrupt
> >> handler again, and so on. Unless the kernel decides to do something that
> >> exits (even accessing the hardware normally wouldn't, I believe), we can
> >> clear all pending LPIs in one go.
> >>
> >> So I have a hard time imagining how we could really have many LPIs
> >> pending and thus struct pending_irqs allocated.
> >> Note that this may differ from SPIs, for instance, because the IRQ life
> >> cycle is more complex there (extending till the EOI).
> >>
> >> Does that make some sense? Or am I missing something here?
> > 
> > In my tests with much smaller platforms than the ones existing today, I
> > could easily have 2-3 interrupts pending at the same time without much
> > load and without any SR-IOV NICs or any other fancy PCIE hardware.
> 
> The difference to LPIs is that SPIs can be level triggered (eventually
> requiring a driver to clear the interrupt condition in the device), and
> also require an explicit deactivation to finish off the IRQ state machine.
> Both of these things will lead to an IRQ staying much longer in the LRs
> than one would expect for an always edge-triggered LPI, which lacks an
> active state.
> Also the timer IRQ is a PPI and thus a frequent visitor in the LRs.
> 
> > It would be nice to test on Cavium ThunderX for example.
> 
> Yes, I agree that there is quite some guessing involved, so proving this
> sounds like a worthwhile task.
> 
> > It's also easy to switch to rbtrees.
> 
> On Friday I looked at rbtrees in Xen, which thankfully seem to be the
> same as in Linux. So I converted the its_devices list over.
> 
> But in this case I don't believe that rbtrees are the best data
> structure, since we frequently need to look up entries, but also need to
> find new, empty ones (when an LPI has fired).
> And for allocating a new LPI we don't need a particular slot, any free
> one would do. We probably want to avoid actually malloc-ing pending_irq
> structures for that.
> 
> So a hash table with open addressing sounds like a better fit here:
> - With a clever hash function (taking, for instance, Linux's LPI allocation
> scheme into account) we get very quick lookup times for already assigned
> LPIs.
> - Assigning an LPI would use the same hash function, probably finding an
> unused pending_irq, which we could then easily allocate.
> 
> I will try to write something along those lines.

Thanks Andre, indeed it sounds like a better option.
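
Just to make sure we are on the same page, below is a rough, self-contained
sketch of the open-addressing idea. Everything in it (the names, the table
size, the hash function) is made up for illustration and is not taken from
your series or from the existing vgic code; the probe loop deliberately
mirrors the list walk in your patch (find a matching entry, remember the
first free one):

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct pending_irq. */
struct pending_lpi {
    uint32_t lpi;            /* 0 means "this slot is free" */
    uint8_t priority;
};

#define NR_SLOTS 64          /* preallocated; power of two so we can mask */

static struct pending_lpi slots[NR_SLOTS];

static unsigned int hash_lpi(uint32_t lpi)
{
    /* Linux hands out LPIs from 8192 upwards, so the low bits spread
     * reasonably well; this is just one possible choice of hash. */
    return lpi & (NR_SLOTS - 1);
}

/*
 * Look up the slot holding 'lpi'. If it is not present and 'allocate'
 * is set, claim the first free slot seen while probing (open addressing
 * with linear probing). Returns NULL on a miss or if the table is full.
 */
static struct pending_lpi *lpi_to_slot(uint32_t lpi, bool allocate)
{
    unsigned int start = hash_lpi(lpi), i;
    struct pending_lpi *empty = NULL;

    for ( i = 0; i < NR_SLOTS; i++ )
    {
        struct pending_lpi *p = &slots[(start + i) % NR_SLOTS];

        if ( p->lpi == lpi )
            return p;                   /* already mapped: fast path */
        if ( p->lpi == 0 && !empty )
            empty = p;                  /* remember the first free slot */
    }

    if ( allocate && empty )
    {
        empty->lpi = lpi;               /* claim the free slot */
        return empty;
    }

    return NULL;                        /* not found, or table full */
}

int main(void)
{
    struct pending_lpi *p = lpi_to_slot(8192, true);    /* "LPI fires" */

    if ( p )
        printf("LPI %u lives in slot %ld\n",
               (unsigned)p->lpi, (long)(p - slots));

    p = lpi_to_slot(8192, false);                       /* later lookup */
    printf("lookup %s\n", p ? "hit" : "miss");

    if ( p )
        p->lpi = 0;                     /* LPI is done, free the slot again */

    return 0;
}

The nice property is that the lookup and the allocation of a free slot share
the same probe sequence, so with a sensibly sized preallocated array the hot
path stays O(1) on average and never calls malloc. One thing to keep in mind:
if slots are freed by simply writing 0 into the LPI field, a missed lookup
has to keep probing past empty slots (or the table needs tombstones), but
with a small table that should not matter much.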
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

