
Re: [Xen-devel] [RFC PATCH 08/24] ARM: GICv3: introduce separate pending_irq structs for LPIs



On 13/01/17 19:37, Stefano Stabellini wrote:
> On Thu, 12 Jan 2017, Andre Przywara wrote:

Hi Stefano,

...

>>>> +    list_for_each_entry(lpi_irq, &v->arch.vgic.pending_lpi_list, entry)
>>>> +    {
>>>> +        if ( lpi_irq->pirq.irq == lpi )
>>>> +            return &lpi_irq->pirq;
>>>> +
>>>> +        if ( lpi_irq->pirq.irq == 0 && !empty )
>>>> +            empty = lpi_irq;
>>>> +    }
>>>
>>> This is another one of those cases where a list is too slow for the hot
>>> path. The idea of allocating pending_irq struct on demand is good, but
>>> storing them in a linked list would kill performance. Probably the best
>>> thing we could do is an hashtable and we should preallocate the initial
>>> array of elements. I don't know what the size of the initial array
>>> should be, but we can start around 50, and change it in the future once
>>> we do tests with real workloads. Of course the other key parameter is
>>> the hash function, not sure which one is the right one, but ideally we
>>> would never have to allocate new pending_irq struct for LPIs because the
>>> preallocated set would suffice.
>>
>> As I mentioned in the last post, I expect this number to be really low
>> (less than 5). 
> 
> Are you able to check this assumption in a real scenario? If not you,
> somebody else?
> 
> 
>> Let's face it: if you have multiple interrupts pending
>> for a significant amount of time, you won't make any actual progress in
>> the guest, because it's busy handling interrupts.
>> So my picture of LPI handling is:
>> 1) A device triggers an MSI, so the host receives the LPI. Ideally this
>> will be handled by the pCPU where the right VCPU is running atm, so it
>> will exit to EL2. Xen will handle the LPI by assigning one struct
>> pending_irq to it and will inject it into the guest.
>> 2) The VCPU gets to run again and calls the interrupt handler, because
>> the (virtual) LPI is pending.
>> 3) The (Linux) IRQ handler reads the ICC_IAR register to learn the IRQ
>> number, and will get the virtual LPI number.
>> => At this point the LPI is done when it comes to the VGIC. The LR state
>> will be set to 0 (neither pending nor active). This is independent of the
>> EOI the handler will execute soon (or later).
>> 4) On the next exit the VGIC code will discover that the IRQ is done
>> (LR.state == 0) and will discard the struct pending_irq (set the LPI
>> number to 0 to make it available to the next LPI).
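
(To make steps 1) and 4) a bit more concrete, here is a very simplified
sketch -- not the actual Xen code; the struct and function names below
are made up, only the "irq == 0 means the slot is free" convention is
taken from the description above.)

/* Simplified sketch of the LPI lifecycle above; names are made up. */
struct pending_lpi_slot {
    unsigned int irq;                         /* 0 == slot is unused */
    /* ... rest of the per-IRQ state ... */
};

/* Step 1: the host received an LPI, grab a free slot and record it. */
static struct pending_lpi_slot *lpi_assign(struct pending_lpi_slot *slots,
                                           unsigned int nr_slots,
                                           unsigned int lpi)
{
    unsigned int i;

    for ( i = 0; i < nr_slots; i++ )
        if ( slots[i].irq == 0 )
        {
            slots[i].irq = lpi;               /* slot now owned by this LPI */
            return &slots[i];
        }

    return NULL;                              /* all slots taken */
}

/* Step 4: LR.state == 0 on the next exit, so give the slot back. */
static void lpi_discard(struct pending_lpi_slot *slot)
{
    slot->irq = 0;                            /* LPI number 0 == available */
}
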
> 
> I am following
> 
> 
>> Even if there were multiple LPIs pending at the same time (because
>> the guest had interrupts disabled, for instance), I believe they can be
>> all handled without exiting. Upon EOIing (priority-dropping, really) the
>> first LPI, the next virtual LPI would fire, calling the interrupt
>> handler again, and so on. Unless the kernel decides to do something that
>> exits (even accessing the hardware normally wouldn't, I believe), we can
>> clear all pending LPIs in one go.
>>
>> So I have a hard time imagining how we can really have many LPIs
>> pending and thus struct pending_irqs allocated.
>> Note that this may differ from SPIs, for instance, because the IRQ life
>> cycle is more complex there (extending till the EOI).
>>
>> Does that make some sense? Or am I missing something here?
> 
> In my tests with much smaller platforms than the ones existing today, I
> could easily have 2-3 interrupts pending at the same time without much
> load and without any SR-IOV NICs or any other fancy PCIe hardware.

The difference to LPIs is that SPIs can be level triggered (possibly
requiring a driver to clear the interrupt condition in the device), and
they also require an explicit deactivation to finish off the IRQ state
machine. Both these things make an IRQ stay much longer in the LRs than
one would expect for an LPI, which is always edge triggered and has no
active state.
Also, the timer IRQ is a PPI and thus a frequent visitor in the LRs.

> It would be nice to test on Cavium ThunderX for example.

Yes, I agree that there is quite a bit of guessing involved, so proving
this sounds like a worthwhile task.

> It's also easy to switch to rbtrees.

On Friday I looked at rbtrees in Xen, which thankfully seem to be the
same as in Linux. So I converted the its_devices list over.
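
(For reference, the lookup then becomes the usual rbtree walk. Just a
sketch, assuming its_devices is keyed by the device ID -- the real key
and structure layout may well differ:)

#include <xen/rbtree.h>
#include <xen/types.h>

struct its_device {                  /* sketch only, layout made up */
    struct rb_node rbnode;
    uint32_t devid;                  /* assumed lookup key */
    /* ... */
};

static struct its_device *its_device_find(struct rb_root *root,
                                          uint32_t devid)
{
    struct rb_node *node = root->rb_node;

    while ( node )
    {
        struct its_device *dev = rb_entry(node, struct its_device, rbnode);

        if ( devid < dev->devid )
            node = node->rb_left;
        else if ( devid > dev->devid )
            node = node->rb_right;
        else
            return dev;              /* found it */
    }

    return NULL;                     /* not in the tree */
}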

But in this case I don't believe that rbtrees are the best data
structure, since we frequently need to look up existing entries, but
also need to find new, empty ones (when an LPI has fired).
And for allocating new LPIs we don't need a particular slot, any free
one would do. We probably want to avoid actually malloc-ing pending_irq
structures for that.

So a hash table with open addressing sounds like a better fit here:
- With a clever hash function (taking, for instance, Linux' LPI
allocation scheme into account) we get very quick lookup times for
already assigned LPIs.
- Assigning an LPI would use the same hash function, probably finding an
unused pending_irq, which we could then easily allocate.

I will try to write something along those lines.
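
Roughly like the following -- just a sketch to illustrate the idea; the
names, the table size and the hash function are placeholders, and the
preallocated table would presumably live per domain or per VCPU:

#define NR_LPI_SLOTS  64                      /* preallocated, power of two */

struct lpi_slot {                             /* stand-in for pending_irq */
    unsigned int irq;                         /* 0 == slot is free */
    /* ... */
};

static unsigned int lpi_hash(unsigned int lpi)
{
    /* Linux hands out LPIs mostly consecutively starting at 8192, so
     * the low bits already spread entries nicely over the table. */
    return lpi & (NR_LPI_SLOTS - 1);
}

/*
 * Return the slot already holding this LPI, or assign it to a free one.
 * Open addressing with linear probing: looking up an already assigned
 * LPI usually hits its home slot right away; assigning a new one reuses
 * the first free slot found while scanning.
 */
static struct lpi_slot *lpi_slot_get(struct lpi_slot *table, unsigned int lpi)
{
    unsigned int start = lpi_hash(lpi), i;
    struct lpi_slot *empty = NULL;

    for ( i = 0; i < NR_LPI_SLOTS; i++ )
    {
        struct lpi_slot *slot = &table[(start + i) & (NR_LPI_SLOTS - 1)];

        if ( slot->irq == lpi )
            return slot;                      /* already assigned */
        if ( !empty && slot->irq == 0 )
            empty = slot;                     /* remember the first free slot */
    }

    if ( empty )
        empty->irq = lpi;                     /* assign the LPI to this slot */

    return empty;       /* NULL means the table is full: fall back to malloc */
}
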

Cheers,
Andre.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

