
Re: [Xen-devel] [PATCH v2] x86/hvm: re-work viridian APIC assist code


  • To: Paul Durrant <paul.durrant@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: David Woodhouse <dwmw2@xxxxxxxxxxxxx>
  • Date: Sat, 25 Aug 2018 00:38:00 +0100
  • Cc: Eslam Elnikety <elnikety@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Shan Haitao <haitao.shan@xxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • Delivery-date: Fri, 24 Aug 2018 23:38:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, 2018-01-18 at 10:10 -0500, Paul Durrant wrote:
> Lastly the previous code did not properly emulate an EOI if a missed EOI
> was discovered in vlapic_has_pending_irq(); it merely cleared the bit in
> the ISR. The new code instead calls vlapic_EOI_set().

Hm, this *halves* my observed performance running a 32-thread
'diskspd.exe' on a Windows box with attached NVMe devices, which makes
me sad.

It's the call to hvm_dpci_msi_eoi() that does it.

If I comment out the call to pt_pirq_iterate() and leave *just* the
domain-global spinlock bouncing cache lines between all my CPUs, I'm
already down to 1.6M IOPS from 2.2M on my test box, before the function
does *anything* at all.
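
For reference, the path every MSI EOI now takes looks roughly like this
(a simplified sketch of how I read the code, not the exact source; the
guard and the locking match the hunk quoted further down):

    void hvm_dpci_msi_eoi(struct domain *d, int vector)
    {
        if ( !iommu_enabled || !hvm_domain_irq(d)->dpci )
            return;

        /* Domain-global lock: this is what bounces cache lines. */
        spin_lock(&d->event_lock);
        /* Walks every bound pirq via an indirect (retpolined) callback. */
        pt_pirq_iterate(d, _hvm_dpci_msi_eoi, (void *)(long)vector);
        spin_unlock(&d->event_lock);
    }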

With an *inline* version of pt_pirq_iterate(), so there's no retpoline
for the indirect calls, I'm down to 1.1M even when I've nopped out the
whole of the _hvm_dpci_msi_eoi() function it calls. Put it all
back, and I'm down to about 1.0M. So it's worse than halved.
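
(To be clear about what I mean by an 'inline' version: schematically,
and purely as an illustration rather than the real Xen code, with a
made-up for_each_bound_pirq() helper standing in for the actual pirq
walk, the difference is:

    /* What the iterator does today: an indirect call per pirq. */
    static int iterate_indirect(struct domain *d,
                                int (*cb)(struct domain *,
                                          struct hvm_pirq_dpci *, void *),
                                void *arg)
    {
        struct hvm_pirq_dpci *pirq_dpci;

        for_each_bound_pirq ( d, pirq_dpci )  /* hypothetical helper */
            cb(d, pirq_dpci, arg);            /* retpoline on every call */

        return 0;
    }

    /* The experiment: the same walk, but a direct call the compiler
     * can see (or drop entirely once the callback is nopped out). */
    static int iterate_direct(struct domain *d, int vector)
    {
        struct hvm_pirq_dpci *pirq_dpci;

        for_each_bound_pirq ( d, pirq_dpci )
            _hvm_dpci_msi_eoi(d, pirq_dpci, (void *)(long)vector);

        return 0;
    }

Even in the second form, with the callback body nopped out, the walk on
every EOI still costs measurably on top of the lock.)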

And what's all this for? The code here is making my eyes bleed, but I
believe it's only needed for unmaskable MSIs, and these aren't unmaskable.

Tempted to make it all go away by having a per-domain bitmap of vectors
for which all this work is actually required, and bypassing the whole
bloody lot in hvm_dpci_msi_eoi() if the corresponding bit in that
bitmap isn't set.

The hackish version of that (which seems to work, but would probably
want testing with an actual unmaskable MSI in the system, and I have
absolutely no confidence I understand what's going on here) looks
something like this:

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index bab3aa3..24df008 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -24,6 +24,7 @@
 #include <asm/hvm/irq.h>
 #include <asm/hvm/support.h>
 #include <asm/io_apic.h>
+#include <asm/msi.h>
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
@@ -282,6 +283,7 @@ int pt_irq_create_bind(
     struct hvm_pirq_dpci *pirq_dpci;
     struct pirq *info;
     int rc, pirq = pt_irq_bind->machine_irq;
+    irq_desc_t *desc;
 
     if ( pirq < 0 || pirq >= d->nr_pirqs )
         return -EINVAL;
@@ -422,6 +425,17 @@ int pt_irq_create_bind(
 
         dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
         pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
+        BUG_ON(!local_irq_is_enabled());
+        desc = pirq_spin_lock_irq_desc(info, NULL);
+        if ( desc )
+        {
+            /* Only non-maskable MSIs need the EOI-time pirq walk. */
+            if ( desc->msi_desc && !msi_maskable_irq(desc->msi_desc) )
+                set_bit(pirq_dpci->gmsi.gvec,
+                        hvm_domain_irq(d)->unmaskable_msi_vecs);
+            spin_unlock_irq(&desc->lock);
+        }
+
         spin_unlock(&d->event_lock);
 
         pirq_dpci->gmsi.posted = false;
@@ -869,7 +878,8 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
 
 void hvm_dpci_msi_eoi(struct domain *d, int vector)
 {
-    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci )
+    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci ||
+         !test_bit(vector, hvm_domain_irq(d)->unmaskable_msi_vecs) )
        return;
 
     spin_lock(&d->event_lock);
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 8a43cb9..d9d4652 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -78,6 +78,7 @@ struct hvm_irq {
     u8 round_robin_prev_vcpu;
 
     struct hvm_irq_dpci *dpci;
+    DECLARE_BITMAP(unmaskable_msi_vecs, 256);
 
     /*
      * Number of wires asserting each GSI.

