 
	
Re: [Xen-devel] Issue with MSI in a HVM domU with several passed through PCI devices
On Wed, Jun 27, 2012 at 3:36 PM, Stefano Stabellini
<stefano.stabellini@xxxxxxxxxxxxx> wrote:
> On Tue, 26 Jun 2012, Rolu wrote:
>> On Tue, Jun 26, 2012 at 2:59 PM, Stefano Stabellini
>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> > On Tue, 26 Jun 2012, Rolu wrote:
>> >> On Mon, Jun 25, 2012 at 1:38 PM, Stefano Stabellini
>> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> >> > On Mon, 25 Jun 2012, Jan Beulich wrote:
>> >> >> >>> On 24.06.12 at 04:21, Rolu <rolu@xxxxxxxx> wrote:
>> >> >> > On Wed, Jun 20, 2012 at 6:03 PM, Jan Beulich <JBeulich@xxxxxxxx>
>> >> >> > wrote:
>> >> >> >> At the same time, adding logging to the guest kernel would
>> >> >> >> be nice, to see what value it actually writes (in a current
>> >> >> >> kernel this would be in __write_msi_msg()).
>> >> >> >>
>> >> >> >
>> >> >> > Turns out that msg->data here is also 0x4300, so it seems the guest
>> >> >> > kernel is producing these values. I caused it to make a stack trace
>> >> >> > and this pointed back to xen_hvm_setup_msi_irqs. This function uses
>> >> >> > the macro XEN_PIRQ_MSI_DATA, which evaluates to 0x4300. It checks the
>> >> >> > current data field and if it isn't equal to the macro it uses
>> >> >> > xen_msi_compose_msg to make a new message, but that function just sets
>> >> >> > the data field of the message to XEN_PIRQ_MSI_DATA - so, 0x4300. This
>> >> >> > then gets passed to __write_msi_msg and that's that. There are no
>> >> >> > other writes through __write_msi_msg (except for the same thing for
>> >> >> > other devices).
>> >> >> >
>> >> >> > The macro XEN_PIRQ_MSI_DATA contains a part (3 << 8) which ends up
>> >> >> > decoded as the delivery mode, so it seems the kernel is intentionally
>> >> >> > setting it to 3.
>> >> >>
>> >> >> So that can never have worked properly afaict. Stefano, the
>> >> >> code as it is currently - using literal (3 << 8) - is clearly bogus.
>> >> >> Your original commit at least had a comment saying that the
>> >> >> reserved delivery mode encoding is intentional here, but that
>> >> >> comment got lost with the later introduction of XEN_PIRQ_MSI_DATA.
>> >> >> In any case - the cooperation with qemu apparently doesn't
>> >> >> work, as the reserved encoding should never make it through
>> >> >> to the hypervisor. Could you explain what the intention here
>> >> >> was?
>> >> >>
>> >> >> And regardless of anything, can the literal numbers please be
>> >> >> replaced by proper manifest constants - the "8" here already
>> >> >> has MSI_DATA_DELIVERY_MODE_SHIFT, and giving the 3 a
>> >> >> proper symbolic would permit locating where this is being (or
>> >> >> really, as it doesn't appear to work supposed to be) consumed
>> >> >> in qemu, provided it uses the same definition (i.e. that one
>> >> >> should go into one of the public headers).
>> >> >
>> >> > The (3 << 8) is unimportant. The delivery mode chosen is "reserved"
>> >> > because notifications are not supposed to be delivered as MSI anymore.
>> >> >
>> >> > This is what should happen:
>> >> >
>> >> > 1) Linux configures the device with a 0 vector number and the pirq
>> >> > number in the address field;
>> >> >
>> >> > 2) QEMU notices a vector number of 0 and reads the pirq number from the
>> >> > address field, passing it to xc_domain_update_msi_irq;
>> >> >
>> >> > 3) Xen assigns the given pirq to the physical MSI;
>> >> >
>> >> > 4) The guest issues an EVTCHNOP_bind_pirq hypercall;
>> >> >
>> >> > 5) Xen sets the pirq as "IRQ_PT";
>> >> >
>> >> > 6) When Xen tries to inject the MSI into the guest, hvm_domain_use_pirq
>> >> > returns true so Xen calls send_guest_pirq instead.
>> >> >
>> >> >
>> >> > Obviously 6) is not happening. hvm_domain_use_pirq is:
>> >> >
>> >> > is_hvm_domain(d) && pirq && pirq->arch.hvm.emuirq != IRQ_UNBOUND
>> >> >
>> >> > My guess is that emuirq is IRQ_UNBOUND when it should be IRQ_PT (see
>> >> > above).
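
For readers following the thread, the check quoted above can be modeled in a few lines of C to show why a pirq whose emuirq is still IRQ_UNBOUND ends up on the MSI injection path instead of being delivered as an event. This is only a sketch: the struct and function names are hypothetical, simplified stand-ins rather than Xen's real definitions, and the values -1 and -2 are the ones reported from logging elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values as reported in this thread's logging. */
#define IRQ_UNBOUND (-1)
#define IRQ_PT      (-2)

/* Hypothetical, simplified stand-ins for the hypervisor's structures. */
struct sketch_domain {
    bool is_hvm;
};

struct sketch_pirq {
    int pirq;
    int emuirq;   /* IRQ_UNBOUND until the pirq has been bound */
};

enum sketch_route {
    SKETCH_ROUTE_EVENT_CHANNEL,  /* the send_guest_pirq-style path */
    SKETCH_ROUTE_MSI_INJECT      /* the path that decodes the MSI data word */
};

/* Same shape as the quoted condition:
 * is_hvm_domain(d) && pirq && pirq->arch.hvm.emuirq != IRQ_UNBOUND */
bool sketch_use_pirq(const struct sketch_domain *d,
                     const struct sketch_pirq *pirq)
{
    return d->is_hvm && pirq != NULL && pirq->emuirq != IRQ_UNBOUND;
}

/* Pick the delivery path for one interrupt of a passed-through device. */
enum sketch_route sketch_route_msi(const struct sketch_domain *d,
                                   const struct sketch_pirq *pirq)
{
    assert(d != NULL);
    return sketch_use_pirq(d, pirq) ? SKETCH_ROUTE_EVENT_CHANNEL
                                    : SKETCH_ROUTE_MSI_INJECT;
}
```

With emuirq left at -1 the sketch takes the injection path, which is where a data word carrying the reserved delivery mode would be rejected; with emuirq set to -2 it takes the event-channel path.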
>> >>
>> >> This appears to be true. I added logging to hvm_pci_msi_assert in
>> >> xen/drivers/passthrough/io.c and it indicates that
>> >> pirq->arch.hvm.emuirq is -1 (while IRQ_PT is -2) every time right
>> >> before an unsupported delivery mode message.
>> >>
>> >> I also log pirq->pirq but I found that most of the time I can't find
>> >> this value anywhere else (I'm not sure how to interpret the value,
>> >> though). For example, in my last try:
>> >>
>> >> * I get an unsupported delivery mode error for pirq->pirq 55, 54 and
>> >> 53. The vast majority are for 54.
>> >> * I have logging in map_domain_emuirq_pirq in xen/arch/x86/irq.c. It
>> >> gets called with pirq 19, 20, 21, 22, 23, 52, 51, 50, 16, 17, 55.
>> >> Never for 54 or 53. It also gets called with pirq=49,emuirq=23 once
>> >> but complains it's already mapped.
>> >> * I have logging in evtchn_bind_pirq in xen/common/event_channel.c. It
>> >> gets called with bind->pirq 16, 17, 51, 55, 49, 29 (twice), 21, 19,
>> >> 22, 52, 48, 47. Also never 54 or 53.
>> >> * map_domain_emuirq_pirq is called from evtchn_bind_pirq for pirq 16, 17,
>> >> 55.
>> >> * The qemu log mentions pirq 35, 36 and 37
>> >>
>> >> It seems pirq values don't always mean the same? Is it a coincidence
>> >> that 55 occurs almost everywhere, or is something going wrong with the
>> >> other two values (53 and 54 versus 16 and 17)?
>> >>
>> >> I have three MSI capable devices passed through to the domU, and I do
>> >> see groups of three distinct pirqs in the data above - just not the
>> >> same ones in every place I look.
>> >>
>> >> > So maybe the guest is not issuing an EVTCHNOP_bind_pirq hypercall
>> >> > (__startup_pirq doesn't get called), or Xen is erroring out in
>> >> > map_domain_emuirq_pirq.
>> >>
>> >> evtchn_bind_pirq gets called, though I'm not sure if it is with the right
>> >> data.
>> >>
>> >> map_domain_emuirq_pirq always gets past the checks in the top half
>> >> (i.e.
>> >> up to the line /* do not store emuirq mappings for pt devices
>> >> */), except for one time with pirq=49,emuirq=23 where it finds they
>> >> are already mapped.
>> >> It is called three times with an emuirq of -2, for pirq 16, 17 and 55.
>> >> This implies their info->arch.hvm.emuirq is also set to -2 (haven't
>> >> directly logged that but it's the only assignment there).
>> >>
>> >> Interestingly, I get an unsupported delivery mode error for pirq 55
>> >> where my logging says pirq->arch.hvm.emuirq is -1, *after*
>> >> map_domain_emuirq_pirq was called for pirq 55 and emuirq -2.
>> >
>> > Looking back at your QEMU logs, it seems that pt_msi_setup is not
>> > called (or it is not called at the right time), otherwise you should
>> > get:
>> >
>> > pt_msi_setup requested pirq = %d
>> >
>> > in your logs.
>> > Could you try disabling msitranslate? You can do that adding
>> >
>> > pci_msitranslate=0
>> >
>> > to your VM config file.
>>
>> I tried that, but it didn't work.
>>
>> > If that works, probably this (untested) QEMU patch could fix your problem:
>> >
>>
>> I appreciate the help.
>>
>> I applied the patch anyway just to see what would happen (had to edit
>> a few dev versus d variable names) but it didn't help. It also breaks
>> pt_msi_update, as I get in the qemu log:
>>
>> pt_msi_update: Update msi with pirq 2f gvec 0 gflags 302f
>> pt_msi_update: Error: Binding of MSI failed.
>> pt_msi_update: Error: Unmapping of MSI failed.
>> pt_msgctrl_reg_write: Warning: Can not bind MSI for dev 80
>>
>> I added some logging to pt_msi_setup (without the patch). It does get
>> called, and it does so rather early in the boot process, each time
>> right before lines as these:
>>
>> pci_intx: intx=1
>> register_real_device: Real physical device 00:1b.0 registered successfuly!
>> IRQ type = MSI-INTx
>>
>> At this point dev->msi->data, addr_hi and addr_lo are all 0, which
>> doesn't seem right. Is it being called prematurely?
>
> That's because msitranslate is still enabled somehow, that is a
> toolstack bug.
> While we fix that bug, could you try this QEMU patch to forcefully disable
> msitranslate?
>

This worked! The "unsupported delivery mode" message is gone. Sound
works, although there is still occasionally a very short stutter, but
I expect that's a different issue. I've been testing with a KDE
desktop with 3D effects (cube, expo, that sort of stuff) and
performance there has gone up noticeably, from around 30-40 fps in
most cases to near 60.

> diff --git a/xenstore.c b/xenstore.c
> index ac90366..8af280e 100644
> --- a/xenstore.c
> +++ b/xenstore.c
> @@ -427,7 +427,7 @@ uint32_t xenstore_read_target(void)
>      return target_domid;
>  }
>
> -#define PT_PCI_MSITRANSLATE_DEFAULT 1
> +#define PT_PCI_MSITRANSLATE_DEFAULT 0
>  #define PT_PCI_POWER_MANAGEMENT_DEFAULT 0
>  int direct_pci_msitranslate;
>  int direct_pci_power_mgmt;

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
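
As a side note on the 0x4300 value discussed earlier in the thread, the following sketch splits an MSI data word into its fields using the standard x86 layout (vector in bits 0-7, delivery mode in bits 8-10, level in bit 14, trigger mode in bit 15). The SKETCH_ names and both helper functions are made up for illustration and are not the kernel's, qemu's, or Xen's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Standard x86 MSI data word layout; the names are hypothetical. */
#define SKETCH_MSI_VECTOR_MASK         0xffu
#define SKETCH_MSI_DELIVERY_MODE_SHIFT 8
#define SKETCH_MSI_DELIVERY_MODE_MASK  0x7u
#define SKETCH_MSI_LEVEL_SHIFT         14
#define SKETCH_MSI_TRIGGER_SHIFT       15

/* Delivery mode 3 is a reserved encoding. */
#define SKETCH_MSI_DELIVERY_RESERVED   3u

struct sketch_msi_data {
    unsigned vector;         /* bits 0-7 */
    unsigned delivery_mode;  /* bits 8-10 */
    unsigned level_assert;   /* bit 14 */
    unsigned level_trigger;  /* bit 15: 0 = edge, 1 = level */
};

/* Split a raw MSI data word into its fields. */
struct sketch_msi_data sketch_decode_msi_data(uint32_t data)
{
    struct sketch_msi_data out;

    out.vector = data & SKETCH_MSI_VECTOR_MASK;
    out.delivery_mode = (data >> SKETCH_MSI_DELIVERY_MODE_SHIFT)
                        & SKETCH_MSI_DELIVERY_MODE_MASK;
    out.level_assert = (data >> SKETCH_MSI_LEVEL_SHIFT) & 1u;
    out.level_trigger = (data >> SKETCH_MSI_TRIGGER_SHIFT) & 1u;
    return out;
}

/* Build an edge-triggered data word from its fields. */
uint32_t sketch_compose_msi_data(unsigned vector, unsigned delivery_mode,
                                 unsigned level_assert)
{
    assert(vector <= SKETCH_MSI_VECTOR_MASK);
    assert(delivery_mode <= SKETCH_MSI_DELIVERY_MODE_MASK);
    assert(level_assert <= 1u);
    return (uint32_t)vector
         | ((uint32_t)delivery_mode << SKETCH_MSI_DELIVERY_MODE_SHIFT)
         | ((uint32_t)level_assert << SKETCH_MSI_LEVEL_SHIFT);
}
```

Decoding 0x4300 this way gives vector 0 and delivery mode 3, which matches both the (3 << 8) literal and the zero vector that qemu is described as checking for.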
 
 