
Re: [Xen-devel] MSI message data register configuration in Xen guests



On Fri, Jun 29, 2012 at 4:10 AM, Stefano Stabellini
<stefano.stabellini@xxxxxxxxxxxxx> wrote:
> On Thu, 28 Jun 2012, Deep Debroy wrote:
>> On Wed, Jun 27, 2012 at 4:18 PM, Deep Debroy <ddebroy@xxxxxxxxx> wrote:
>> > On Mon, Jun 25, 2012 at 7:51 PM, Rolu <rolu@xxxxxxxx> wrote:
>> >>
>> >> On Tue, Jun 26, 2012 at 4:38 AM, Deep Debroy <ddebroy@xxxxxxxxx> wrote:
>> >> > Hi, I was playing around with an MSI-capable virtual device (so far
>> >> > submitted as patches only) in the upstream qemu tree but having
>> >> > trouble getting it to work on a Xen hvm guest. The device happens to
>> >> > be a QEMU implementation of VMware's pvscsi controller. The device
>> >> > works fine in a Xen guest when I switch the device's code to force
>> >> > usage of legacy interrupts with upstream QEMU. With MSI-based
>> >> > interrupts, the device works fine on a KVM guest but, as stated
>> >> > before, not on a Xen guest. After digging a bit, it appears the
>> >> > reason for the failure in Xen guests is that the MSI data register
>> >> > in the Xen guest ends up with a value of 0x4300, where the Delivery
>> >> > Mode value of 3 happens to be reserved (per spec) and therefore
>> >> > illegal. The vmsi_deliver routine in Xen rejects MSI interrupts with
>> >> > such data as illegal (as expected), causing all commands issued by
>> >> > the guest OS on the device to time out.
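
(For reference, here is a minimal sketch of how the MSI data register
decodes, using the field layout from the MSI spec and the masks
referenced by Xen's vmsi code: vector in bits 0-7, delivery mode in
bits 8-10, level in bit 14, trigger mode in bit 15. The helper below is
purely illustrative, not code from Xen or QEMU.)

    #include <stdint.h>
    #include <stdio.h>

    /* Decode the fields of an MSI data register value. */
    static void decode_msi_data(uint32_t data)
    {
        uint8_t vector        = data & 0xff;          /* bits 0-7   */
        uint8_t delivery_mode = (data >> 8) & 0x7;    /* bits 8-10  */
        uint8_t level         = (data >> 14) & 0x1;   /* bit 14     */
        uint8_t trig_mode     = (data >> 15) & 0x1;   /* bit 15     */

        printf("data=0x%04x vector=0x%02x delivery_mode=%u level=%u trigger=%u\n",
               data, vector, delivery_mode, level, trig_mode);
    }

    int main(void)
    {
        /* 0x4300 -> vector 0, delivery mode 3 (reserved), level assert, edge */
        decode_msi_data(0x4300);
        return 0;
    }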
>> >> >
>> >> > Given the above scenario, I was wondering if anyone can shed some
>> >> > light on how to debug this further for Xen. Something I would
>> >> > specifically like to know is where the MSI data register configuration
>> >> > actually happens. Is it done by some code specific to Xen and within
>> >> > the Xen codebase, or is it all done within QEMU?
>> >> >
>> >>
>> >> This seems like the same issue I ran into, though in my case it is
>> >> with passed-through physical devices. See
>> >> http://lists.xen.org/archives/html/xen-devel/2012-06/msg01423.html and
>> >> the older messages in that thread for more info on what's going on. No
>> >> fix yet, but help with debugging is very welcome.
>> >
>> > Thanks Rolu for pointing out the other thread - it was very useful.
>> > Some of the symptoms appear to be identical in my case. However, I am
>> > not using a pass-through device. Instead, in my case it's a fully
>> > virtualized device pretty much identical to a raw file backed disk
>> > image where the controller is pvscsi rather than lsi. Therefore I
>> > guess some of the later discussion in the other thread around
>> > pass-through-specific areas of code in qemu is not relevant? Please
>> > correct me if I am wrong. Also note that I am using upstream qemu,
>> > where neither the #define for PT_PCI_MSITRANSLATE_DEFAULT nor
>> > xenstore.c exists (which is where Stefano's suggested change appeared
>> > to be).
>> >
>> > So far, here's what I am observing in the hvm linux guest:
>> >
>> > On the guest side, as discussed in the other thread,
>> > xen_hvm_setup_msi_irqs is invoked for the device and a value of 0x4300
>> > is composed by xen_msi_compose_msg and written to the data register.
>> > On the qemu (upstream) side, when the virtualized controller is trying
>> > to complete a request, it invokes the following chain of calls:
>> > stl_le_phys -> xen_apic_mem_write -> xen_hvm_inject_msi.
>> > On the xen side, this ends up in: hvmop_inject_msi -> hvm_inject_msi
>> > -> vmsi_deliver. vmsi_deliver, as previously discussed, rejects the
>> > delivery mode of 0x3.
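
(Aside: a paraphrase of what the guest-side composition appears to
produce here. The data value 0x4300 is the one observed above; the
address layout below is inferred from the pirq decode in the patch
quoted later in this message, not taken verbatim from the kernel, and
compose_pirq_msi is a made-up name for illustration.)

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of the message the PV-on-HVM path seems to compose:
     *   data:       0x4300 (vector 0, delivery mode 3 used as a marker)
     *   address_lo: low 8 bits of the pirq in bits 12-19 (dest-ID field)
     *   address_hi: remaining pirq bits in bits 8-31
     */
    static void compose_pirq_msi(uint32_t pirq, uint32_t *addr_lo,
                                 uint32_t *addr_hi, uint32_t *data)
    {
        *addr_lo = 0xfee00000u | ((pirq & 0xff) << 12);
        *addr_hi = pirq & 0xffffff00;
        *data    = 0x4300;
    }

    int main(void)
    {
        uint32_t lo, hi, data;

        compose_pirq_msi(0x1234, &lo, &hi, &data);
        printf("addr_hi=0x%08x addr_lo=0x%08x data=0x%04x\n", hi, lo, data);
        return 0;
    }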
>> >
>> > Is the above sequence of interactions the expected path for a HVM
>> > guest trying to use a fully virtualized device/controller that uses
>> > MSI in upstream qemu? If so, given that a standard linux guest always
>> > populates the value of 0x4300 in the MSI data register through
>> > xen_hvm_setup_msi_irqs, how are MSI notifications from a device in
>> > qemu supposed to work and get past the vmsi_deliver check when the
>> > delivery mode of 0x3 is indeed reserved?
>> >
>> I wanted to see whether the HVM guest can interact with the MSI-based
>> virtualized controller properly without any of the Xen-specific code
>> in the linux kernel kicking in (i.e. allowing the regular PCI/MSI code
>> in linux to fire). So I rebuilt the kernel with CONFIG_XEN disabled
>> such that pci_xen_hvm_init no longer sets x86_msi.*msi_irqs to
>> xen-specific routines like xen_hvm_setup_msi_irqs, which is where the
>> 0x4300 is getting populated. This seems to work properly. The MSI data
>> register for the controller ends up getting a valid value like 0x4049,
>> vmsi_deliver no longer complains, all MSI notifications are delivered
>> in the expected way to the guest, and the raw, file-backed disks
>> attached to the controller show up in fdisk -l.
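
(Running the same decode sketch as above on 0x4049 gives vector 0x49
with delivery mode 0, i.e. fixed, which is a legal combination and
consistent with vmsi_deliver accepting it.)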
>>
>> My conclusion: the linux kernel's xen-specific code, specifically
>> routines like xen_hvm_setup_msi_irqs, needs to be tweaked to work with
>> fully virtualized qemu devices that use MSI. I will follow up
>> regarding that on LKML.
>
> Thanks for your analysis of the problem, I think it is correct: Linux PV
> on HVM is trying to set up event channel delivery for the MSI, as it
> always does (therefore choosing 0x3 as the delivery mode).
> However, emulated devices in QEMU don't support that.
> To be honest, emulated devices in QEMU didn't support MSIs at all until
> very recently, which is why we are only seeing this issue now.
>
> Could you please try this Xen patch and let me know if it makes things
> better?
>

Thanks Stefano. I have tested the patch below with the MSI device and
it now works (without any additional changes to the linux guest
kernel).

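For anyone else reading the archive, here is a tiny standalone
round-trip check of the address encoding assumed above against the
exact decode expression used in the patch. It is only an illustration
of the pirq recovery, not Xen code.

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t pirq = 0x1234;

        /* Encode: low 8 bits of the pirq in address bits 12-19,
         * remaining bits in bits 8-31 of the upper address word. */
        uint64_t addr = ((uint64_t)(pirq & 0xffffff00) << 32) |
                        0xfee00000u | ((pirq & 0xff) << 12);

        /* Decode with the same expression hvm_inject_msi uses in the patch. */
        uint32_t decoded = ((addr >> 32) & 0xffffff00) | ((addr >> 12) & 0xff);

        assert(decoded == pirq);
        printf("pirq 0x%x -> addr 0x%016llx -> pirq 0x%x\n",
               pirq, (unsigned long long)addr, decoded);
        return 0;
    }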
>
> diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
> index a90927a..f44f3b9 100644
> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -281,6 +281,31 @@ void hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data)
>          >> MSI_DATA_TRIGGER_SHIFT;
>      uint8_t vector = data & MSI_DATA_VECTOR_MASK;
>
> +    if ( !vector )
> +    {
> +        int pirq = ((addr >> 32) & 0xffffff00) | ((addr >> 12) & 0xff);
> +        if ( pirq > 0 )
> +        {
> +            struct pirq *info = pirq_info(d, pirq);
> +
> +            /* if it is the first time, allocate the pirq */
> +            if (info->arch.hvm.emuirq == IRQ_UNBOUND)
> +            {
> +                spin_lock(&d->event_lock);
> +                map_domain_emuirq_pirq(d, pirq, IRQ_MSI_EMU);
> +                spin_unlock(&d->event_lock);
> +            } else if (info->arch.hvm.emuirq != IRQ_MSI_EMU)
> +            {
> +                printk("%s: pirq %d does not correspond to an emulated MSI\n", __func__, pirq);
> +                return;
> +            }
> +            send_guest_pirq(d, info);
> +            return;
> +        } else {
> +            printk("%s: error getting pirq from MSI: pirq = %d\n", __func__, pirq);
> +        }
> +    }
> +
>      vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
>  }
>
> diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
> index 40e2245..066f64d 100644
> --- a/xen/include/asm-x86/irq.h
> +++ b/xen/include/asm-x86/irq.h
> @@ -188,6 +188,7 @@ void cleanup_domain_irq_mapping(struct domain *);
>  })
>  #define IRQ_UNBOUND -1
>  #define IRQ_PT -2
> +#define IRQ_MSI_EMU -3
>
>  bool_t cpu_has_pending_apic_eoi(void);
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel