
Re: [Xen-devel] [PATCH] x86/pt: skip setup of posted format IRTE when gvec is 0



On Mon, May 06, 2019 at 12:44:41PM +0800, Chao Gao wrote:
> On Thu, May 02, 2019 at 10:20:09AM +0200, Roger Pau Monné wrote:
> >On Wed, May 01, 2019 at 12:41:13AM +0800, Chao Gao wrote:
> >> On Tue, Apr 30, 2019 at 11:30:33AM +0200, Roger Pau Monné wrote:
> >> >On Tue, Apr 30, 2019 at 05:01:21PM +0800, Chao Gao wrote:
> >> >> On Tue, Apr 30, 2019 at 01:56:31AM -0600, Jan Beulich wrote:
> >> >> >>>> On 30.04.19 at 07:19, <chao.gao@xxxxxxxxx> wrote:
> >> >> >> When testing a UP guest with a pass-thru device with VT-d PI
> >> >> >> enabled in the host, we observed that the guest couldn't receive
> >> >> >> interrupts from that pass-thru device. Dumping the IRTE, we found
> >> >> >> the corresponding IRTE was set to posted format with the "vector"
> >> >> >> field as 0.
> >> >> >> 
> >> >> >> We fall into this issue when the guest uses the pirq format of MSI
> >> >> >> (see the comment in xen_msi_compose_msg() in the Linux kernel). As
> >> >> >> 'dest_id' is repurposed there, skip the migration logic, which is
> >> >> >> based on 'dest_id'.
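> >> >> >> 
> >> >> >> For illustration, the pirq format is composed roughly like this in
> >> >> >> Linux (paraphrased from arch/x86/pci/xen.c, not verbatim): the pirq
> >> >> >> number is carried in the dest_id bits and the vector field is 0.
> >> >> >> 
> >> >> >> static void xen_msi_compose_msg(struct pci_dev *pdev,
> >> >> >>                                 unsigned int pirq,
> >> >> >>                                 struct msi_msg *msg)
> >> >> >> {
> >> >> >>     /* Vector == 0 tells the hypervisor we want a pirq instead of
> >> >> >>      * a regular (vector, dest_id) MSI. */
> >> >> >>     msg->address_hi = MSI_ADDR_BASE_HI;
> >> >> >>     msg->address_lo = MSI_ADDR_BASE_LO |
> >> >> >>                       MSI_ADDR_DEST_MODE_PHYSICAL |
> >> >> >>                       MSI_ADDR_REDIRECTION_CPU |
> >> >> >>                       MSI_ADDR_DEST_ID(pirq);
> >> >> >>     msg->data = XEN_PIRQ_MSI_DATA;
> >> >> >> }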
> >> >> >
> >> >> >I've gone through all uses of gvec, and I couldn't find any existing
> >> >> >special casing of it being zero. I assume this is actually
> >> >> >communication between the kernel and qemu,
> >> >> 
> >> >> Yes. 
> >> >> 
> >> >> >in which case I'd like to see an
> >> >> >explanation of why the issue needs to be addressed in Xen rather
> >> >> >than qemu.
> >> >> 
> >> >> To call pirq_guest_bind() to configure the irq_desc properly.
> >> >> In particular, we append a pointer to the struct domain to
> >> >> 'action->guest' in pirq_guest_bind(). Then __do_IRQ_guest() knows
> >> >> which domains are interested in this interrupt and injects an
> >> >> interrupt into those domains.
> >> >> 
> >> >> >Otherwise, if I've overlooked something, would you
> >> >> >mind pointing out where such special casing lives in Xen?
> >> >> >
> >> >> >In any event it doesn't look correct to skip migration altogether in
> >> >> >that case. I'd rather expect it to require getting done differently.
> >> >> >After all there still is a (CPU, vector) tuple associated with that
> >> >> >{,p}IRQ if it's not posted, and hvm_migrate_pirq() is a no-op if it is
> >> >> >posted.
> >> >> 
> >> >> Here, we try to set the irq's target CPU to the CPU which the vMSI's
> >> >> target vCPU is running on, in order to reduce IPIs. But the 'dest_id'
> >> >> field, which used to indicate the vMSI's target vCPU, is missing, so
> >> >> we don't know which CPU we should migrate the irq to. One possible
> >> >> choice is the 'chn->notify_vcpu_id' used in send_guest_pirq(). Do you
> >> >> think this choice is fine?
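> >> >> 
> >> >> For reference, that field is what ultimately picks the notified vCPU.
> >> >> send_guest_pirq() is roughly the following (simplified sketch based
> >> >> on xen/common/event_channel.c, error handling trimmed):
> >> >> 
> >> >> void send_guest_pirq(struct domain *d, const struct pirq *pirq)
> >> >> {
> >> >>     int port;
> >> >>     struct evtchn *chn;
> >> >> 
> >> >>     /* Port is legitimately zero when an HVM guest disables the
> >> >>      * emulated interrupt/evtchn. */
> >> >>     if ( pirq == NULL || (port = pirq->evtchn) == 0 )
> >> >>         return;
> >> >> 
> >> >>     chn = evtchn_from_port(d, port);
> >> >>     /* Pend the event on the vCPU the channel is bound to notify. */
> >> >>     evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
> >> >> }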
> >> >
> >> >I think that by the time the device model calls into pirq_guest_bind
> >> >the PIRQ won't be bound to any event channel, so pirq->evtchn would be
> >> >0.
> >> 
> >> Then is skipping pirq migration the only choice here? We can migrate
> >> the pirq once it is bound to an event channel.
> >> 
> >> >
> >> >Note that the binding of the PIRQ with the event channel is done
> >> >afterwards in xen_hvm_setup_msi_irqs by the Linux kernel.
> >> >
> >> >It seems like the device model should be using a different set of
> >> >hypercalls to setup a PIRQ that is routed over an event channel, ie:
> >> >PHYSDEVOP_map_pirq and friends.
> >> 
> >> Now qemu is using PHYSDEVOP_map_pirq. Right?
> >
> >Oh yes, QEMU already uses PHYSDEVOP_map_pirq to set up the interrupt.
> >Then I'm not sure I see why QEMU calls XEN_DOMCTL_bind_pt_irq for
> >interrupts that are routed over event channels. That hypercall is used
> 
> As I said above, it is to call pirq_guest_bind() to hook up to the irq
> handler.
> 
> XEN_DOMCTL_bind_pt_irq does two things:
> #1. bind the pirq to a guest interrupt
> #2. register (domain, pirq) with the interrupt handler
> 
> Currently, for a pirq routed to an evtchn, #1 is done by another
> hypercall, evtchn_bind_pirq, and #2 is done in XEN_DOMCTL_bind_pt_irq.

So XEN_DOMCTL_bind_pt_irq basically does the pirq_guest_bind in this
case, and that's why the call to pirq_guest_bind is avoided in
evtchn_bind_pirq for HVM guests.
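
For reference, that registration is what __do_IRQ_guest() walks when the
physical interrupt fires. The delivery loop is roughly the following
(simplified sketch based on xen/arch/x86/irq.c, with the ack/in-flight
bookkeeping omitted):

for ( i = 0; i < action->nr_guests; i++ )
{
    struct domain *d = action->guest[i];
    struct pirq *pirq = pirq_info(d, domain_irq_to_pirq(d, irq));

    /* Try the dpci (pass-through) path first and fall back to a plain
     * event channel notification. */
    if ( !hvm_do_IRQ_dpci(d, pirq) )
        send_guest_pirq(d, pirq);
}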

> 
> >to bind a pirq to a native guest interrupt injection mechanism, which
> >shouldn't be used if the interrupt is going to be delivered over an
> >event channel.
> >
> >Can you see about avoiding the XEN_DOMCTL_bind_pt_irq call in QEMU if
> >the interrupt is going to be routed over an event channel?
> 
> Yes. It is doable, but it needs changes in both QEMU and Xen, and some
> tricks to stay compatible with old QEMU.

OK, leaving the XEN_DOMCTL_bind_pt_irq call in QEMU and making it a
no-op in this case seems like a good compromise solution IMO.

It might be helpful to add a comment to the QEMU code noting that the
XEN_DOMCTL_bind_pt_irq hypercall is not needed when routing pirqs over
event channels for HVM guests with Xen >= 4.13.
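
Something along these lines, say (the exact placement in QEMU's
pass-through MSI setup path is hypothetical):

/*
 * Xen >= 4.13 is expected to treat XEN_DOMCTL_bind_pt_irq as a no-op
 * for pirqs that are routed over event channels (gvec == 0); the
 * hypercall is still issued here for compatibility with older
 * hypervisors.
 */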

> I prefer not to touch QEMU and to keep QEMU unaware of the MSI's
> "routing over evtchn", like the patch below:
> 
> diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
> index e86e2bf..0bcddb9 100644
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -504,10 +504,7 @@ static long evtchn_bind_pirq(evtchn_bind_pirq_t *bind)
>      if ( !info )
>          ERROR_EXIT(-ENOMEM);
>      info->evtchn = port;
> -    rc = (!is_hvm_domain(d)
> -          ? pirq_guest_bind(v, info,
> -                            !!(bind->flags & BIND_PIRQ__WILL_SHARE))
> -          : 0);
> +    rc = pirq_guest_bind(v, info, !!(bind->flags & BIND_PIRQ__WILL_SHARE));
>      if ( rc != 0 )
>      {
>          info->evtchn = 0;
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index 4290c7c..5a0b83e 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -346,6 +346,12 @@ int pt_irq_create_bind(
>          uint32_t gflags = pt_irq_bind->u.msi.gflags &
>                            ~XEN_DOMCTL_VMSI_X86_UNMASKED;
>  
> +        if ( !pt_irq_bind->u.msi.gvec )
> +        {
> +            spin_unlock(&d->event_lock);
> +            return 0;
> +        }
> +

Can you see about short-circuiting pt_irq_create_bind much earlier,
likely at the start of the function? AFAICT pirqs routed over event
channels don't need hvm_irq_dpci, so I think you can avoid allocating
it if all pirqs from a domain are routed over event channels.
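
I.e. something like this untested sketch (the exact predicate for
"routed over an event channel" may need refining):

int pt_irq_create_bind(
    struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
{
    /* Bail out before taking the lock and before the hvm_irq_dpci
     * allocation: a zero gvec means the pirq is routed over an event
     * channel and needs none of the dpci machinery. */
    if ( pt_irq_bind->irq_type == PT_IRQ_TYPE_MSI &&
         !pt_irq_bind->u.msi.gvec )
        return 0;

    spin_lock(&d->event_lock);
    /* ... existing hvm_irq_dpci allocation and binding logic ... */
}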

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

