
Re: IGD pass-through failures since 4.10.



On Mon, Feb 14, 2022 at 09:56:34AM +0100, Jan Beulich wrote:

Good morning, I hope the day is starting well for everyone, Jan thanks
for taking the time to reply.

> On 14.02.2022 07:00, Dr. Greg wrote:

> > It appears to be a problem with mapping interrupts back to dom0 given
> > that we see the following:
> > 
> > Feb 10 08:16:05 hostname kernel: xhci_hcd 0000:00:14.0: xen map irq failed -19 for 32752 domain
> > 
> > Feb 10 08:16:05 hostname kernel: i915 0000:00:02.0: xen map irq failed -19 for 32752 domain
> > 
> > Feb 10 08:16:12 hostname kernel: xhci_hcd 0000:00:14.0: Error while assigning device slot ID

> Just on this one aspect: It depends a lot what precisely you've used
> as 4.10 before. Was this the plain 4.10.4 release, or did you track
> the stable branch, accumulating security fixes?

It was based on the Xen GIT tree with a small number of modifications
that had been implemented by Intel to support their IGD
virtualization.

We did not end up using the IGD virtualization support, for a number
of technical reasons; instead we reverted to the straight device
passthrough with qemu-traditional that we had previously been using.
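
For context, the pass-through in question uses a conventional guest
configuration; roughly the following, where the BDFs match the logs
above (a sketch using the standard xl.cfg option names, not a
verbatim copy of our configuration):

# Use the traditional device model.
device_model_version = "qemu-xen-traditional"

# Enable IGD graphics pass-through.
gfx_passthru = 1

# The IGD and the USB controller being passed through.
pci = [ "00:02.0", "00:14.0" ]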

If it would end up being useful, we could generate a diff between the
stock 4.10.4 tag and the codebase we used.

One of the purposes of the infrastructure upgrade was to try to get
onto a completely mainline Xen source tree.

> would suspect device quarantining to be getting in your way. In
> which case it would be relevant to know what exactly "re-attach to
> the Dom0" means in your case.

Re-attaching to Dom0 means unbinding the device from the pciback
driver and then binding it back to its original driver; in the logs
noted above, the xhci_hcd driver for the USB controller and the i915
driver for the IGD hardware.

It is the same strategy, same script actually, that we have been using
for 8+ years.

In the case of the logs above, the following command sequence is being
executed upon termination of the domain:

# Unbind devices.
echo 0000:00:14.0 >| /sys/bus/pci/drivers/pciback/unbind
echo 0000:00:02.0 >| /sys/bus/pci/drivers/pciback/unbind

# Rebind devices.
echo 0000:00:14.0 >| /sys/bus/pci/drivers/xhci_hcd/bind
echo 0000:00:02.0 >| /sys/bus/pci/drivers/i915/bind
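
For completeness, the way we sanity check where a device ended up
after the rebind is to inspect the sysfs driver symlink; standard
Linux sysfs behavior, nothing Xen specific:

# Shows the driver currently bound to the IGD, e.g. a link ending in
# drivers/i915 on success; the symlink is absent if the device was
# left unbound.
readlink /sys/bus/pci/devices/0000:00:02.0/driver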

Starting with the stock 4.11.4 release, the Dom0 re-attachment fails
with the 'xen map irq' errors quoted above being logged.  For
reference, the -19 is -ENODEV, and the 32752 domain is DOMID_SELF
(0x7ff0).

> Which brings me to this more general remark: What you describe sounds
> like a number of possibly independent problems. I'm afraid it'll be
> difficult for anyone to help without you drilling further down into
> what lower level operations are actually causing trouble. It also feels
> as if things may have ended up working for you on 4.10 just by
> chance.

I think the issue comes down to something the hypervisor does, on
behalf of the domain receiving the pass-through, as part of whatever
qemu-traditional needs to do in order to attach the PCI devices to
the domain.

Running the detach/re-attach cycle works perfectly in the absence of
qemu-traditional being started for the domain.  The re-attach failure
only occurs after qemu-traditional has been run in the domain.
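
For what it is worth, the same cycle can presumably be driven through
xl; a sketch based on our reading of the xl man page, not something
we have scripted in production:

# Bind the device to pciback and mark it assignable.
xl pci-assignable-add 0000:00:02.0

# Return the device to Dom0 without ever starting a guest; the -r
# flag asks xl to rebind the device to its original Dom0 driver,
# mirroring the manual rebind step above.
xl pci-assignable-remove -r 0000:00:02.0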

> I'm sorry that I'm not really of any help here,

Actually your reflections have been helpful.

Perhaps the most important clarification we could get, for posterity
in this thread, is whether IGD pass-through is actually considered
supported by the Xen team.

According to the Xen web-site, IGD PCI pass-through is documented as
working with the following combinations:

Xen 4.11.x: QEMU >= 3.1

Xen 4.14.x: QEMU >= 5.2

We are currently seeing IGD pass-through with qemu-dm (QEMU 3.1 and
5.2 respectively) fail completely in both of those combinations.

Pass-through with qemu-traditional works on 4.11.x, but the Dom0
re-attachment fails.  On 4.14.x, qemu-traditional fails to start with
a complaint about being unable to determine the CPU type, a separate
issue that we haven't been able to run down yet.

Those tests were done with builds from stock tagged releases in the
Xen GIT tree.

So it may be helpful to verify whether any of this is expected to
work; if not, the Xen web-site would seem to need correction.

> Jan

Hopefully the above is helpful; I will be replying to Roger's
comments later.

Have a good day.

Dr. Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686            EMAIL: greg@xxxxxxxxxxxx
------------------------------------------------------------------------------
"If your doing something the same way you have been doing it for ten years,
 the chances are you are doing it wrong."
                                -- Charles Kettering



 

