
[BUG] Passed through PCI devices lost after Windows HVM DomU reboot


  • To: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Paul Leiber <paul@xxxxxxxxxxxxxxxx>
  • Date: Mon, 7 Jun 2021 23:44:03 +0000
  • Delivery-date: Mon, 07 Jun 2021 23:44:21 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Dear developers,

I am a mostly very happy Xen beginner. My Debian PV DomUs work like a charm 
out of the box. The only remaining Windows instance is a MediaPortal TV Server 
backend on a Windows Server 2012 HVM DomU. However, I have problems with 
reliably passing through PCIe cards to this Windows HVM DomU. Further testing 
has led me to suspect that there might be a bug where PCI passthrough does not 
work after a Windows DomU reboot.

Please be patient with me if I am not reporting this bug in the customary way; 
this is my first official bug report ever. (If it is indeed a bug.)

Background: I am running a standard apt-get Xen installation based on Debian 
Buster. My hardware is a Fujitsu D3417-B1 with an Intel Xeon CPU E3-1235L v5, 
32 GB ECC RAM, and a Hauppauge HVR-2205 TV tuner card. To get PCI 
passthrough to work, I needed to set "permissive=1" and limit the Dom0 memory 
size. I could then pass through the PCIe TV tuner card to my Windows Server 
2012 DomU without any problem. It was detected and worked very well in the 
Windows DomU. However, sometimes the card somehow got "lost" in the DomU, i.e. 
it disappeared from Device Manager and was no longer functional. I could then 
reattach it to the DomU with "xl pci-attach". My TV software (MediaPortal) then 
seemed to recognize a new PCIe card instance (e.g. an internal ID number of 
the tuner card was incremented), so I needed to reapply some settings. Other 
than that, the card was fully functional.

After more testing, I have come to the following conclusion: It seems that 
every time I do a _reboot_ from within the Windows DomU, the PCI device does 
not get reattached to the DomU. After the DomU reboot, it is immediately 
available for attachment in the Dom0 when I check for it with 
"xl pci-assignable-list", and I can reattach it to the DomU with 
"xl pci-attach" without any major problems besides some annoying side effects 
(e.g. the need to reapply settings). If I _shut down_ the DomU from within the 
DomU (with the Windows shutdown mechanism) or from the Dom0 (with 
"xl shutdown") and restart the DomU with "xl create", the PCIe device gets 
attached automatically at DomU boot and the unwanted side effects do not occur.

What I would expect is that the passed through PCIe device is available in my 
Windows DomU after each reboot (e. g. after Windows Update automatically 
installs patches and reboots).

Steps which I can take to provoke the unwanted behavior:
1. Install Xen on Debian Buster following mostly 
https://wiki.xenproject.org/wiki/Xen_Project_Beginners_Guide
2. Set up PCI passthrough following mostly 
https://wiki.xenproject.org/wiki/Xen_PCI_Passthrough (see additional details 
below)
3. Set up a Windows Server 2012 HVM (cfg below)
4. Start Windows Server 2012 HVM with "xl create /etc/xen/matrix.cfg", connect 
with Windows HVM via VNC for installation and initial settings, then via 
RemoteDesktop
5. Check for PCIe device in Windows Device Manager: it is available
6. Initiate reboot in Windows (Go to Server Manager -> local server -> reboot)
7. Connect with rebooted Windows via RemoteDesktop
8. Check for PCIe device in Windows Device Manager: it is not available
9. Check for PCIe device in Dom0 with "xl pci-assignable-list": it is 
available for passthrough
10. Attach the PCIe device to the Windows DomU, e.g. via 
"xl pci-attach 9 01:00.0"
11. Check for PCIe device in Windows Device Manager: it is available again
12. Repetition is possible by skipping to step 6
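
Steps 9 and 10 could be automated as a stopgap. The following is only a 
minimal sketch, not part of my setup: the function names are made up for 
illustration, and it assumes that "xl pci-assignable-list" prints one PCI 
address per line and lists a device only while it is not attached to a DomU, 
which matches what I observe above.

```python
import re

# Matches "01:00.0" as well as "0000:01:00.0" (PCI domain is optional).
BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def normalize_bdf(bdf):
    """Return a PCI address in full dddd:bb:dd.f form (domain defaults to 0000)."""
    m = BDF_RE.match(bdf.strip())
    if m is None:
        raise ValueError(f"not a PCI address: {bdf!r}")
    domain, bus, dev, func = m.groups()
    return f"{(domain or '0000').lower()}:{bus.lower()}:{dev.lower()}.{func}"


def parse_assignable_list(output):
    """Collect the addresses found in `xl pci-assignable-list` output.

    Assumption: one address per line; anything else is skipped.
    """
    found = set()
    for line in output.splitlines():
        line = line.strip()
        if line and BDF_RE.match(line):
            found.add(normalize_bdf(line))
    return found


def reattach_command(domain, bdf, assignable_output):
    """Return the argv for `xl pci-attach`, or None if the device is not
    currently listed as assignable (i.e. nothing to do)."""
    wanted = normalize_bdf(bdf)
    if wanted not in parse_assignable_list(assignable_output):
        return None
    return ["xl", "pci-attach", str(domain), wanted]
```

In the Dom0, the output of "xl pci-assignable-list" would be captured (for 
example with subprocess.run) and passed in; the returned list can then be 
executed as-is. This only works around the symptom and does not avoid the side 
effects described above.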

The xl log for a normal cold start (PCIe device attached normally) looks like 
this:
Waiting for domain matrix (domid 10) to die [pid 3910]

The log after a reboot (PCIe device not attached automatically) looks like this:
Waiting for domain matrix (domid 8) to die [pid 3113]
Domain 8 has shut down, reason code 1 0x1
Action for shutdown reason code 1 is restart
libxl: warning: libxl_domain.c:1739:libxl_retrieve_domain_configuration: Domain 8:Device present in JSON but not in xenstore, ignored
Domain 8 needs to be cleaned up: destroying the domain
Done. Rebooting now

Searching for this exact error message ("Device present in JSON but not in 
xenstore, ignored"), I found the following quite old bug report which sounds 
suspiciously similar to my experience, only for PV DomUs:
https://bugzilla.redhat.com/show_bug.cgi?id=233801
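
To find out how often this happens, the xl log can be scanned for restart 
sequences that contain the warning. This is a rough sketch based only on the 
log lines quoted above; the function name is hypothetical, and the log 
location (commonly /var/log/xen/xl-<name>.log) is an assumption about the 
installation.

```python
import re

WARNING = "Device present in JSON but not in xenstore, ignored"
SHUTDOWN_RE = re.compile(r"^Domain (\d+) has shut down, reason code (\d+)")


def reboots_with_dropped_device(log_text):
    """Return the domids whose restart sequence contains the libxl warning."""
    affected = []
    current = None    # domid of the shutdown sequence currently being read
    flagged = False   # whether the warning was seen in that sequence
    for line in log_text.splitlines():
        m = SHUTDOWN_RE.match(line)
        if m:
            current, flagged = int(m.group(1)), False
        elif WARNING in line and current is not None:
            flagged = True
        elif line.startswith("Done. Rebooting now") and current is not None:
            if flagged:
                affected.append(current)
            current = None
    return affected
```

Fed with the reboot log above, this returns the domid 8; the cold-start log 
yields an empty list.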

Additional information which might be helpful:
- I could reproduce this behavior with two different TV tuner cards from 
different manufacturers (Hauppauge HVR-2205 or Digital Devices Max M4) and a 
network card (Intel 82574L)
- I tested the behavior with a fresh install of Windows 10, with the same 
results.
- I used the Hauppauge PCIe card in a Linux PV DomU (with VDR software) where 
the card was attached very reliably. As far as I can remember, there was only 
one occurrence of a non-working TV card, but I can't remember the details 
(i.e. whether there was a preceding reboot).
- The unwanted behavior did not occur with the bare metal system before I 
switched to Xen, i. e. Windows Server 2012 running directly on the hardware and 
the Hauppauge PCIe card.

A (slightly less detailed) description of my problem on the Xen Users mailing 
list did not get a reply, therefore I am turning to the developer mailing 
list. Could anybody on this list please give me advice on what I can do to 
solve this issue? Is there any more information you need to help me, or any 
more testing I could do?

Thanks in advance,

Paul



Additional information:


While trying to fix this, I changed the kernel boot parameters. I figured out 
that passing the kernel boot option "xen-pciback.hide" is not necessary as the 
driver is not built into the kernel, therefore I changed the parameters from
        dom0_mem=1024M,max:1024M xen-pciback.hide=(01:00.0)
to the currently used parameters:
        dom0_mem=1024M,max:1024M


The Digital Devices PCIe device is assigned to xen-pciback via 
/etc/modprobe.d/xen-pciback.conf. There is no driver on the Dom0 for the tuner 
card, therefore no precautions against loading other drivers are necessary:
        options xen-pciback hide=(0000:01:00.0)


The Hauppauge card needs an additional line for preventing loading the driver 
in Dom0:
        install saa7164 /sbin/modprobe xen-pciback ; /sbin/modprobe --first-time --ignore-i$
        options xen-pciback hide=(0000:01:00.0)


While doing trial and error, I changed the pci line in the Xen config file, but 
adding "power_mgmt=1" and "seize=1" didn't change the behavior:
        pci=['01:00.0,permissive=1,power_mgmt=1,seize=1']


Xen config file for the Windows domU (besides the above mentioned changes in 
the line pci=[...], there were some probably minor changes between first 
installation and the current status, e. g. I started with VNC and later 
switched to SPICE):

# kernel = "/usr/lib/xen-4.0/boot/hvmloader"
type='hvm'
memory = 4096
vcpus=2
name = "matrix"
vif = ['bridge=xenbr0,mac=00:16:3E:54:A8:2B']
disk = ['phy:/dev/vg0/matrix,hda,w','phy:/dev/vg0/compudms-data,hdb,w']
device_model_version = 'qemu-xen'
boot="c"
hdtype = 'ahci'
acpi = 1
apic = 1
xen_platform_pci = 1
vendor_device = 'xenserver'
#  PCI Passthrough
pci=['01:00.0,permissive=1,power_mgmt=1']
viridian = 1
stdvga = 1
sdl = 0
serial = 'pty'
usb = 1
usbdevice = 'tablet'
keymap = 'de'
# SPICE
spice=1
spicehost='0.0.0.0'
spiceport=6000
# spicedisable_ticketing enabled is for no spice password, instead use spicepasswd
spicedisable_ticketing=1
#spicepasswd="test"
spicevdagent=1
spice_clipboard_sharing=1
# this will automatically redirect up to 4 usb devices from spice client to domUs
#spiceusbredirection=4
# This adds intel hd audio emulated card used for spice audio
soundhw="hda"
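
To check whether the options from the pci=[...] line actually arrive in 
xenstore, the entry can be compared with the "opts-0" string from the xenstore 
excerpt further down. This is a minimal sketch with made-up helper names; it 
only handles the simple "BDF,key=value,..." form used in my config, and I am 
assuming (not certain) that not every option from the config is mirrored in 
that xenstore string.

```python
def parse_options(text):
    """Turn 'a=1,b=0' into {'a': '1', 'b': '0'}; empty items are ignored."""
    options = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip()
    return options


def parse_pci_spec(spec):
    """Split a pci= entry such as '01:00.0,permissive=1' into (bdf, options)."""
    bdf, _, rest = spec.partition(",")
    return bdf.strip(), parse_options(rest)


def options_not_applied(spec, xenstore_opts):
    """Return the config options that are missing from, or differ in, the
    xenstore opts string."""
    _, wanted = parse_pci_spec(spec)
    applied = parse_options(xenstore_opts)
    return {k: v for k, v in wanted.items() if applied.get(k) != v}
```

For the values in this report, '01:00.0,permissive=1,power_mgmt=1' compared 
with "msitranslate=0,power_mgmt=1,permissive=1" gives an empty result, i.e. 
both options were applied on the cold start.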


xl info:

host                   : xxx
release                : 4.19.0-14-amd64
version                : #1 SMP Debian 4.19.171-2 (2021-01-30)
machine                : x86_64
nr_cpus                : 4
max_cpu_id             : 3
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 1992.100
hw_caps                : bfebfbff:77faf3ff:2c100800:00000121:0000000f:009c6fbf:00000000:00000100
virt_caps              : hvm hvm_directio
total_memory           : 32542
free_memory            : 20836
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 11
xen_extra              : .4
xen_version            : 4.11.4
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder dom0_mem=1024M,max:1024M
cc_compiler            : gcc (Debian 8.3.0-6) 8.3.0
cc_compile_by          : pkg-xen-devel
cc_compile_domain      : lists.alioth.debian.org
cc_compile_date        : Fri Dec 11 21:33:51 UTC 2020
build_id               : 6d8e0fa3ddb825695eb6c6832631b4fa2331fe41
xend_config_format     : 4


lspci -vvv (excerpt)

01:00.0 Multimedia controller: Digital Devices GmbH Device 000a
        Subsystem: Digital Devices GmbH Device 0050
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at f7200000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [90] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range A, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Vendor Specific Information: ID=0000 Rev=0 Len=00c <?>
        Kernel driver in use: pciback
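
The "Kernel driver in use" line can also be checked without lspci by reading 
the driver symlink in sysfs. A small sketch, assuming the usual sysfs layout 
(/sys/bus/pci/devices/<address>/driver); the function name and the 
sysfs_root parameter are only there for illustration and testing.

```python
import os


def bound_driver(bdf, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to a PCI device, or None
    if the device does not exist or has no driver bound."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "driver")
    if not os.path.islink(link):
        return None
    # The symlink points at .../bus/pci/drivers/<name>; keep only <name>.
    return os.path.basename(os.readlink(link))
```

On my Dom0, bound_driver("0000:01:00.0") should return "pciback" both before 
and after the DomU reboot if the device stays with xen-pciback the whole time.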


xenstore-ls -fp (excerpt)

/libxl/10 = ""   (n0)
/libxl/10/device = ""   (n0)
/libxl/10/device/vbd = ""   (n0)
/libxl/10/device/vbd/768 = ""   (n0)
/libxl/10/device/vbd/768/frontend = "/local/domain/10/device/vbd/768"   (n0)
/libxl/10/device/vbd/768/backend = "/local/domain/0/backend/vbd/10/768"   (n0)
/libxl/10/device/vbd/768/params = "/dev/vg0/matrix"   (n0)
/libxl/10/device/vbd/768/script = "/etc/xen/scripts/block"   (n0)
/libxl/10/device/vbd/768/frontend-id = "10"   (n0)
/libxl/10/device/vbd/768/online = "1"   (n0)
/libxl/10/device/vbd/768/removable = "0"   (n0)
/libxl/10/device/vbd/768/bootable = "1"   (n0)
/libxl/10/device/vbd/768/state = "1"   (n0)
/libxl/10/device/vbd/768/dev = "hda"   (n0)
/libxl/10/device/vbd/768/type = "phy"   (n0)
/libxl/10/device/vbd/768/mode = "w"   (n0)
/libxl/10/device/vbd/768/device-type = "disk"   (n0)
/libxl/10/device/vbd/768/discard-enable = "1"   (n0)
/libxl/10/device/vbd/832 = ""   (n0)
/libxl/10/device/vbd/832/frontend = "/local/domain/10/device/vbd/832"   (n0)
/libxl/10/device/vbd/832/backend = "/local/domain/0/backend/vbd/10/832"   (n0)
/libxl/10/device/vbd/832/params = "/dev/vg0/compudms-data"   (n0)
/libxl/10/device/vbd/832/script = "/etc/xen/scripts/block"   (n0)
/libxl/10/device/vbd/832/frontend-id = "10"   (n0)
/libxl/10/device/vbd/832/online = "1"   (n0)
/libxl/10/device/vbd/832/removable = "0"   (n0)
/libxl/10/device/vbd/832/bootable = "1"   (n0)
/libxl/10/device/vbd/832/state = "1"   (n0)
/libxl/10/device/vbd/832/dev = "hdb"   (n0)
/libxl/10/device/vbd/832/type = "phy"   (n0)
/libxl/10/device/vbd/832/mode = "w"   (n0)
/libxl/10/device/vbd/832/device-type = "disk"   (n0)
/libxl/10/device/vbd/832/discard-enable = "1"   (n0)
/libxl/10/device/console = ""   (n0)
/libxl/10/device/console/0 = ""   (n0)
/libxl/10/device/console/0/frontend = "/local/domain/10/console"   (n0)
/libxl/10/device/console/0/backend = "/local/domain/0/backend/console/10/0"   (n0)
/libxl/10/device/console/0/frontend-id = "10"   (n0)
/libxl/10/device/console/0/online = "1"   (n0)
/libxl/10/device/console/0/state = "1"   (n0)
/libxl/10/device/console/0/protocol = "vt100"   (n0)
/libxl/10/device/vkbd = ""   (n0)
/libxl/10/device/vkbd/0 = ""   (n0)
/libxl/10/device/vkbd/0/frontend = "/local/domain/10/device/vkbd/0"   (n0)
/libxl/10/device/vkbd/0/backend = "/local/domain/0/backend/vkbd/10/0"   (n0)
/libxl/10/device/vkbd/0/frontend-id = "10"   (n0)
/libxl/10/device/vkbd/0/online = "1"   (n0)
/libxl/10/device/vkbd/0/state = "1"   (n0)
/libxl/10/device/vif = ""   (n0)
/libxl/10/device/vif/0 = ""   (n0)
/libxl/10/device/vif/0/frontend = "/local/domain/10/device/vif/0"   (n0)
/libxl/10/device/vif/0/backend = "/local/domain/0/backend/vif/10/0"   (n0)
/libxl/10/device/vif/0/frontend-id = "10"   (n0)
/libxl/10/device/vif/0/online = "1"   (n0)
/libxl/10/device/vif/0/state = "1"   (n0)
/libxl/10/device/vif/0/script = "/etc/xen/scripts/vif-bridge"   (n0)
/libxl/10/device/vif/0/mac = "00:16:3e:54:a8:2b"   (n0)
/libxl/10/device/vif/0/bridge = "xenbr0"   (n0)
/libxl/10/device/vif/0/handle = "0"   (n0)
/libxl/10/device/vif/0/type = "vif_ioemu"   (n0)
/libxl/10/device/pci = ""   (n0)
/libxl/10/device/pci/0 = ""   (n0)
/libxl/10/device/pci/0/frontend = "/local/domain/10/device/pci/0"   (n0)
/libxl/10/device/pci/0/backend = "/local/domain/0/backend/pci/10/0"   (n0)
/libxl/10/device/pci/0/frontend-id = "10"   (n0)
/libxl/10/device/pci/0/online = "1"   (n0)
/libxl/10/device/pci/0/state = "1"   (n0)
/libxl/10/device/pci/0/domain = "matrix"   (n0)
/libxl/10/device/pci/0/key-0 = "0000:01:00.0"   (n0)
/libxl/10/device/pci/0/dev-0 = "0000:01:00.0"   (n0)
/libxl/10/device/pci/0/vdevfn-0 = "48"   (n0)
/libxl/10/device/pci/0/opts-0 = "msitranslate=0,power_mgmt=1,permissive=1"   (n0)
/libxl/10/device/pci/0/state-0 = "1"   (n0)
/libxl/10/device/pci/0/num_devs = "1"   (n0)
/libxl/10/type = "hvm"   (n0)
/libxl/10/dm-version = "qemu_xen"   (n0)
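
To compare the xenstore state after a cold start with the state after a 
reboot, the PCI entries can be pulled out of two such dumps. A minimal sketch, 
assuming the "xenstore-ls -fp" line format shown above; the function name is 
hypothetical.

```python
import re

# Matches e.g. /libxl/10/device/pci/0/dev-0 = "0000:01:00.0"   (n0)
DEV_RE = re.compile(r'^/libxl/(\d+)/device/pci/\d+/dev-\d+ = "([^"]*)"')


def passthrough_devices(xenstore_dump):
    """Map domid -> list of PCI addresses recorded under
    /libxl/<domid>/device/pci in a `xenstore-ls -fp` dump."""
    result = {}
    for line in xenstore_dump.splitlines():
        m = DEV_RE.match(line.strip())
        if m:
            result.setdefault(int(m.group(1)), []).append(m.group(2))
    return result
```

For the excerpt above this gives {10: ['0000:01:00.0']}. If the device is 
missing from the result for the new domid after a reboot, that would be 
consistent with the "not in xenstore" warning in the xl log.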




 

