
Re: [Xen-devel] Xen pci-passthrough problem with pci-detach and pci-assignable-remove



Friday, January 10, 2014, 5:12:48 PM, you wrote:

> On Fri, Jan 10, 2014 at 04:57:29PM +0100, Sander Eikelenboom wrote:
>> 
>> Friday, January 10, 2014, 4:12:18 PM, you wrote:
>> 
>> > On Fri, Jan 10, 2014 at 03:51:57PM +0100, Sander Eikelenboom wrote:
>> >> Hi Konrad,
>> >> 
>> >> Normally i'm never reattaching pci devices to dom0, but at the moment i 
>> >> have some use for it.
>> >> 
>> >> But it seems pci-detach isn't completely detaching the device from the 
>> >> guest.
>> >> 
>> >> - Say i have a guest (HVM) with domid=2 and a pci device passed through 
>> >> with bdf 00:19.0, the device is hidden on boot with 
>> >> xen-pciback.hide=(00:19.0) in grub.
>> >> 
>> >> - Now i do a "xl pci-assignable-list"
>> >>   This returns nothing, which is correct since all hidden devices have 
>> >> already been assigned to guests.
>> >> 
>> >> - Then i do "xl -v pci-detach 2 00:19.0"
>> >>   Which also returns nothing ...
>> >> 
>> >> - Now i do a "xl pci-assignable-list" again ..
>> >>   This returns:
>> >>   "0000:00:19.0"
>> >>   So the pci-detach does seem to have done *something* :-)
>> 
>> > Or it thinks it has :-)
>> 
>> Well it has .. but probably not enough ;-)
>> 
>> >> 
>> >> - But when now trying to remove the device from pciback to dom0 with "xl 
>> >> pci-assignable-remove 00:19.0" it gives an error
>> >>   and later it gives some stack traces ..
>> >> 
>> >>   xen_pciback: ****** removing device 0000:00:19.0 while still in-use! ******
>> >>   xen_pciback: ****** driver domain may still access this device's i/o resources!
>> >>   xen_pciback: ****** shutdown driver domain before binding device
>> >>   xen_pciback: ****** to other drivers or domains
>> 
>> > What about /var/log/xen/qemu-dm* and the 'lspci' in the guest? Is the PCI 
>> > device
>> > removed from there?
>> 
>> Oeh i should have thought of that ...
>> 
>> in the guest i get a "e1000e 0000:00:06.0 removed PHC" and it's gone from 
>> lspci ..
>> in /var/log/xen/qemu-dm* .. i get nothing .. but i was using qemu-xen .. 
>> which is totally non verbose ..
>> 
>> So let's try with qemu-xen-traditional .. which i also forgot to test ...
>> 
>> Which gives exactly the same error / warning as above, but it has some output 
>> in /var/log/xen/qemu-dm*:
>> 
>> pt_msgctrl_reg_write: setup msi for dev 30
>> pt_msi_setup: pt_msi_setup requested pirq = 54
>> pt_msi_setup: msi mapped with pirq 36
>> pt_msi_update: Update msi with pirq 36 gvec 0 gflags 3036
>> pt_msgctrl_reg_write: setup msi for dev 28
>> pt_msi_setup: pt_msi_setup requested pirq = 53
>> pt_msi_setup: msi mapped with pirq 35
>> pt_msi_update: Update msi with pirq 35 gvec 0 gflags 3035
>> pt_msi_update: Update msi with pirq 36 gvec 0 gflags 3034
>> dm-command: hot remove pass-through pci dev
>> generate a sci for PHP.
>> deassert due to disable GPE bit.
>> ACPI:debug: write addr=0xb044, val=0x30.
>> ACPI:debug: write addr=0xb045, val=0x3.
>> ACPI:debug: write addr=0xb044, val=0x30.
>> ACPI:debug: write addr=0xb045, val=0x88.
>> ACPI PCI hotplug: write devfn=0x30.
>> pci_intx: intx=1
>> pci_intx: intx=1
>> pt_msi_disable: Unbind msi with pirq 36, gvec 0
>> pt_msi_disable: Unmap msi with pirq 36

> Good, so the device is safely removed from the guest.
> QEMU acted on 'libxl' command to remove it.

>> 
>> 
>> 
>> Also worth mentioning is that the console on which the "xl 
>> pci-assignable-remove 00:19.0" command is given keeps hanging, and 
>> eventually the hung-task stack trace will appear.
>> 
>> >> 
>> >> 
>> >> When i shut the guest down instead of using pci-detach, the "xl 
>> >> pci-assignable-remove" works fine and i can rebind the device to its 
>> >> driver in dom0.
>> >> 
>> >> So am i misreading the wiki .. and is it not possible to detach a device 
>> >> from a running domain or ... ?
>> >> 
>> >> Oh yes running xen-unstable and a 3.13-rc7 kernel
>> 
>> > Do you see the same issue with 'xend'?
>> 
>> Erhmmm haven't used that for what seems to be ages .. :-)

> Heh.
>> 
>> Hmm i also forgot the hung-task stack trace i get some time after the "xl 
>> pci-assignable-remove 00:19.0" ...


> Wow. You just walked into a pile of bugs, didn't you? And on a Friday,
> nonetheless.

As usual ;-)

>> 
>> It seems to be the pci_reset_function ...
>> 
>> [   52.099144] xen_bridge: port 4(vif2.0-emu) entered forwarding state
>> [   55.683141] xen_bridge: port 1(vif1.0) entered forwarding state
>> [   59.861385] xen-blkback:ring-ref 8, event-channel 22, protocol 1 (x86_64-abi) persistent grants
>> [   66.043965] xen_bridge: port 3(vif2.0) entered forwarding state
>> [   66.044549] xen_bridge: port 3(vif2.0) entered forwarding state
>> [   81.091149] xen_bridge: port 3(vif2.0) entered forwarding state
>> [  227.441191] xen_pciback: ****** removing device 0000:00:19.0 while still in-use! ******
>> [  227.443482] xen_pciback: ****** driver domain may still access this device's i/o resources!
>> [  227.445811] xen_pciback: ****** shutdown driver domain before binding device
>> [  227.447811] xen_pciback: ****** to other drivers or domains
>> [  368.859343] INFO: task xl:3675 blocked for more than 120 seconds.
>> [  368.860447]       Not tainted 3.13.0-rc7-20140110-creabox-nuc+ #1
>> [  368.860990] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  368.861682] xl              D ffff88003fd93f00     0  3675   3489 0x00000000
>> [  368.862319]  ffff880038c0e880 0000000000000282 0000000000000000 ffff880038fd03d0
>> [  368.863035]  0000000000013f00 0000000000013f00 ffff880038c0e880 ffff880036abffd8
>> [  368.863802]  ffffffff81087ac6 ffff88003a0f00f8 ffff88003a0f00fc ffff880038c0e880
>> [  368.864514] Call Trace:
>> [  368.864744]  [<ffffffff81087ac6>] ? mutex_spin_on_owner+0x38/0x45
>> [  368.865273]  [<ffffffff818e5e22>] ? schedule_preempt_disabled+0x6/0x9
>> [  368.865851]  [<ffffffff818e7034>] ? __mutex_lock_slowpath+0x159/0x1b5
>> [  368.866409]  [<ffffffff818e70a6>] ? mutex_lock+0x16/0x25
>> [  368.866892]  [<ffffffff8135972d>] ? pci_reset_function+0x26/0x4e
>> [  368.867430]  [<ffffffff818e7dc1>] ? _raw_spin_lock_irqsave+0x14/0x36
>> [  368.867996]  [<ffffffff818e7238>] ? down_write+0x9/0x26
>> [  368.868467]  [<ffffffff813f1863>] ? pcistub_put_pci_dev+0x7b/0xe0
>> [  368.868991]  [<ffffffff813f14a7>] ? pcistub_remove+0xd0/0x127
>> [  368.869506]  [<ffffffff8135b5b8>] ? pci_device_remove+0x38/0x83
>> [  368.870017]  [<ffffffff814cb37f>] ? __device_release_driver+0x82/0xdb
>> [  368.870593]  [<ffffffff814cb602>] ? device_release_driver+0x1a/0x25
>> [  368.871152]  [<ffffffff814ca993>] ? unbind_store+0x59/0x89
>> [  368.871659]  [<ffffffff81178aa0>] ? sysfs_write_file+0x13f/0x18f
>> [  368.872173]  [<ffffffff81122aa6>] ? vfs_write+0x95/0xfb
>> [  368.872641]  [<ffffffff81122d8a>] ? SyS_write+0x51/0x85
>> [  368.873087]  [<ffffffff818ed179>] ? system_call_fastpath+0x16/0x1b
>> [  488.871331] INFO: task xl:3675 blocked for more than 120 seconds.
>> [  488.913929]       Not tainted 3.13.0-rc7-20140110-creabox-nuc+ #1
>> [  488.937031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  488.960945] xl              D ffff88003fd93f00     0  3675   3489 0x00000004
>> [  488.986090]  ffff880038c0e880 0000000000000282 0000000000000000 ffff880038fd03d0
>> [  489.010383]  0000000000013f00 0000000000013f00 ffff880038c0e880 ffff880036abffd8
>> [  489.034456]  ffffffff81087ac6 ffff88003a0f00f8 ffff88003a0f00fc ffff880038c0e880
>> [  489.058621] Call Trace:
>> [  489.082358]  [<ffffffff81087ac6>] ? mutex_spin_on_owner+0x38/0x45
>> [  489.106272]  [<ffffffff818e5e22>] ? schedule_preempt_disabled+0x6/0x9
>> [  489.130158]  [<ffffffff818e7034>] ? __mutex_lock_slowpath+0x159/0x1b5
>> [  489.154147]  [<ffffffff818e70a6>] ? mutex_lock+0x16/0x25
>> [  489.177890]  [<ffffffff8135972d>] ? pci_reset_function+0x26/0x4e

> Yeah, that is a bug my RFC patchset (the one that does the slot/bus reset) 
> should also fix.
> I totally forgot about it!

Got a link to that patchset?
I could at least give it a spin .. you never know when fortune is on your side 
:-)

> I hope.

>> [  489.200927]  [<ffffffff818e7dc1>] ? _raw_spin_lock_irqsave+0x14/0x36
>> [  489.224076]  [<ffffffff818e7238>] ? down_write+0x9/0x26
>> [  489.246898]  [<ffffffff813f1863>] ? pcistub_put_pci_dev+0x7b/0xe0
>> [  489.270086]  [<ffffffff813f14a7>] ? pcistub_remove+0xd0/0x127
>> [  489.293053]  [<ffffffff8135b5b8>] ? pci_device_remove+0x38/0x83
>> [  489.316068]  [<ffffffff814cb37f>] ? __device_release_driver+0x82/0xdb
>> [  489.338896]  [<ffffffff814cb602>] ? device_release_driver+0x1a/0x25
>> [  489.362459]  [<ffffffff814ca993>] ? unbind_store+0x59/0x89
>> [  489.385396]  [<ffffffff81178aa0>] ? sysfs_write_file+0x13f/0x18f
>> [  489.408605]  [<ffffffff81122aa6>] ? vfs_write+0x95/0xfb
>> [  489.431407]  [<ffffffff81122d8a>] ? SyS_write+0x51/0x85
>> [  489.454251]  [<ffffffff818ed179>] ? system_call_fastpath+0x16/0x1b
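
For anyone comparing hung-task reports like the two above, a small helper can pull just the symbol names out of the call trace. This is only a sketch: the names FRAME_RE and trace_symbols are made up for illustration, and the pattern assumes the frame layout shown in this log ("[<address>] ? symbol+0xoff/0xsize"), which other kernel versions may print differently.

```python
import re

# Matches one call-trace frame as printed in the log above, for example:
#   [  368.866892]  [<ffffffff8135972d>] ? pci_reset_function+0x26/0x4e
# The leading timestamp and any mail quote markers are simply skipped over.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_symbols(log_text):
    """Return the symbol names of all call-trace frames, in log order."""
    return [match.group(1) for match in FRAME_RE.finditer(log_text)]


if __name__ == "__main__":
    # A few frames copied from the trace in this mail.
    sample = """\
>> [  368.866409]  [<ffffffff818e70a6>] ? mutex_lock+0x16/0x25
>> [  368.866892]  [<ffffffff8135972d>] ? pci_reset_function+0x26/0x4e
>> [  368.868467]  [<ffffffff813f1863>] ? pcistub_put_pci_dev+0x7b/0xe0
>> [  368.868991]  [<ffffffff813f14a7>] ? pcistub_remove+0xd0/0x127
"""
    for name in trace_symbols(sample):
        print(name)
```

Run over the full trace, this makes it easier to see that the blocked xl task came in through unbind_store and pcistub_remove and is waiting on a mutex under pci_reset_function.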
>> 
>> 
>> >> 
>> >> --
>> >> Sander
>> >> 
>> >> 
>> 
>> 
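
To see which driver currently owns the device at each step (before pci-detach, after it, and after pci-assignable-remove), one option is to read the "driver" symlink in sysfs. The following is a sketch only: normalize_bdf and bound_driver are hypothetical helper names, the PCI domain is assumed to be 0000 when it is omitted, and the expectation that xen-pciback shows up under the name "pciback" should be checked on the system in question.

```python
import os


def normalize_bdf(bdf):
    """Expand a short BDF such as '00:19.0' to '0000:00:19.0'.

    Assumption: PCI domain 0000 when the domain part is omitted.
    """
    return bdf if bdf.count(":") == 2 else "0000:" + bdf


def bound_driver(bdf, sysfs_root="/sys"):
    """Return the name of the driver the device is bound to, or None."""
    link = os.path.join(
        sysfs_root, "bus", "pci", "devices", normalize_bdf(bdf), "driver"
    )
    # The 'driver' entry is a symlink that only exists while a driver is bound.
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Prints the bound driver name, or None if the device is unbound or absent.
    print(bound_driver("00:19.0"))
```

The sysfs_root parameter is there only so the helper can be pointed at a test directory instead of the real /sys.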



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel