
Re: [Xen-devel] Xen pci-passthrough problem with pci-detach and pci-assignable-remove



Thursday, February 20, 2014, 9:53:59 AM, you wrote:


> Friday, January 24, 2014, 6:48:06 PM, you wrote:

>> On Fri, Jan 24, 2014 at 02:36:02PM +0100, Sander Eikelenboom wrote:
>>> 
>>> Friday, January 10, 2014, 6:38:10 PM, you wrote:
>>> 
>>> >> > Wow. You just walked in a pile of bugs didn't you? And on Friday
>>> >> > nonetheless.
>>> >> 
>>> >> As usual ;-)
>>> 
>>> > Ha!
>>> > ..snip..
>>> >> >> [  489.082358]  [<ffffffff81087ac6>] ? mutex_spin_on_owner+0x38/0x45
>>> >> >> [  489.106272]  [<ffffffff818e5e22>] ? schedule_preempt_disabled+0x6/0x9
>>> >> >> [  489.130158]  [<ffffffff818e7034>] ? __mutex_lock_slowpath+0x159/0x1b5
>>> >> >> [  489.154147]  [<ffffffff818e70a6>] ? mutex_lock+0x16/0x25
>>> >> >> [  489.177890]  [<ffffffff8135972d>] ? pci_reset_function+0x26/0x4e
>>> >> 
>>> >> > Yeah, my RFC patchset (the one that does the slot/bus reset) should
>>> >> > also fix that bug.
>>> >> > I totally forgot about it!
>>> >> 
>>> >> Got a link to that patchset ?
>>> 
>>> > https://lkml.org/lkml/2013/12/13/315
>>> 
>>> >> I at least could give it a spin .. you never know when fortune is on 
>>> >> your side :-)
>>> 
>>> > It is also at this git tree:
>>> 
>>> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git and the
>>> > branch name is "devel/xen-pciback.slot_and_bus.v0". You will likely
>>> > want to merge it in your current Linus tree.
>>> 
>>> > Thank you!
>>> 
>>> 
>>> Hi Konrad,
>>> 
>>> I just got time to test this some more. Merging this branch *except* the
>>> last commit (9599a5ad38a3bb250e996ccb2cdaab6fb68aaacd) seems to help with
>>> my problem; I'm now capable of using:
>>> - xl pci-detach
>>> - xl pci-assignable-remove
>>> - echo "BDF" > /sys/bus/pci/drivers/<devicename>/bind
>>> 
>>> to remove a PCI device from a running HVM guest and rebind it to a
>>> driver in dom0 without those nasty stacktraces :-)
>>> So the first 4 commits seem to be an improvement.
>>> 
>>> That last commit (9599a5ad38a3bb250e996ccb2cdaab6fb68aaacd) seems to cause
>>> trouble of its own.

>> Could you email me your lspci output and also which devices you move/switch 
>> etc?

> Hi Konrad,

> I have now found some time to figure out what goes wrong with xl
> pci-detach and xl pci-assignable-remove, and I have been able to narrow it
> down a bit:

> The problem only occurs when you:
> - pass through 2 (or more?) PCI devices to a guest ..
> - and remove only 1 of those devices with "xl pci-detach" followed by an
>   "xl pci-assignable-remove".
> When you first detach both devices with "xl pci-detach" before doing the
> "xl pci-assignable-remove", it works OK.

> In my case I'm passing through 2 devices (02:00.0 and 00:19.0).

> I added some printk's and what I found out is that:
> - after the pci-detach of 02:00.0, pcistub_put_pci_dev is not called for
>   that device ...
> - but when I subsequently pci-detach the second (and last) device 00:19.0,
>   it is called for both 02:00.0 and 00:19.0 ...
> - so somehow the call for the first detached device gets deferred. Since
>   they are different devices and not functions of the same device, I don't
>   see any reason for it to wait until all other devices have been
>   detached ...


> I tried to capture the console output but somehow that didn't work out, so
> I attached a screenshot of what happens when:
> - doing a xl pci-list for the guest
> - doing a xl pci-assignable-list

> - doing the xl pci-detach for 02:00.0

> - doing a xl pci-list for the guest
> - doing a xl pci-assignable-list

> - waiting some time ...

> - doing the xl pci-detach for 00:19.0

> - doing a xl pci-list for the guest
> - doing a xl pci-assignable-list

> There you can see this strange sequence of events :-)
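
The sequence above can be replayed with a small script. This is only a
sketch: the helpers "run", "show_state" and "detach_sequence", the DRY_RUN
switch and the domain name "guest" are illustrative assumptions, and the
"waiting some time" step is left out. By default it only prints the xl
commands instead of running them.

```shell
#!/bin/sh
# Sketch of the detach sequence described above. The helper names, the
# DRY_RUN switch and the domain name are assumptions for illustration only.

DRY_RUN="${DRY_RUN:-1}"   # default: print the commands, do not execute them

run() {
    # Always show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

show_state() {
    # Snapshot both views: devices attached to the guest, and devices
    # currently listed as assignable.
    run xl pci-list "$1"
    run xl pci-assignable-list
}

detach_sequence() {
    # Usage: detach_sequence <domain> <BDF> [<BDF> ...]
    domain="$1"; shift
    show_state "$domain"
    for bdf in "$@"; do
        run xl pci-detach "$domain" "$bdf"
        show_state "$domain"
    done
}
```

For example, "detach_sequence guest 02:00.0 00:19.0" prints the eight xl
invocations in order; with DRY_RUN=0 set in the environment they are
actually executed.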

> But I haven't been able to spot the culprit.

I enabled some extra debugging and added some more printk's (see the new
screenshot).

From what I can see, the frontend state for the first device isn't changed on
the first pci-detach.

Is the signaling on pci-detach the guest's (pcifront) responsibility or the
toolstack's (libxl)?



> attached: screenshot.jpg

> --
> Sander



>> Thanks!
>>> 
>>> --
>>> Sander
>>> 

Attachment: screenshot2.jpg
Description: JPEG image


 

