
Re: [Xen-devel] PCI BAR register space written with garbage in HVM guest.



On Mon, Mar 15, 2010 at 10:09:28PM -0300, Dan Gora wrote:
> Hi All,
> 
> I'm having a problem where if I pass through two instances of my
> device to an HVM domU, one of the board instances is having its PCI
> BAR registers overwritten with garbage by some unknown actor 30
> seconds to a minute after I load my driver.  I cannot for the life of
> me find what might possibly be overwriting the BAR registers.
> 
> I've added a debugging printf to XEN in
> xen/arch/x86/pci.c:pci_conf_write() and I can see the entire PCI BAR
> address space being overwritten with garbage:
> 
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080000 offset=0x0
> bytes=4 value=0xffffffff
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080004 offset=0x0
> bytes=4 value=0x1600ffff
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080008 offset=0x0
> bytes=4 value=0x64d5323e
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008000c offset=0x0
> bytes=4 value=0x450008
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080010 offset=0x0
> bytes=4 value=0xa7e54002
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080014 offset=0x0
> bytes=4 value=0x11400000

Wow.. That is impressive. Are the values always the same? Or are they
truly random?
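
One way to answer that is to capture the console log from two separate
occurrences and diff them. Below is a rough sketch of that, not a tested
tool. It assumes the debug line format shown in the quoted log, the
standard type 1 layout of the 0xcf8 address (enable bit, bus, device,
function, register), and that "offset" is the byte offset into the
0xcfc data port. The function names are made up for illustration.

```python
import re

# Matches the debug lines quoted above. The optional ">" tolerates the
# mail client wrapping (and quoting) the line before "bytes=".
LINE_RE = re.compile(
    r"pci_conf_write: cf8=(0x[0-9a-fA-F]+)\s+offset=(0x[0-9a-fA-F]+)"
    r"\s+(?:>\s*)?bytes=(\d+)\s+value=(0x[0-9a-fA-F]+)"
)


def decode_cf8(cf8):
    """Split a type 1 config address into its fields."""
    return {
        "enabled": bool(cf8 & 0x80000000),  # bit 31
        "bus": (cf8 >> 16) & 0xFF,          # bits 23..16
        "dev": (cf8 >> 11) & 0x1F,          # bits 15..11
        "func": (cf8 >> 8) & 0x07,          # bits 10..8
        "reg": cf8 & 0xFC,                  # bits 7..2, dword aligned
    }


def parse_writes(log_text):
    """Return a list of ((bus, dev, func), register, nbytes, value)."""
    writes = []
    for m in LINE_RE.finditer(log_text):
        addr = decode_cf8(int(m.group(1), 16))
        # Assumption: offset selects a byte within the 0xcfc data port.
        reg = addr["reg"] + int(m.group(2), 16)
        bdf = (addr["bus"], addr["dev"], addr["func"])
        writes.append((bdf, reg, int(m.group(3)), int(m.group(4), 16)))
    return writes


def differing_registers(run_a, run_b):
    """Registers whose last written value differs between two captures."""
    last_a = {(bdf, reg): val for bdf, reg, _, val in run_a}
    last_b = {(bdf, reg): val for bdf, reg, _, val in run_b}
    keys = sorted(set(last_a) | set(last_b))
    return [k for k in keys if last_a.get(k) != last_b.get(k)]
```

For what it is worth, cf8=0x80080000 decodes to bus 8, device 0,
function 0, register 0, so the writes walk up the config space of a
single function one dword at a time. If differing_registers() comes
back empty for two captures, the "garbage" is at least repeatable.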

> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080018 offset=0x0
> bytes=4 value=0x693
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008001c offset=0x0
> bytes=4 value=0xffff0000
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080020 offset=0x0
> bytes=4 value=0x4400ffff
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080024 offset=0x0
> bytes=4 value=0x2c024300
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080028 offset=0x0
> bytes=4 value=0x1012dac
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008002c offset=0x0
> bytes=4 value=0xa1c30006
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080030 offset=0x0
> bytes=4 value=0xa00040d
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080034 offset=0x0
> bytes=4 value=0x0
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080038 offset=0x0
> bytes=4 value=0x0
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008003c offset=0x0
> bytes=4 value=0x0
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080040 offset=0x0
> bytes=4 value=0x0
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080044 offset=0x0
> bytes=4 value=0x16000000
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080048 offset=0x0
> bytes=4 value=0x64d5323e
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008004c offset=0x0
> bytes=4 value=0x0
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080050 offset=0x0
> bytes=4 value=0x0
> <snipped, rest of PCI BAR registers written with 0x0...>
> 
> I've added printks to the dom0 and domU kernels in the
> pci_bus_write_config_##size() macros in drivers/pci/access.c and in
> arch/x86/pci/direct.c to print every time the kernel accesses PCI
> configuration space.  I see these printks when my driver or some
> other driver accesses PCI configuration space, but I do NOT see them
> when this PCI BAR register space trashing happens.
> 
> So I noticed also that lspci does not cause these kernel printfs to
> occur and upon reading the pciutils source code I learned that pretty
> much anything which can do an outl() to 0xcf8/0xcfc can mess with PCI
> configuration space.
> 
> So now I figure it must be some user space thing unless I'm just
> missing some other way which the kernel or XEN can access PCI
> configuration space, but what could it possibly be?
> 
> This problem only occurs in HVM guests and only seems to occur when I
> pass two instances of my device to the domU and only occurs many many
> seconds after I load my driver (30-60 seconds).  I'm absolutely sure
> that it's not my driver because the kernel printfs show up when my
> driver accesses PCI configuration space.
> 
> I'm really pretty much at a loss as to how to even debug this.  There
> doesn't appear to be any dump_stack() in XEN so that I can see what
> called pci_conf_write() in XEN, but even then it appears that it only
> gets called as a trap from the dom0 or domU.  It's not clear to me if
> you can even see what process/stack actually caused the trap back in
> the dom0 or domU.  Is that possible?

You could instrument the code (Xen) to crash the DomU domain when you
detect garbage. Then you can pick at it with xenctx to look at its
stack, etc.
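
"Detect garbage" needs some definition. Going by your log, the
signature is a run of back-to-back dword writes walking upward from
register 0x0 on one function. Here is a sketch of that check,
prototyped in Python so it can be run against a captured log before
porting it to C inside pci_conf_write(). The class name and the
threshold of 8 are my own guesses, not anything from the Xen tree. In
the hypervisor, the place where feed() returns True is where I would
try calling domain_crash() on the current domain.

```python
class SweepDetector:
    """Flags a run of consecutive dword writes climbing config space."""

    def __init__(self, threshold=8):
        # Hypothetical cut-off. Ordinary BAR sizing writes each BAR
        # twice (probe, then restore), so it never builds a long run.
        self.threshold = threshold
        self.last = None  # (bdf, reg) of the previous write
        self.run = 0      # length of the current ascending run

    def feed(self, bdf, reg):
        """Record one write; return True once the run looks like a sweep."""
        if self.last == (bdf, reg - 4):
            self.run += 1
        else:
            self.run = 1
        self.last = (bdf, reg)
        return self.run >= self.threshold
```

Fed the writes from your log (registers 0x0, 0x4, 0x8, ... on bus 8,
device 0, function 0), this trips on the eighth write, at register
0x1c, which should be early enough that the guilty context is still on
the stack when the domain is crashed.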
> 
> Is there anything else that I should look at?  qemu?  pciback?
> pcifront?  Am I missing some access method to PCI configuration space
> down in the kernel or is pci_conf1_read/write pretty much it?  Any

QEMU uses libpci, the same library lspci uses, and that appears to work.

You can crank up the verbosity of pciback and pcifront with their
module parameters to see if they are the ones doing this. But your
domain is an HVM DomU, so pcifront/pciback is not used.

That narrows it down to QEMU or the Dom0 kernel.

> ideas what would possibly be trying to overwrite all of PCI
> configuration space like this?
> 
> _any_ ideas are most welcome..
> 
> thanks
> dan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
