See my response in red.
Sent: Thursday, November 11, 2010 11:04 AM
To: Dante Cinco
Subject: Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough
On Thu, Nov 11, 2010 at 10:31:48AM -0800, Dante Cinco wrote:
> Without swiotlb=force, I don't see "PCI-DMA: Using software bounce
> buffering for IO" in /var/log/kern.log.
> With iommu=soft and without swiotlb=force, I see the "software bounce
> buffering" in /var/log/kern.log and an NMI (see below) when I load the
> kernel module drivers. I made sure the NMI is reproducible and not a
> one-time event.
What is the kernel module doing to cause this? DMA?
So doing 64-bit DMA causes an NMI. Do you have the Hypervisor's IOMMU VT-d enabled or disabled? (iommu=off,verbose) If you turn it off does this work?
We have IOMMU VT-d enabled. If we turn it off (iommu=off,verbose), DMA doesn't work properly and the driver code is unable to detect the source of the interrupt. The kernel eventually disables our device's interrupts because they go unserviced more than 100000 times:
124: 86538 0 0 0 0 0 13462 0 0 0 0 0 0 0 xen-pirq-pcifront-msi HW_TACHYON
125: 88348 0 0 0 11652 0 0 0 0 0 0 0 0 0 xen-pirq-pcifront-msi HW_TACHYON
126: 89335 0 10665 0 0 0 0 0 0 0 0 0 0 0 xen-pirq-pcifront-msi HW_TACHYON
127: 100000 0 0 0 0 0 0 0 0 0 0 0 0 0 xen-pirq-pcifront-msi HW_TACHYON
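As a quick cross-check of the counters above, the sketch below sums the per-CPU counts in each row. It assumes the rows follow the usual /proc/interrupts layout (IRQ number, one count per CPU, then the handler type and device name); the message does not name the file, and the helper name irq_totals is made up for this illustration. Every vector totals exactly 100000, which is consistent with the figure mentioned above.

```python
# Rows copied verbatim from the message above.
ROWS = """\
124: 86538 0 0 0 0 0 13462 0 0 0 0 0 0 0 xen-pirq-pcifront-msi HW_TACHYON
125: 88348 0 0 0 11652 0 0 0 0 0 0 0 0 0 xen-pirq-pcifront-msi HW_TACHYON
126: 89335 0 10665 0 0 0 0 0 0 0 0 0 0 0 xen-pirq-pcifront-msi HW_TACHYON
127: 100000 0 0 0 0 0 0 0 0 0 0 0 0 0 xen-pirq-pcifront-msi HW_TACHYON
"""


def irq_totals(text):
    """Return {irq_number: total interrupts across all CPUs} (hypothetical helper)."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        irq = int(fields[0].rstrip(":"))
        # Keep only the numeric per-CPU columns; the trailing handler
        # type and device name are not digits and are skipped.
        counts = [int(f) for f in fields[1:] if f.isdigit()]
        totals[irq] = sum(counts)
    return totals


if __name__ == "__main__":
    for irq, total in sorted(irq_totals(ROWS).items()):
        print(irq, total)
```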
> /var/log/kern.log (iommu=soft):
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> Placing 64MB software IO TLB between ffff880005800000 - ffff880009800000
> software IO TLB at phys 0x5800000 - 0x9800000
> (XEN) NMI - I/O ERROR
> (XEN) ----[ Xen-4.1-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82c4801701b2>] smp_send_event_check_mask+0x1/0x10
> (XEN) RFLAGS: 0000000000000012 CONTEXT: hypervisor
> (XEN) rax: 0000000000000080 rbx: ffff82c480287c48 rcx: 0000000000000000
> (XEN) rdx: 0000000000000080 rsi: 0000000000000080 rdi: ffff82c480287c48
> (XEN) rbp: ffff82c480287c78 rsp: ffff82c480287c38 r8: 0000000000000000
> (XEN) r9: 0000000000000037 r10: 0000ffff0000ffff r11: 00ff00ff00ff00ff
> (XEN) r12: ffff82c48029f080 r13: 0000000000000001 r14: 0000000000000008
> (XEN) r15: ffff82c4802b0c20 cr0: 000000008005003b cr4: 00000000000026f0
> (XEN) cr3: 00000001250a9000 cr2: 00007f6165ae9428
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff82c480287c38:
> (XEN) ffff82c480287c78 ffff82c48012001f 0000000000000100 0000000000000000
> (XEN) ffff82c480287ca8 ffff83011dadd8b0 ffff83019fffa9d0 ffff82c4802c2300
> (XEN) ffff82c480287cc8 ffff82c480117d0d ffff82c48029f080 0000000000000001
> (XEN) 0000000000000100 0000000000000000 0000000000000002 ffff8300df606000
> (XEN) 000000411de66867 ffff82c4802c2300 ffff82c480287d28 ffff82c48011f299
> (XEN) 0000000000000100 0000000000000086 ffff83019e3fa000 ffff83011dadd8b0
> (XEN) ffff83019fffa9d0 ffff8300df606000 0000000000000000 0000000000000000
> (XEN) 000000000000007f ffff83019fe02200 ffff82c480287d38 ffff82c48011f6ea
> (XEN) ffff82c480287d58 ffff82c48014e4c1 ffff83011dae2000 0000000000000066
> (XEN) ffff82c480287d68 ffff82c48014e54d ffff82c480287d98 ffff82c480105d59
> (XEN) ffff82c480287da8 ffff8301616a6990 ffff83011dae2000 0000000000000000
> (XEN) ffff82c480287da8 ffff82c480105f81 ffff82c480287e28 ffff82c48015c043
> (XEN) 0000000000000043 0000000000000043 ffff83019fe02234 0000000000000000
> (XEN) 000000000000010c 0000000000000000 0000000000000000 0000000000000002
> (XEN) ffff82c480287e10 ffff82c480287f18 ffff82c48024f6c0 ffff82c480287f18
> (XEN) ffff82c4802c2300 0000000000000002 00007d3b7fd781a7 ffff82c480154ee6
> (XEN) 0000000000000002 ffff82c4802c2300 ffff82c480287f18 ffff82c48024f6c0
> (XEN) ffff82c480287ee0 ffff82c480287f18 00ff00ff00ff00ff 0000ffff0000ffff
> (XEN) 0000000000000000 0000000000000000 ffff82c4802c23a0 0000000000000000
> (XEN) 0000000000000000 ffff82c4802c2e80 0000000000000000 0000007a00000000
> (XEN) Xen call trace:
> (XEN) [<ffff82c4801701b2>] smp_send_event_check_mask+0x1/0x10
> (XEN) [<ffff82c480117d0d>] csched_vcpu_wake+0x2e1/0x302
> (XEN) [<ffff82c48011f299>] vcpu_wake+0x243/0x43e
> (XEN) [<ffff82c48011f6ea>] vcpu_unblock+0x4a/0x4c
> (XEN) [<ffff82c48014e4c1>] vcpu_kick+0x21/0x7f
> (XEN) [<ffff82c48014e54d>] vcpu_mark_events_pending+0x2e/0x32
> (XEN) [<ffff82c480105d59>] evtchn_set_pending+0xbf/0x190
> (XEN) [<ffff82c480105f81>] send_guest_pirq+0x54/0x56
> (XEN) [<ffff82c48015c043>] do_IRQ+0x3b2/0x59c
> (XEN) [<ffff82c480154ee6>] common_interrupt+0x26/0x30
> (XEN) [<ffff82c48014e3c3>] default_idle+0x82/0x87
> (XEN) [<ffff82c480150664>] idle_loop+0x5a/0x68
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) FATAL TRAP: vector = 2 (nmi)
> (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
> (XEN) ****************************************
> (XEN) Reboot in five seconds...
> On Thu, Nov 11, 2010 at 8:04 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
> > On Wed, Nov 10, 2010 at 05:16:14PM -0800, Dante Cinco wrote:
> >> We have Fibre Channel HBA devices that we PCI passthrough to our
> >> pvops domU kernel. Without swiotlb=force in the domU's kernel
> >> command line, both domU and dom0 lock up after loading the kernel
> >> module drivers for the HBA devices. With swiotlb=force, the domU
> >> and dom0 are stable
> > Whoa. That is not good - what happens if you just pass in iommu=soft?
> > Does the PCI-DMA: Using.. show up if you don't pass in any of those parameters?
> > (I don't think it does, but just doing 'iommu=soft' should enable it).
> >> after loading the kernel module drivers but the I/O performance is
> >> at least an order of magnitude worse than what we were seeing with
> >> the HVM kernel. I see the following in /var/log/kern.log in the
> >> pvops domU:
> >> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> >> Placing 64MB software IO TLB between ffff880005800000 - ffff880009800000
> >> software IO TLB at phys 0x5800000 - 0x9800000
> >> Is swiotlb=force responsible for the I/O performance degradation? I
> >> don't understand what swiotlb=force does so I would appreciate an
> >> explanation or a pointer.
> > So, you should only need to use 'iommu=soft'. It will enable the
> > Linux kernel IOMMU to translate the pseudo-PFNs to the real machine frame numbers (bus addresses).
> > If your card is 64-bit, then that is all it would do. If however
> > your card is 32-bit and you are DMA-ing data from above the 32-bit
> > limit, it would copy the user-space page to memory below 4GB, DMA
> > that, and when done, copy it back to where the user-space page
> > is. This is called bounce-buffering and this is why you would use a
> > mix of pci_map_page, pci_sync_single_for_[cpu|device] calls around
> > your driver.
> > However, I think your cards are 64-bit, so you don't need this
> > bounce-buffering. But if you say 'swiotlb=force' it will force _all_ DMAs to go through the bounce-buffer.
> > So, try just 'iommu=soft' and see what happens.
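To make the bounce-buffering explanation above concrete, here is a toy user-space model, not kernel code. It only illustrates the decision described in the quoted text: a device that can reach the target address transfers directly, a device that cannot goes through a low-memory buffer plus an extra copy, and a force flag (standing in for swiotlb=force) sends every transfer through the buffer. All names (ToyMemory, device_write, dma_receive) are hypothetical; the bounce address reuses the physical start of the software IO TLB from the log, and only the device-to-memory direction is modelled.

```python
BOUNCE_ADDR = 0x5800000  # start of the software IO TLB reported in the log


class ToyMemory:
    """Toy physical memory: a dict from address to the bytes stored there."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, data):
        self.cells[addr] = bytes(data)

    def read(self, addr):
        return self.cells[addr]


def device_write(mem, addr, payload, device_bits):
    """Model a device DMA write; fail if the address is beyond its reach."""
    if addr + len(payload) > (1 << device_bits):
        raise ValueError("address not reachable by a %d-bit device" % device_bits)
    mem.write(addr, payload)


def dma_receive(mem, target, payload, device_bits, force=False):
    """Deliver payload to target. Return True if the bounce buffer was used."""
    reachable = target + len(payload) <= (1 << device_bits)
    if reachable and not force:
        # Direct DMA: the device writes straight into the target page.
        device_write(mem, target, payload, device_bits)
        return False
    # Bounce: the device writes below 4GB, then the CPU copies the data
    # to the real destination. This extra copy is the performance cost.
    device_write(mem, BOUNCE_ADDR, payload, device_bits)
    mem.write(target, mem.read(BOUNCE_ADDR))
    return True


if __name__ == "__main__":
    mem = ToyMemory()
    high = 0x120000000  # an address above the 4GB boundary
    print(dma_receive(mem, high, b"data", device_bits=64))              # direct
    print(dma_receive(mem, high, b"data", device_bits=32))              # bounced
    print(dma_receive(mem, high, b"data", device_bits=64, force=True))  # bounced
```

In this model a 64-bit device never pays for the extra copy unless force is set, which matches the suggestion above to try iommu=soft on its own first.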