Xen project Mailing List

Re: [Xen-devel] Problem with PV disk and iSCSI

To: Gary Grebus <ggrebus@xxxxxxxxxxxxxxx>

From: Kurt Hackel <kurt.hackel@xxxxxxxxxx>

Date: Fri, 8 Feb 2008 22:15:48 -0800

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Fri, 08 Feb 2008 22:16:54 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi Gary,

On Fri, Feb 08, 2008 at 02:54:14PM -0500, Gary Grebus wrote:
> I've run into a problem on 3.1.2 with an HVM guest using PV disks. In
> dom0, the physical disk is accessed using iSCSI. The symptom is that
> applications in dom0 which are monitoring the iSCSI network interface
> (e.g. tcpdump) die with EFAULT errors.
>
> When the block I/O completes, it looks like blkback is doing a
> GNTTABOP_unmap_grant_ref on a guest page, even though the dom0 kernel
> has done get_page() on it and still holds references.
>
> The page had been passed through iSCSI into the network stack, so it
> ends up referenced by one or more skb's. Because there was an AF_PACKET
> socket open, a clone of the skb ends up queued for an indeterminate
> amount on that socket queue. When the application finally gets around
> to reading the data, the page is no longer mapped, and the read fails
> trying to copy the data out of the kernel.
>
> Has anyone else seen anything similar? I mentioned tcpdump, but the
> problem also shows up with dhcpcd, which needs to process packets at the
> ethernet layer.
>

We're seeing the same thing with 3.1.3. When running iscsi in dom0 (over
a xen bridge) presenting these via blkfront to the guest we see the same
crash (below) while performing failover tests on the storage controller.
Just as you said, the error occurs in skb_remove_foreign_references from
loopback_start_xmit. It's running all the foreign pages, attempting to
copy each locally when it dies on the source address (esi) of the
following memcpy:

        vaddr = kmap_skb_frag(&skb_shinfo(skb)->frags[i]);
        off = skb_shinfo(skb)->frags[i].page_offset;
        memcpy(page_address(page) + off,
               vaddr + off,
               skb_shinfo(skb)->frags[i].size);

c053f2f7:  0f b7 74 c8 18   movzwl 0x18(%eax,%ecx,8),%esi
c053f2fc:  0f b7 5c c8 1a   movzwl 0x1a(%eax,%ecx,8),%ebx
c053f301:  8b 44 24 0c      mov    0xc(%esp),%eax
c053f305:  e8 ba 09 f1 ff   call   0xc044fcc4 page_address
c053f30a:  89 d9            mov    %ebx,%ecx
c053f30c:  c1 e9 02         shr    $0x2,%ecx
c053f30f:  8d 3c 30         lea    (%eax,%esi,1),%edi
c053f312:  03 74 24 04      add    0x4(%esp),%esi
c053f316:  f3 a5            rep movsl %ds:(%esi),%es:(%edi)   <<<<< memcpy

ds: 007b   esi: c0df7000   es: 007b   edi: ebffb000

It seems one of the skb->frags has been unmapped.

> I'm thinking blkback will have to make a dom0 copy of the page before
> doing the unmap if there are still extra references?
>

Can the unmap be deferred, handled by the last reference holder? Or does
this open up a potential security hole?

Thanks
kurt

Kurt Hackel
Oracle Corp.
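The two options raised in this exchange (copy the page into dom0 before the unmap, or defer the unmap to the last reference holder) can be compared in a small user-space model. This is only a sketch under stated assumptions: it is not blkback or grant-table code, and every name in it (`model_page`, `model_io_complete`, `unmap_policy`, and so on) is hypothetical. A NULL `data` pointer stands in for "the page is no longer mapped", and `model_read()` returning -1 stands in for the EFAULT seen by tcpdump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

enum unmap_policy { UNMAP_COPY_FIRST, UNMAP_DEFER };

/* Toy stand-in for a granted guest page mapped into dom0 (hypothetical). */
struct model_page {
    int refcount;          /* references held inside dom0 */
    bool foreign_mapped;   /* true while the grant mapping exists */
    bool unmap_pending;    /* UNMAP_DEFER: unmap when refcount hits 0 */
    unsigned char *data;   /* what readers dereference; NULL = would fault */
    unsigned char *local;  /* dom0-owned copy, if one was made */
};

/* Map a guest buffer; the block backend holds the initial reference. */
static void model_map(struct model_page *p, unsigned char *guest)
{
    p->refcount = 1;
    p->foreign_mapped = true;
    p->unmap_pending = false;
    p->data = guest;
    p->local = NULL;
}

/* Drop the grant mapping; guest memory becomes unreachable from dom0. */
static void model_unmap(struct model_page *p)
{
    p->foreign_mapped = false;
    p->unmap_pending = false;
    if (p->data != p->local)
        p->data = NULL;
}

static void model_get_page(struct model_page *p) { p->refcount++; }

static void model_put_page(struct model_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount > 0)
        return;
    if (p->unmap_pending)
        model_unmap(p);    /* last holder performs the deferred unmap */
    free(p->local);
    p->local = NULL;
    p->data = NULL;
}

/* Block I/O completion: the backend drops its reference and wants to unmap.
 * Returns 0 on success, -1 if the local copy could not be allocated. */
static int model_io_complete(struct model_page *p, enum unmap_policy policy)
{
    bool others = p->refcount > 1;

    if (others && policy == UNMAP_DEFER) {
        p->unmap_pending = true;
    } else {
        if (others) {      /* UNMAP_COPY_FIRST: keep the data alive locally */
            p->local = malloc(MODEL_PAGE_SIZE);
            if (!p->local)
                return -1;
            memcpy(p->local, p->data, MODEL_PAGE_SIZE);
            p->data = p->local;
        }
        model_unmap(p);
    }
    model_put_page(p);
    return 0;
}

/* Reader such as an AF_PACKET consumer; -1 models the EFAULT case. */
static int model_read(const struct model_page *p, unsigned char *out, size_t len)
{
    if (!p->data || len > MODEL_PAGE_SIZE)
        return -1;
    memcpy(out, p->data, len);
    return 0;
}
```

In this model, a reader that took its reference with `model_get_page()` before `model_io_complete()` can still read the data under either policy. With `UNMAP_DEFER` the guest page stays mapped in dom0 until the last `model_put_page()`, which is the trade-off behind the security question above; with `UNMAP_COPY_FIRST` the mapping goes away immediately at the cost of one page copy.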
===========================================
BUG: unable to handle kernel paging request at virtual address c0df7000
 printing eip:
c053f316
36d4c000 -> *pde = 00000000:c4237027
36c37000 -> *pme = 00000001:1bd14067
00d14000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
SMP
Modules linked in: xt_physdev bridge autofs4 sunrpc dm_round_robin ip_conntrack_netbios_ns ipt_REJECT xt_tcpudp xt_state ip_conntrack nfnetlink iptable_filter ip_tables x_tables ib_iser rdma_cm ib_addr ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi scsi_transport_iscsi ocfs2(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs dm_mirror dm_multipath dm_mod video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport joydev sg i2c_piix4 i2c_core pcspkr k8_edac edac_mc tg3 ide_cd serio_raw serial_core cdrom qla2xxx scsi_transport_fc sata_svw libata mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    3
EIP:    0061:[<c053f316>]    Not tainted VLI
EFLAGS: 00010286   (2.6.18-8.1.6.0.18.el5xen #1)
EIP is at loopback_start_xmit+0x107/0x2bd
eax: ebffb000   ebx: 00000578   ecx: 0000015e   edx: c065c800
esi: c0df7000   edi: ebffb000   ebp: f1134ea8   esp: c0701e6c
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, ti=c0701000 task=f77c05a0 task.ti=c0d2f000)
Stack: c9a13c00 c0df7000 00000001 c157ff60 c9a13800 f1134ea8 c9a13980 c9a13800
       c059fc02 c9a13800 f1134ea8 c9a13980 0000000e c05a1768 c0dcf824 c0dcf800
       f1134ea8 c05a5cfc c9a13800 ed20e040 00001fc2 00000000 f48703d4 f48703e8
Call Trace:
 [<c059fc02>] dev_hard_start_xmit+0x198/0x1ee
 [<c05a1768>] dev_queue_xmit+0x24c/0x2e8
 [<c05a5cfc>] neigh_resolve_output+0x1b7/0x1e1
 [<c05bea8b>] ip_output+0x1c0/0x1f9
 [<c05be309>] ip_queue_xmit+0x390/0x3cf
 [<c059fc02>] dev_hard_start_xmit+0x198/0x1ee
 [<c05adbe6>] __qdisc_run+0x30/0x19a
 [<c05a17e6>] dev_queue_xmit+0x2ca/0x2e8
 [<f8640d48>] br_dev_queue_push_xmit+0x15b/0x17e [bridge]
 [<c05cbc6f>] tcp_transmit_skb+0x5e4/0x612
 [<f8641945>] br_handle_frame+0x146/0x15d [bridge]
 [<c05cc9ad>] tcp_retransmit_skb+0x4b7/0x595
 [<c05c5baf>] tcp_enter_loss+0x1a2/0x1ff
 [<c05cee58>] tcp_write_timer+0x3ff/0x5d3
 [<c05cea59>] tcp_write_timer+0x0/0x5d3
 [<c0427146>] run_timer_softirq+0x120/0x19b
 [<c0423162>] __do_softirq+0x73/0xe8
 [<c0406dda>] do_softirq+0x6e/0x102
 =======================
 [<c0406d63>] do_IRQ+0xa5/0xae
 [<c052f040>] evtchn_do_upcall+0x85/0xde
 [<c04056a1>] hypervisor_callback+0x3d/0x45
 [<c040800e>] raw_safe_halt+0xc2/0xe8
 [<c040442a>] xen_idle+0x43/0x4f
 [<c04033b0>] cpu_idle+0xa1/0xbb
Code: 24 08 89 44 24 04 8b 85 a4 00 00 00 0f b7 74 c8 18 0f b7 5c c8 1a 8b 44 24 0c e8 ba 09 f1 ff 89 d9 c1 e9 02 8d 3c 30 03 74 24 04 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 04 ba 05 00 00 00 e8

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
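As a cross-check on the register dump, the arithmetic around the faulting `rep movsl` can be replayed in a few lines of C. This is an illustrative user-space sketch, not kernel code: the helper names are made up for this note, and the only inputs are register values copied from the oops in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the register dump in the oops. */
#define OOPS_EAX 0xebffb000u  /* page_address(page): destination base */
#define OOPS_EBX 0x00000578u  /* frags[i].size in bytes */
#define OOPS_ECX 0x0000015eu  /* dwords still to copy at the fault */
#define OOPS_ESI 0xc0df7000u  /* source pointer, also the faulting address */
#define OOPS_EDI 0xebffb000u  /* destination pointer at the fault */

/* Hypothetical helper: the dword count "shr $0x2,%ecx" derives from size. */
static uint32_t initial_dword_count(uint32_t size_bytes)
{
    return size_bytes >> 2;
}

/* Hypothetical helper: dwords already copied when the fault hit. */
static uint32_t dwords_copied(uint32_t size_bytes, uint32_t ecx_at_fault)
{
    uint32_t total = initial_dword_count(size_bytes);

    assert(ecx_at_fault <= total);
    return total - ecx_at_fault;
}

/* Hypothetical helper: fragment offset implied by edi. The instruction
 * "lea (%eax,%esi,1),%edi" sets edi = page_address + off before the copy,
 * and rep movsl then advances edi by 4 bytes per dword copied. */
static uint32_t frag_offset(uint32_t eax, uint32_t edi, uint32_t copied)
{
    return edi - copied * 4u - eax;
}
```

With the values from the dump, `initial_dword_count(OOPS_EBX)` equals `OOPS_ECX`, so `dwords_copied(OOPS_EBX, OOPS_ECX)` is 0 and `frag_offset(OOPS_EAX, OOPS_EDI, 0)` is 0. In other words, the copy of a 1400-byte fragment faulted on its very first read, at the start of the source page c0df7000, which is consistent with the all-zero pte printed for that address.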
