
[Xen-devel] Re: Oops when loading xen_platform_pci module in HVM domain on CS 11429



steved@xxxxxxxxxx wrote on 09/05/2006 07:56:00 PM:

> I'm running 64-bit SLES 10 beta 10 (yes, we have to upgrade to the
> official release) on a machine with four Xeon 7020s.  I got xen-
> unstable changeset 11429:66dd34f2f439 and built 64-bit uniprocessor
> kernels for dom0 and the HVM domain (a 2.6.16.13 baremetal kernel
> and its initrd).  The HVM domain is also running SLES 10 beta 10.  I
> followed the instructions to build the paravirtualized drivers for
> an HVM domain.  When I run "modprobe xen_platform_pci" in the HVM
> domain I get a kernel oops.  Here is the output in dmesg.
>
> PCI: Found IRQ 10 for device 0000:00:03.0
> Xen version 3.0.
> Hypercall area is 1 pages (order 0 allocation)
> Unable to handle kernel paging request at ffff81002aca5220 RIP:
> [<ffff81002aca5220>]
> PGD 8063 PUD 9063 PMD 800000002ac001e3 PTE 31e031e031e031e
> Oops: 0011 [1]
> CPU 0
> Modules linked in: xen_platform_pci ext3 mbcache jbd edd processor
> lpfc mptspi mptscsih mptbase ata_piix libata
> Pid: 4000, comm: modprobe Not tainted 2.6.16.13-baremetal-up #1
> RIP: 0010:[<ffff81002aca5220>] [<ffff81002aca5220>]
> RSP: 0018:ffff8100265b5b60  EFLAGS: 00010282
> RAX: ffff81002aca5220 RBX: 000000002aca5000 RCX: 0000000040000000
> RDX: 0000000000000000 RSI: ffff8100265b5b68 RDI: 0000000000000006
> RBP: ffff8100265b5b78 R08: ffff81002aca5000 R09: ffffffff7fffffff
> R10: 00007f0000000000 R11: 0000000080000000 R12: ffff81002fea8000
> R13: 00000000f3000000 R14: 000000000000c100 R15: 0000000000000001
> FS:  00002b443d7726d0(0000) GS:ffffffff80533000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: ffff81002aca5220 CR3: 0000000026f89000 CR4: 00000000000006e0
> Process modprobe (pid: 4000, threadinfo ffff8100265b4000, task ffff81002fba0380)
> Stack: ffffffff88086c5c ffff810000000000 ffffffff80146693 ffff8100265b5c08
>        ffffffff88086635 0000000300000000 ffff8100265b5bb8 0000000000000000
>        0000000000000100 0000000001000000
> Call Trace: <ffffffff88086c5c>{:xen_platform_pci:setup_xen_features+40}
>        <ffffffff80146693>{__get_free_pages+49}
>        <ffffffff88086635>{:xen_platform_pci:platform_pci_init+832}
>        <ffffffff80207ef2>{pci_device_probe+77}
>        <ffffffff8024d32a>{driver_probe_device+92}
>        <ffffffff8024d3f2>{__driver_attach+0}
>        <ffffffff8024d449>{__driver_attach+87}
>        <ffffffff8024cd16>{bus_for_each_dev+79}
>        <ffffffff8024d25a>{driver_attach+28}
>        <ffffffff8024c913>{bus_add_driver+122}
>        <ffffffff8024d6d4>{driver_register+143}
>        <ffffffff802080b1>{__pci_register_driver+111}
>        <ffffffff8808e01c>{:xen_platform_pci:platform_pci_module_init+28}
>        <ffffffff8013daa5>{sys_init_module+5606}
>        <ffffffff8013731f>{autoremove_wake_function+0}
>        <ffffffff8015efaa>{vfs_read+173}
>        <ffffffff8010a8ba>{system_call+126}
>
> Code: b8 11 00 00 00 0f 01 c1 c3 00 00 00 00 00 00 00 00 00 00 00
> RIP [<ffff81002aca5220>] RSP <ffff8100265b5b60>
> CR2: ffff81002aca5220
>
> It is oopsing on line 25 in
> unmodified_drivers/linux-2.6/platform-pci/features.c (which is a symlink
> to ../../linux-2.6-xen-sparse/drivers/xen/core/features.c):
> if (HYPERVISOR_xen_version(XENVER_get_features, &fi) < 0)
>
> Looks like something went wrong with the hypercall.  I crawled
> through the code to see how the hypercall stubs are set up but got
> lost in the MSR stuff.  I'll take a look at it again tomorrow.
> Thought I should post it to the list in case anyone else can
> reproduce the problem and either find a fix or explain why it's a
> user error.
>
> Let me know if you need more info on my setup.
>
> Steve D.

Digging into this further, I found that the hypercall mechanism is trying
to execute the hypercall instructions, which reside in the hypercall stubs
page.  However, the page table entry for that page has the _PAGE_NX
(no-execute) bit set.  (I'm running a 64-bit OS with PAE in the HVM
domain.)  The error code in the oops (0x11) indicates that the page fault
is caused by the _PAGE_NX bit:

  0x01 -> access rights (protection) violation
  0x10 -> the fault was caused by an instruction fetch
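
For anyone who wants to double-check that reading of the error code, here
is a minimal user-space sketch that decodes the architectural x86
page-fault error bits.  The macro and function names (PF_PROT,
describe_pf_error, pf_looks_like_nx_fault, ...) are made up for this
illustration and are not taken from the kernel or Xen sources; only the
bit positions come from the x86 architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* x86 page-fault error code bits (bit positions are architectural;
 * the macro names are hypothetical, chosen for this example). */
#define PF_PROT  (1UL << 0) /* 1 = protection violation, 0 = page not present */
#define PF_WRITE (1UL << 1) /* 1 = write access, 0 = read or fetch */
#define PF_USER  (1UL << 2) /* 1 = fault happened in user mode */
#define PF_RSVD  (1UL << 3) /* 1 = reserved bit set in a paging entry */
#define PF_INSTR (1UL << 4) /* 1 = fault was an instruction fetch */

/* A present page that faults on an instruction fetch is the pattern
 * produced by a no-execute (NX) mapping. */
static int pf_looks_like_nx_fault(unsigned long code)
{
    return (code & PF_PROT) && (code & PF_INSTR);
}

/* Write a human-readable description of the error code into buf.
 * Returns what snprintf returns. */
static int describe_pf_error(unsigned long code, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s; %s; %s mode%s%s",
                    (code & PF_PROT) ? "protection violation"
                                     : "page not present",
                    (code & PF_WRITE) ? "write" : "read or fetch",
                    (code & PF_USER) ? "user" : "kernel",
                    (code & PF_RSVD) ? "; reserved bit set" : "",
                    (code & PF_INSTR) ? "; instruction fetch" : "");
}
```

Calling pf_looks_like_nx_fault(0x11) returns non-zero, which matches the
"Oops: 0011" line in the dump above.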

I tried hacking some code to turn off the NX bit in the PTE for the
hypercall stubs page, but I still get the oops.  I'm thinking it's because
the NX bit is set in the PMD.
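
The PMD value printed in the oops (800000002ac001e3) can be checked the
same way.  The sketch below assumes that value is the raw page-middle-
directory entry and uses the architectural x86-64 bit positions; the
macro and function names are hypothetical and only serve this example.
If I am decoding it correctly, bit 63 (NX) is set and so is bit 7 (PSE),
which would mean the PMD maps a 2 MiB page directly and there is no
separate PTE level for this address.  That may be why clearing NX in a
PTE had no effect.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 paging-entry bits (positions are architectural; names are
 * hypothetical, chosen for this example). */
#define ENT_PRESENT (1ULL << 0)
#define ENT_RW      (1ULL << 1)
#define ENT_PSE     (1ULL << 7)  /* in a PMD entry: maps a 2 MiB page */
#define ENT_NX      (1ULL << 63) /* no-execute */

/* Bits 51:21 hold the physical base of a 2 MiB page. */
#define PMD_LARGE_ADDR_MASK 0x000FFFFFFFE00000ULL

static int entry_has_nx(uint64_t entry)
{
    return (entry & ENT_NX) != 0;
}

static int pmd_is_large_page(uint64_t pmd)
{
    return (pmd & ENT_PRESENT) && (pmd & ENT_PSE);
}

/* Physical base address of the 2 MiB page mapped by a large-page PMD. */
static uint64_t pmd_large_page_base(uint64_t pmd)
{
    assert(pmd_is_large_page(pmd));
    return pmd & PMD_LARGE_ADDR_MASK;
}
```

With the value from the dump, pmd_large_page_base() gives 0x2ac00000,
and the 2 MiB page starting there contains physical address 0x2aca5000
(the value in RBX).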

I'm quite new to the paging mechanism, so I'm not sure how to fix this at
the moment.  I'll keep poking around.  Thought I'd share my findings so
far.

Steve D.


