
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))



On 11/23/2010 03:51 AM, Ian Campbell wrote:
> I'm not sure but looking at the complete bootlog it looks as if the
> system may only have node==1 i.e. no 0 node which could plausibly lead
> to this sort of issue:
>         [    0.000000] Bootmem setup node 1 0000000000000000-0000000040000000
>         [    0.000000]   NODE_DATA [0000000000008000 - 000000000000ffff]
>         [    0.000000]   bootmap [0000000000010000 -  0000000000017fff] pages 8
>         [    0.000000] (8 early reservations) ==> bootmem [0000000000 - 0040000000]
>         [    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
>         [    0.000000]   #1 [0003446000 - 0003465000]   XEN PAGETABLES ==> [0003446000 - 0003465000]
>         [    0.000000]   #2 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
>         [    0.000000]   #3 [0001000000 - 0001694994]    TEXT DATA BSS ==> [0001000000 - 0001694994]
>         [    0.000000]   #4 [00016b5000 - 0003244e00]          RAMDISK ==> [00016b5000 - 0003244e00]
>         [    0.000000]   #5 [0003245000 - 0003446000]   XEN START INFO ==> [0003245000 - 0003446000]
>         [    0.000000]   #6 [0001695000 - 000169532d]              BRK ==> [0001695000 - 000169532d]
>         [    0.000000]   #7 [0000100000 - 00002e0000]          PGTABLE ==> [0000100000 - 00002e0000]
>         [    0.000000] found SMP MP-table at [ffff8800000fe710] fe710
>         [    0.000000] Zone PFN ranges:
>         [    0.000000]   DMA      0x00000000 -> 0x00001000
>         [    0.000000]   DMA32    0x00001000 -> 0x00100000
>         [    0.000000]   Normal   0x00100000 -> 0x00100000
>         [    0.000000] Movable zone start PFN for each node
>         [    0.000000] early_node_map[2] active PFN ranges
>         [    0.000000]     1: 0x00000000 -> 0x000000a0
>         [    0.000000]     1: 0x00000100 -> 0x00040000
>         [    0.000000] On node 1 totalpages: 262048
>         [    0.000000]   DMA zone: 56 pages used for memmap
>         [    0.000000]   DMA zone: 483 pages reserved
>         [    0.000000]   DMA zone: 3461 pages, LIFO batch:0
>         [    0.000000]   DMA32 zone: 3528 pages used for memmap
>         [    0.000000]   DMA32 zone: 254520 pages, LIFO batch:31
>
> Perhaps we should be passing numa_node_id() (e.g. current node) instead
> of node 0? There doesn't seem to be another obvious alternative to
> passing in an explicit node number to this callchain (some places cope
> with -1 but not this path AFAICT).

Does booting native get the same configuration?
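
To compare the native and dom0 boot logs quickly, the node layout can be pulled out of dmesg with a small script. This is only a minimal sketch: it assumes the early_node_map line format shown in the quoted log, and the function names (node_pages, missing_node0) are made up for illustration, not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches lines such as "    1: 0x00000000 -> 0x000000a0" from the
# early_node_map section (format assumed from the quoted boot log).
RANGE_RE = re.compile(
    r"^\s*(\d+):\s+0x([0-9a-fA-F]+)\s+->\s+0x([0-9a-fA-F]+)\s*$"
)
# Strips optional email quote markers and the "[    0.000000]" timestamp.
PREFIX_RE = re.compile(r"^\s*(?:>\s*)*\[\s*\d+\.\d+\]")


def node_pages(log_text):
    """Return {node_id: page_count} summed over early_node_map PFN ranges."""
    pages = defaultdict(int)
    for line in log_text.splitlines():
        body = PREFIX_RE.sub("", line, count=1)
        m = RANGE_RE.match(body)
        if m:
            node = int(m.group(1))
            start = int(m.group(2), 16)
            end = int(m.group(3), 16)
            pages[node] += end - start  # ranges are [start, end) in PFNs
    return dict(pages)


def missing_node0(log_text):
    """True if the log lists memory ranges but none of them belong to node 0."""
    pages = node_pages(log_text)
    return bool(pages) and 0 not in pages
```

Run against the log quoted above, node_pages() gives {1: 262048}, which agrees with the "On node 1 totalpages: 262048" line, and missing_node0() returns True. Running the same check on a native boot log would show whether node 0 only disappears under Xen.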

> It's also not obvious if dom0 should be seeing the tables which describe
> the host's nodes anyway or if we should be clobbering something. Given
> that dom0 sees a pseudo-physical address map I'm not convinced seeing
> the real SRAT is in any way beneficial. Perhaps we should simply be
> clobbering NUMAness until actual PV understanding of NUMA is ready?

Yes, the host SRAT is meaningless in the domain and we really should
ignore it.  I'm not sure what happens if you boot on a genuinely NUMA system.

> One thing I notice when googling R410 issues is that they apparently
> have a "Cores per CPU" BIOS option which might be worth playing with,
> since configuring a reduced number of cores might remove node 0 but not
> node 1 (odd but not invalid?). Presumably it is also worth making sure
> you have the latest BIOS etc.

Also, what's the DIMM configuration?  Are the slots fully populated?


    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

