xen-devel

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Network drop to domU (netfront: rx->offset: 0, size: 4294967295)
From: "PCextreme B.V. - Wido den Hollander" <wido@xxxxxxxxxxxx>
Date: Wed, 20 May 2009 12:52:50 +0200
Delivery-date: Wed, 20 May 2009 03:53:26 -0700
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hello,

I am the administrator of a fairly big Xen environment and I have run
into a bug.

At random points some domU's lose their network connection for about 1
to 2 minutes, and the following appears in their kernel log:

[548994.957487] printk: 56 messages suppressed.
[548994.957508] netfront: rx->offset: 0, size: 4294967295
[548994.957511] netfront: rx->offset: 0, size: 4294967295
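
A note on the number itself: 4294967295 is 2^32 - 1, which is how -1 looks
when a signed value is printed as an unsigned 32-bit integer. Whether
netfront really prints a negative status this way is an assumption here,
not something confirmed in this message. The sketch below only parses the
log line format shown above and reinterprets the size; the function names
are hypothetical.

```python
import re

# Matches the netfront warning in the form shown in the log excerpt above.
NETFRONT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+netfront: "
    r"rx->offset: (?P<offset>\d+), size: (?P<size>\d+)"
)


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def parse_netfront_line(line):
    """Return the parsed fields of a netfront warning, or None if no match."""
    m = NETFRONT_RE.search(line)
    if m is None:
        return None
    size = int(m.group("size"))
    return {
        "timestamp": float(m.group("ts")),
        "offset": int(m.group("offset")),
        "size_unsigned": size,
        # Assumption: the printed size may be a signed value shown as unsigned.
        "size_signed": as_signed32(size),
    }


if __name__ == "__main__":
    sample = "[548994.957508] netfront: rx->offset: 0, size: 4294967295"
    print(parse_netfront_line(sample))
```

If that reading is right, the "size" is not a packet length at all but an
error indication, which would fit with the grant table messages on the
hypervisor side.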

The dom0 specs:

- 2x Intel(R) Xeon(R) CPU E5420
- 64GB DDR2 FB-DIMM
- 2x Intel 80003ES2LAN
- SuperMicro X7DB8 mainboard
- Areca ARC-1680ix-16 RAID Controller

This is an Ubuntu 8.04.2 system with Xen 3.2.1-rc1-pre installed from the
Ubuntu repositories.

The kernel used here is a customized kernel (2.6.24-24-xen) based on the
Ubuntu source; NR_DYNIRQS has been raised from 256 to 1024 to support
more domU's.

At the moment this server is hosting about 110 domU's.

In my "xm dmesg" I get the following messages:
(XEN) grant_table.c:1262:d0 Bad flags (0) or dom (0). (expected dom 0)

This message is reported about 1000 times in a few days.
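
To put a number on this, the messages can be counted from saved "xm dmesg"
output. This is a minimal sketch that assumes the message format is exactly
as quoted above; the function name is hypothetical, and the meaning given
to the "d0" prefix (the domain the hypervisor was running for when it
logged the message) is an assumption.

```python
import re
from collections import Counter

# Matches the hypervisor message in the form quoted above.
GRANT_RE = re.compile(
    r"\(XEN\) grant_table\.c:(?P<line>\d+):d(?P<prefix_dom>\d+) "
    r"Bad flags \((?P<flags>\w+)\) or dom \((?P<dom>\d+)\)\. "
    r"\(expected dom (?P<expected>\d+)\)"
)


def count_grant_errors(dmesg_text):
    """Count 'Bad flags' messages, keyed by (prefix domain, expected domain)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = GRANT_RE.search(line)
        if m:
            key = (int(m.group("prefix_dom")), int(m.group("expected")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    text = (
        "(XEN) grant_table.c:1262:d0 Bad flags (0) or dom (0). (expected dom 0)\n"
        "(XEN) grant_table.c:1262:d0 Bad flags (0) or dom (0). (expected dom 0)\n"
    )
    print(count_grant_errors(text))
```

Running it against output captured at intervals would show whether the
count jumps at the same moments the domU's lose their network.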

I have two of these machines running. They are identical in both
software and hardware; the only difference is that one server
hosts 110 domU's and the other hosts about 20 domU's.

This behaviour is only seen on the machine hosting the 110 domU's.

At first I thought this had something to do with my Intel NIC, but at
the moment the domU becomes unavailable the dom0 is still available, so
it seems to go wrong somewhere inside netfront. (That is what Google
told me.)

One of the tests I did was disabling TSO, RX and TX offloading with
ethtool in both the dom0 and the domU, but this did not have any effect;
the messages keep coming.
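
For reference, the ethtool test can be written down as a small helper that
builds the command line. This is only a sketch: the interface name "eth0"
is an assumption (the actual names in the dom0 and the domU are not given
here), and the helper only constructs the command without running it.

```python
def ethtool_offload_off_cmd(interface, features=("tso", "rx", "tx")):
    """Build an 'ethtool -K' command that switches the given features off."""
    cmd = ["ethtool", "-K", interface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


if __name__ == "__main__":
    # "eth0" is a placeholder interface name.
    print(" ".join(ethtool_offload_off_cmd("eth0")))
    # To apply it for real (as root), something like:
    #   import subprocess
    #   subprocess.run(ethtool_offload_off_cmd("eth0"), check=True)
```

The same command would have to be run in both the dom0 and the domU to
repeat the test described above.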

To me this issue seems related to the large number of domU's running on
this system, especially since the other identical machine is not
affected.

I took the kernel source and started looking at where the netfront message
is printed, and it seems to be some kind of memory allocation issue. I
have found some old messages with patches, but those did not apply to my
current source.

Since this is a running production system I can schedule a reboot for a
new kernel, but this takes some time.

-- 
Kind regards,

Wido den Hollander
Head of System Administration / CSO
Phone Support Netherlands: 0900 9633 (45 cpm)
Phone Support Belgium: 0900 70312 (45 cpm)
Phone Direct: (+31) (0)20 50 60 104
Fax: +31 (0)20 50 60 111
E-mail: support@xxxxxxxxxxxx
Website: http://www.pcextreme.nl
Knowledge base: http://support.pcextreme.nl/
Network status: http://nmc.pcextreme.nl

