Nivedita Singhvi <niv@xxxxxxxxxx> wrote:
> I don't have boxes at the moment and can't reproduce till
> Monday, but can you show us the output of netstat -uan and
> netstat -s on both domains? Is there stuff in the receive
> or send queues?
The detailed netstat output follows, but there is nothing in the send
queue on domU and nothing in the receive queue on dom0. (The UDP server
in question is running on port 2000.)
On dom0:
$ netstat -uan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp        0      0 0.0.0.0:1024            0.0.0.0:*
udp        0      0 0.0.0.0:2049            0.0.0.0:*
udp        0      0 0.0.0.0:514             0.0.0.0:*
udp        0      0 0.0.0.0:1027            0.0.0.0:*
udp        0      0 155.98.36.34:1028       155.98.32.70:8509       ESTABLISHED
udp        0      0 0.0.0.0:775             0.0.0.0:*
udp        0      0 0.0.0.0:653             0.0.0.0:*
udp        0      0 192.168.0.1:2000        192.168.1.1:1024        ESTABLISHED
udp        0      0 224.4.0.1:2917          0.0.0.0:*
udp        0      0 224.4.0.1:2917          0.0.0.0:*
udp        0      0 224.4.0.1:2917          0.0.0.0:*
udp        0      0 0.0.0.0:111             0.0.0.0:*
udp        0      0 0.0.0.0:759             0.0.0.0:*
On domU:
# netstat -uan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp        0      0 192.168.1.1:1024        192.168.0.1:2000        ESTABLISHED
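Checking the Recv-Q/Send-Q columns by eye gets tedious on a busier host. Below is a minimal sketch of a filter that prints only UDP sockets with a non-empty queue. It assumes the net-tools `netstat -uan` column order shown above (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name `nonempty_udp_queues` is hypothetical.

```shell
#!/bin/sh
# Print every UDP socket whose Recv-Q or Send-Q is non-zero.
# Reads `netstat -uan` output on stdin. Assumed column order:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address [State]
nonempty_udp_queues() {
    awk '$1 == "udp" && ($2 + 0 > 0 || $3 + 0 > 0) {
        print $4, "->", $5, "recv=" $2, "send=" $3
    }'
}
```

Running `netstat -uan | nonempty_udp_queues` on either domain prints nothing when every UDP queue is empty, as in the two listings above.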
The netstat -s output is a bit long, so I've attached it rather than
including it inline.
> And was all the udp traffic going to the same port? i.e. any successful
> udp traffic to another endpoint?
All the traffic was going to port 2000. After the networking failure,
sending UDP traffic from domU to a different port on dom0 does not
succeed either. (If you're asking whether traffic could be sent to
multiple ports while networking is still functional, I believe the
answer is yes, but I would need to double-check.)
> What does ifconfig on dom0 show?
> Are there any error messages in /var/log/messages?
$ ifconfig vif1.0
vif1.0    Link encap:Ethernet  HWaddr AA:00:01:7B:92:C2
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:134 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5884 (5.7 Kb)  TX bytes:676 (676.0 b)
$ sudo tail /var/log/messages
Jan 16 19:34:09 node1 ntpd[993]: kernel time sync disabled 0041
Jan 16 19:35:15 node1 ntpd[993]: kernel time sync enabled 0001
Jan 16 19:39:29 node1 ntpd[993]: synchronized to 155.98.33.74, stratum=2
Jan 16 19:49:07 node1 ntpd[993]: time correction of -18001 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
Jan 16 19:59:15 node1 sshd(pam_unix)[1457]: session opened for user mukesh by (uid=30245)
Jan 16 19:59:18 node1 sshd(pam_unix)[1486]: session opened for user mukesh by (uid=30245)
Jan 16 19:59:30 node1 sshd(pam_unix)[1517]: session opened for user mukesh by (uid=30245)
Jan 16 20:09:29 node1 modprobe: modprobe: Can't open dependencies file /lib/modules/2.4.27-xen0/modules.dep (No such file or directory)
Jan 16 20:09:44 node1 last message repeated 2 times
Jan 16 20:16:02 node1 kernel: device vif1.0 entered promiscuous mode
>> Looking at the interrupt counts in /proc/interrupts, I see that D0 no
>> longer receives packets sent by D1. D1, however, does receive packets
>> sent by D0. (To be clear, D0->D1 traffic is ICMP ping requests,
>> unrelated to the UDP traffic. There is not UDP traffic sent from D0 to D1.)
> Is there any other successful traffic from D0 -> D1 (tcp?)
All traffic from D0 to D1 succeeds, even after the network stops
working. This includes ICMP, UDP, and TCP. (Sorry if my comment that
"There is not UDP traffic sent from D0 to D1" was confusing. I meant
that I wasn't sending any UDP traffic from D0 to D1, not that such
traffic fails.)
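The /proc/interrupts comparison quoted earlier (checking whether D0 still takes interrupts for packets from D1) can be scripted. This is a minimal sketch: the helper name `irq_count` is hypothetical, and it assumes the usual /proc/interrupts layout of a header with one `CPUn` column per CPU followed by lines whose last field is the device name. The device name `vif1.0` used below is an assumption; substitute whatever name the interface's interrupt is registered under on your system.

```shell
#!/bin/sh
# Sum the per-CPU counts on the /proc/interrupts line whose last field
# equals the given device name. Reads /proc/interrupts-format text on stdin.
# Device names containing spaces are not handled (assumption).
irq_count() {
    awk -v dev="$1" '
        NR == 1 { ncpu = NF; next }        # header line: one column per CPU
        $NF == dev {
            sum = 0
            for (i = 2; i <= ncpu + 1; i++) sum += $i
            print sum
        }'
}
```

For example, one could run `irq_count vif1.0 < /proc/interrupts` before and after a burst of domU-to-dom0 traffic; an unchanged count would match the observation that D0 no longer receives packets from D1.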
This is subject to the limitation mentioned in my first message, namely
that dom0's ARP cache entry for domU eventually times out. At that point,
dom0 sends an ARP request for domU's MAC address. domU sees the request
and replies (as seen by tcpdump on domU), but dom0 never receives the
replies, so eventually D0->D1 traffic fails as well. (For example,
"telnet 192.168.1.1" returns "No route to host".)
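To watch the ARP entry described above from dom0 without tcpdump, the kernel's /proc/net/arp table can be read directly. This is a minimal sketch, assuming the standard column order (IP address, HW type, Flags, HW address, Mask, Device); the helper name `arp_state` is hypothetical. It treats an entry as complete when the ATF_COM bit (0x02) is set in the Flags field.

```shell
#!/bin/sh
# Report the state of one IPv4 neighbour from /proc/net/arp-format text (stdin).
# Assumed columns: IP address, HW type, Flags, HW address, Mask, Device.
# ATF_COM (0x02) set in Flags means the entry is complete (a reply was seen).
arp_state() {
    awk -v ip="$1" '
        NR > 1 && $1 == ip {
            nib = substr($3, length($3), 1)   # lowest hex digit of Flags
            v = index("0123456789abcdef", tolower(nib)) - 1
            state = (int(v / 2) % 2) ? "complete" : "incomplete"
            print $1, $4, $6, state
            found = 1
        }
        END { if (!found) print ip, "absent" }'
}
```

On dom0, `arp_state 192.168.1.1 < /proc/net/arp` could be run periodically after the entry times out. If domU's replies are indeed being lost, one would expect the entry to stay incomplete or disappear rather than return to complete.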
Also, let me add some more detail to my original report:
1. The networking fails after the 128th UDP packet received in dom0,
   even if I restart domU. Specifically:
   - If I send one UDP packet from domU to dom0, shut down domU, and
     start a fresh domU, then I can send only 127 (rather than 128)
     UDP packets from the new domU before networking fails.
   - If I shut down domU after the networking failure and start a new
     domU, networking between the new domU and dom0 does not work.
2. The server running in dom0 is
       nc -l -u -p 2000
3. The traffic generator running in domU is
       i=0; while true; do
           ((++i)); echo "$i"                        # print the packet number locally
           echo "$i" | nc -u -w 1 192.168.0.1 2000   # send it to the UDP server in dom0
       done &
thanks,
mukesh
netstat-dom0.txt
Description: netstat -s for domain0
netstat-domU.txt
Description: netstat -s for domain1