
Re: [Xen-devel] DomU's network interface will hung when Dom0 running 32bit

On 2013-10-15 18:06, Wei Liu wrote:
On Tue, Oct 15, 2013 at 05:34:57PM +0800, jianhai luan wrote:
On 2013-10-15 16:43, Ian Campbell wrote:
On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote:
On 2013-10-14 19:19, Wei Liu wrote:
On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
Hi Ian,
    I recently ran into an issue where a DomU's network interface hangs,
and I have been working on it since then. I find that a DomU network
interface that sends very little traffic will hang if Dom0 is running
32-bit and the DomU's uptime is very long. I think there is a jiffies
overflow bug in the function tx_credit_exceeded().
    I know the inline function time_after_eq(a,b) handles jiffies
overflow, but it has one limit: a must be less than (b +
MAX_SIGNED_LONG). If a is larger than that value, time_after_eq will
return false. MAX_SIGNED_LONG (i.e. LONG_MAX) is 0x7fffffff on 32-bit.
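
To make the limit described above concrete, here is a minimal userspace sketch of the comparison. It assumes the kernel macro reduces to `(long)((a) - (b)) >= 0` and uses `int32_t`/`uint32_t` to stand in for a 32-bit `long`/`unsigned long`; the name `time_after_eq32` is hypothetical and not part of the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of time_after_eq() for a 32-bit jiffies counter.
 * Assumption: the kernel macro boils down to (long)((a) - (b)) >= 0.
 * The unsigned subtraction wraps modulo 2^32; the cast to int32_t relies
 * on the usual two's-complement conversion done by common compilers.
 */
static bool time_after_eq32(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}
```

With this model, `time_after_eq32(0x00000010, 0xfffffff0)` is true (a short wrap is handled), while `time_after_eq32(0x80000000, 0)` is false even though `a` is later than `b`, because the distance exceeds 0x7fffffff.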
    If the DomU's network interface sends very little traffic (<0.5k/s
if HZ=250 and credit_bytes=ULONG_MAX), jiffies will go beyond
(credit_timeout.expires + MAX_SIGNED_LONG) and time_after_eq(now,
next_credit) will return false (it should be true). So a timer is set
that will not fire any time soon, and later processing is aborted
while timer_pending(&vif->credit_timeout) is true. As a result, the
DomU's network interface hangs for a long time (> 40 days).
    Please consider the scenario below:
      Dom0 running 32-bit and HZ = 1000
      vif->credit_timeout.expires = 0xffffffff, vif->remaining_credit
= 0xffffffff, vif->credit_usec=0, jiffies=0
      The vif receives very little traffic (the DomU sends very little).
If the rate is less than 2K/s, consuming 4G (0xffffffff) will take about
582.5 hours or longer, and jiffies will become larger than 0x7fffffff.
Say jiffies = 0x800000ff: time_after_eq(0x800000ff, 0xffffffff) will
return false, and a timer whose expiry is 0xffffffff will be left
pending in the system. So the interface will hang until jiffies counts
up to 0xffffffff again (which takes a very long time).
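
The arithmetic behind this scenario can be checked with a small sketch. It assumes a 32-bit jiffies counter, a full credit of 0xffffffff bytes, and a constant send rate; the helper names `seconds_to_drain` and `exceeds_compare_window` are hypothetical and only illustrate the calculation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Whole seconds needed to consume credit_bytes at bytes_per_sec (> 0). */
static uint64_t seconds_to_drain(uint64_t credit_bytes, uint64_t bytes_per_sec)
{
    return credit_bytes / bytes_per_sec;
}

/*
 * True if the jiffies elapsed while draining the credit exceed the
 * 0x7fffffff window that a signed 32-bit comparison can represent.
 * A rate of zero never drains the credit, so it is treated as exceeding.
 */
static bool exceeds_compare_window(uint64_t credit_bytes,
                                   uint64_t bytes_per_sec, uint32_t hz)
{
    if (bytes_per_sec == 0)
        return true;
    return seconds_to_drain(credit_bytes, bytes_per_sec) * hz > INT32_MAX;
}
```

Under these assumptions the threshold works out to roughly 2 * HZ bytes per second: about 2K/s at HZ = 1000 and about 0.5k/s at HZ = 250, which matches the figures quoted in the message.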
If I'm not mistaken you meant time_after_eq(now, next_credit) in
netback. How does next_credit become 0xffffffff?
I only assumed the value 0xffffffff; the particular value of next_credit
isn't the point. If the delta between now and next_credit is larger than
ULONG_MAX/2, time_after_eq will give the wrong answer.
So it sounds like we need a timer which is independent of the traffic
being sent to keep credit_timeout.expires rolling over.

Can you propose a patch?
Because credit_timeout.expires is always after jiffies, I detect a
value outside the range of time_after_eq() with time_before(now,
vif->credit_timeout.expires). Please check the patch.
I don't think this really fixes the issue for you. You still have a
chance that now wraps around and falls between expires and next_credit.
In that case it's stalled again.

If time_before(now, vif->credit_timeout.expires) is true, time has
wrapped and we do the operation. Otherwise, when time_before(now,
vif->credit_timeout.expires) isn't true, now -
vif->credit_timeout.expires should be less than ULONG_MAX/2. Because
next_credit is larger than vif->credit_timeout.expires (next_credit =
vif->credit_timeout.expires + msecs_to_jiffies(vif->credit_usec/1000)),
the delta between now and next_credit should be within the range of
time_after_eq(). So time_after_eq() gives the correct answer.
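
The combined check argued for above might look like the following userspace sketch. It is not the actual patch: `time_before32`, `time_after_eq32`, and `credit_window_elapsed` are hypothetical names, `credit_jiffies` stands in for msecs_to_jiffies(vif->credit_usec/1000), and 32-bit integer types model a 32-bit Dom0.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit models of the kernel comparisons (assumed signed-difference form). */
static bool time_before32(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

static bool time_after_eq32(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/*
 * Hypothetical form of the proposed test: treat the credit window as
 * elapsed if now appears to be before expires (taken as a sign that the
 * counter moved out of comparison range), or if now has reached
 * next_credit in the ordinary sense.
 */
static bool credit_window_elapsed(uint32_t now, uint32_t expires,
                                  uint32_t credit_jiffies)
{
    uint32_t next_credit = expires + credit_jiffies;

    return time_before32(now, expires) || time_after_eq32(now, next_credit);
}
```

For the scenario from the original report (now = 0x800000ff, expires = 0xffffffff, credit_jiffies = 0) the plain time_after_eq32() test is false while credit_window_elapsed() is true. The sketch does not address the objection raised earlier in the thread: if now wraps all the way around and lands between expires and next_credit, both tests are false and the window still looks unexpired.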


