Re: [Xen-devel] new netfront and occasional receive path lockup

 On 09/10/2010 04:50 AM, Pasi Kärkkäinen wrote:
> On Wed, Aug 25, 2010 at 08:51:09AM +0800, Xu, Dongxiao wrote:
>> Hi Christophe,
>>
>> Thanks for finding and checking the problem.
>> I will try to reproduce the issue and check what caused the problem.
>>
> Hello,
>
> Was this issue resolved? Some users have been complaining about
> "network freezing up" issues recently in ##xen on IRC.

Yeah, I'll add a command-line parameter to disable smartpoll (and leave
it off by default).

    J

> -- Pasi
>
>> Thanks,
>> Dongxiao
>>
>> Jeremy Fitzhardinge wrote:
>>>  On 08/22/2010 09:43 AM, Christophe Saout wrote:
>>>> Hi,
>>>>
>>>> I've been playing with some of the new pvops code, namely DomU guest
>>>> code.  What I've been observing on one of the virtual machines is
>>>> that the network (vif) is dying after about ten to sixty minutes of
>>>> uptime.  The unfortunate thing here is that I can only reproduce it
>>>> on a production VM and have so far been unable to trigger the bug on
>>>> a test machine.  While this has not been tragic (rebooting fixed the
>>>> issue), I can't spend very much time on debugging after the issue
>>>> pops up.
>>> Ah, OK.  I've seen this a couple of times as well.  And it just
>>> happened to me then... 
>>>
>>>
>>>> Now, what is happening is that the receive path goes dead.  The DomU
>>>> can send packets to Dom0 and those are visible using tcpdump on the
>>>> Dom0 on the virtual interface, but not the other way around.
>>> I hadn't got to that level of diagnosis, but I can confirm that
>>> that's what seems to be happening here too. 
>>>
>>>> Now, I have done more than one change at a time (I'd like to avoid
>>>> going into pinning it down since I can only reproduce it on a
>>>> production machine, as I said, so suggestions are welcome), but my
>>>> suspicion is that it might have to do with the new "smart polling"
>>>> feature in xen/netfront.  Note that I have also updated Dom0 to pull
>>>> in the latest dom0/backend and netback changes, just to make sure
>>>> it's not due to an issue that has been fixed there, but I'm still
>>>> seeing the same.
>>> I agree.  I think I started seeing this once I merged smartpoll into
>>> netfront. 
>>>
>>>     J
>>>
>>>> The production machine doesn't have much network load, but deals
>>>> with a lot of small network requests (DNS and SMTP mostly), a
>>>> workload which is hard to reproduce on the test machine.  Heavy
>>>> network load (NFS, FTP and so on) for days hasn't triggered the
>>>> problem.  Also, segmentation offloading and similar settings don't
>>>> have any effect.
>>>>
>>>> The machine has 2 physical and the VM 2 virtual CPUs, DomU has
>>>> PREEMPT enabled.
>>>>
>>>> I've been looking at the code to see whether there might be a race
>>>> condition somewhere, for example a situation where the hrtimer
>>>> doesn't run and Dom0 believes the DomU should be polling and so
>>>> doesn't emit an interrupt, but I'm afraid I don't know enough to
>>>> judge this (I mean, there are spinlocks which look safe to me).
>>>>
>>>> Do you have any suggestions what to try?  I can trigger the issue on
>>>> the production VM again, but debugging should not take more than a
>>>> few minutes if it happens.  Access is only possible via the console.
>>>> Neither Dom0 nor the guest show anything unusual in the kernel
>>>> messages, and both continue to behave normally after the network
>>>> goes dead (the guest can also be shut down normally).
>>>>
>>>> Thanks,
>>>>    Christophe
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel
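
For illustration only, the state Christophe suspects in the quoted
message (the frontend's poll timer is not running while the backend
still believes the frontend is polling, so no interrupt is sent) can be
modelled with a toy simulation.  This is a sketch under stated
assumptions, not the netfront/netback code: ToyVif, polling_flag,
timer_armed and simulate() are hypothetical names invented for this
example, and the way the timer is "lost" is simply forced by hand.

```python
from collections import deque


class ToyVif:
    """Toy model of one receive ring shared by a backend and a frontend.

    All names are hypothetical; this does not mirror the real driver.
    """

    def __init__(self):
        self.ring = deque()        # packets queued by the backend
        self.polling_flag = False  # frontend has promised to poll the ring
        self.timer_armed = False   # frontend poll timer is actually running
        self.delivered = []        # packets the frontend has consumed
        self.interrupts = 0        # notifications sent by the backend

    def backend_send(self, pkt):
        """Backend queues a packet and notifies only if nobody is polling."""
        self.ring.append(pkt)
        if not self.polling_flag:
            self.interrupts += 1
            self.frontend_interrupt()

    def frontend_interrupt(self):
        """Interrupt path: drain the ring, then switch to polling mode."""
        self._drain()
        self.polling_flag = True
        self.timer_armed = True

    def timer_tick(self):
        """Poll timer callback; does nothing if the timer is not armed."""
        if not self.timer_armed:
            return
        if self.ring:
            self._drain()
        else:
            # Ring idle: stop polling and hand wakeups back to the backend.
            self.polling_flag = False
            self.timer_armed = False

    def _drain(self):
        while self.ring:
            self.delivered.append(self.ring.popleft())


def simulate(timer_lost):
    """Send three packets; optionally force the suspected bad state."""
    vif = ToyVif()
    vif.backend_send("p1")  # first packet arrives via the interrupt path
    if timer_lost:
        # Suspected bad state: flag still set, but the timer never fires.
        vif.timer_armed = False
    for pkt in ("p2", "p3"):
        vif.backend_send(pkt)
        vif.timer_tick()
    return vif.delivered, list(vif.ring)


if __name__ == "__main__":
    print("healthy:", simulate(timer_lost=False))
    print("stalled:", simulate(timer_lost=True))
```

In the stalled run, packets keep piling up in the ring and nothing wakes
the frontend, which would match a receive path that is dead while
transmit still works.  The model says nothing about how the real driver
might reach that state.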

