
Re: [Xen-devel] NFS related netback hang



On Sat, Apr 13, 2013 at 3:06 PM, G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Sat, Apr 13, 2013 at 2:19 PM, G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>> I still believe the key factor is stressing the memory.
>>>> Maybe you can try limiting the memory size further and using a larger file.
>>>>
>>>> I have become uncertain about how the transfer speed affects this.
>>>> I can achieve 10GB/s in an iperf test without issue,
>>>> and FTP transfer also works without problems at 50MB/s.
>>>> But maybe the higher network speed is a negative factor here -- NFS may
>>>> be able to commit changes faster.
>>>> Perhaps we should feed data faster than NFS can handle so that memory
>>>> is used up quickly?
>>>> But back pressure from downstream should slow the rate at which
>>>> upstream consumes memory.
>>>> How does the throttling work? Is there any way to control it?
>>>>
>>>> I'll check why my dom0 reported OOM; maybe that's a factor too.
>>>>
>>>
>>> This is a good starting point. :-)
>>>
>>
>> It seems that the OOM killer was only triggered on kernel version 3.6.9.
>> It does not show up in 3.6.11, although the issue still exists.
>> So I guess there were some changes in mm behavior in recent kernels.
>> Probably I should try with your kernel version.
>>
>> But anyway, let's look at some existing data first.
>> Please find the full oom_kill log in the attached file.
>> It seems that the OOM kill is caused by the freeze issue, since the NFS
>> server (domU) becomes unresponsive first.
>> There are about 900MB of dirty pages (writeback:152893 unstable:72860;
>> can they simply be added?).
>> I don't remember the total memory at that time due to ballooning.
>> The total DRAM in the host is 8GB.
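
As a rough check on the "about 900MB" figure quoted above, the two counters can be converted from pages to MiB. This is only a sketch: it assumes 4 KiB pages and that the writeback and unstable counters do not overlap, neither of which is confirmed by the log excerpt. The helper name is made up for illustration.

```python
# Assumption: 4 KiB pages; the log excerpt does not state the page size.
PAGE_SIZE = 4096

def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB (hypothetical helper)."""
    return pages * page_size / (1024 * 1024)

# Counters copied from the quoted OOM log.
writeback = 152893
unstable = 72860

# Assumption: the two counters are disjoint, so they can simply be summed.
total_mib = pages_to_mib(writeback + unstable)
print(f"writeback: {pages_to_mib(writeback):.0f} MiB")
print(f"unstable:  {pages_to_mib(unstable):.0f} MiB")
print(f"sum:       {total_mib:.0f} MiB")
```

Under those assumptions the sum comes to roughly 880 MiB, which matches the estimate.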

Hi Wei,
I think I have found something important for this issue.
I moved to a brand-new kernel version -- 3.8.7 -- yesterday.
I tried both your configuration and mine.
The configuration really matters -- my configuration fails while
yours does not.

I made a quick comparison and found one key difference -- kernel preemption.
My config enables kernel preemption and uses a 1000 Hz timer frequency.
I have attached my config in case something else in it matters as well.
Feel free to check it out.
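
For anyone who wants to repeat the comparison, a small script along these lines can list the preemption and timer options that differ between two kernel configs. It is a minimal sketch: the two sample configs below are hypothetical stand-ins (not the attached config.i7 and not Wei's real config), the function names are made up, and a symbol missing from a file is treated as "n", which is a simplification.

```python
def parse_kconfig(text):
    """Parse kernel .config text into {symbol: value}; 'is not set' maps to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            opts[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
    return opts

def diff_kconfig(a, b, prefixes=("CONFIG_PREEMPT", "CONFIG_HZ")):
    """Return {symbol: (value_in_a, value_in_b)} for differing symbols with the given prefixes."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k, "n"), b.get(k, "n"))
            for k in keys
            if k.startswith(prefixes) and a.get(k, "n") != b.get(k, "n")}

# Hypothetical sample data, for illustration only.
mine = """CONFIG_PREEMPT=y
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
"""
theirs = """# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_HZ_250=y
CONFIG_HZ=250
"""

diff = diff_kconfig(parse_kconfig(mine), parse_kconfig(theirs))
for symbol, (left, right) in diff.items():
    print(f"{symbol}: mine={left} theirs={right}")
```

To compare real files, read each one with open(path).read() and pass the text to parse_kconfig.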

To fix the issue, I wonder if it's feasible to reserve some pages for
low-memory situations?
Also, the performance drop on large file transfers does not make sense.
With a raw network speed of about 7Gbps, why do we end up with about 30MB/s
for large files?
A current HDD should be able to sustain 100MB/s in sequential writes.
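
To put the numbers side by side, here is a quick back-of-the-envelope conversion. It assumes decimal units (1 Gbps = 1000 Mbps, 8 bits per byte) and takes the 100MB/s disk figure as a rough expectation rather than a measurement; the function name is made up for illustration.

```python
def gbps_to_mb_per_s(gbps):
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8

raw_mb_s = gbps_to_mb_per_s(7)  # raw network speed from the iperf-style test
nfs_mb_s = 30                   # observed rate for large files
hdd_mb_s = 100                  # assumed sequential write rate of the disk

print(f"raw network: {raw_mb_s:.0f} MB/s")
print(f"observed rate vs raw network: {nfs_mb_s / raw_mb_s:.1%}")
print(f"observed rate vs assumed disk rate: {nfs_mb_s / hdd_mb_s:.0%}")
```

So the network is far from being the limit, and the observed rate is also well below what the disk alone should manage.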

Thanks,
Timothy

Attachment: config.i7
Description: Binary data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
