Re: [Xen-devel] NFS related netback hang
On Sat, Apr 13, 2013 at 2:19 PM, G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>> I still believe the key factor is to stress the memory.
>>> Maybe you can try to further limit the memory size and use a larger file size.
>>>
>>> I have become uncertain about how the transfer speed affects this.
>>> I can achieve 10GB/s in an iperf test without issue.
>>> And an FTP transfer also works without problem at 50MB/s.
>>> But maybe the higher net speed is a negative factor here -- NFS may
>>> be able to commit changes at a faster speed.
>>> Probably we should feed data faster than NFS can handle so that memory
>>> is used up quickly?
>>> But the back pressure from downstream should slow down the speed at
>>> which the upstream is eating the memory.
>>> How does the throttling work? Any way to control it?
>>>
>>> I'll check why my dom0 reported OOM; maybe that's one factor too.
>>>
>>
>> This is a good starting point. :-)
>>
>
> It seems that the OOM killer was only triggered on kernel version 3.6.9.
> It does not show up in 3.6.11, while the issue still exists.
> So I guess there are some changes in mm behavior in recent kernels.
> Probably I should try with your kernel version.
>
> But anyway, let's look at some existing data first.
> Please find the full oom_kill log in the attached file.
> It seems that the OOM kill is caused by the freeze issue, since the NFS
> server (domU) becomes unresponsive first.
> There are about 900MB of dirty pages (writeback:152893 unstable:72860;
> can they simply be added?).
> I don't remember the total memory at that time due to the ballooning.
> The total DRAM in the host is 8GB.
>
> Thanks,
> Timothy

I did another experiment by dumping live statistics while reproducing the issue.

The command line used:

dd if=/dev/zero of=foobar bs=1M count=4096

Statistics:

Output from vmstat -m, captured with:

while true; do sleep 1; date; vmstat -m | grep -i nfs; done

Sat Apr 13 13:35:31 CST 2013
nfs_direct_cache        0      0    200   19
nfs_commit_data         0      0    704   11
nfs_write_data         36     36    960    4
nfs_read_data           0      0    896    4
nfs_inode_cache        16     16   1008    4
nfs_page                0      0    128   30
Sat Apr 13 13:35:32 CST 2013
nfs_direct_cache        0      0    200   19
nfs_commit_data         0      0    704   11
nfs_write_data       7836   7836    960    4
nfs_read_data           0      0    896    4
nfs_inode_cache        20     20   1008    4
nfs_page            91080  91080    128   30
Sat Apr 13 13:35:33 CST 2013
nfs_direct_cache        0      0    200   19
nfs_commit_data         0      0    704   11
nfs_write_data      15920  15920    960    4
nfs_read_data           0      0    896    4
nfs_inode_cache        20     20   1008    4
nfs_page           134220 134220    128   30
Sat Apr 13 13:35:34 CST 2013
nfs_direct_cache        0      0    200   19
nfs_commit_data         0      0    704   11
nfs_write_data      23596  23596    960    4
nfs_read_data           0      0    896    4
nfs_inode_cache        18     20   1008    4
nfs_page           167730 167730    128   30
Sat Apr 13 13:35:35 CST 2013
nfs_direct_cache        0      0    200   19
nfs_commit_data         0      0    704   11
nfs_write_data      30228  30228    960    4
nfs_read_data           0      0    896    4
nfs_inode_cache        18     20   1008    4
nfs_page           196650 196650    128   30
Sat Apr 13 13:35:36 CST 2013
nfs_direct_cache        0      0    200   19
nfs_commit_data         0      0    704   11
nfs_write_data      33124  33124    960    4
nfs_read_data           0      0    896    4
nfs_inode_cache        17     20   1008    4
nfs_page           209096 209100    128   30
...
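
Regarding the throttling question quoted above ("How does the throttling work? Any way to control it?"): it is not answered in this mail, but as a minimal sketch, the usual knobs on the writing (NFS client) side are the vm.dirty_* sysctls, which decide when background writeback starts and when a process producing dirty pages is itself made to block. The values below are illustrative only, not recommendations from this thread:

# Show the current writeback-throttling settings: background flushing
# starts at dirty_background_ratio percent of reclaimable memory, and a
# process producing dirty pages is itself throttled at dirty_ratio.
sysctl vm.dirty_background_ratio vm.dirty_ratio
sysctl vm.dirty_expire_centisecs vm.dirty_writeback_centisecs

# Illustrative values only: switch to absolute byte limits so a fast
# writer (such as the dd above) is throttled well before dirty/unstable
# NFS pages can grow to the hundreds of MB seen in this report.
sysctl -w vm.dirty_background_bytes=$((64 * 1024 * 1024))    # 64 MB
sysctl -w vm.dirty_bytes=$((256 * 1024 * 1024))              # 256 MB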
Later, when I tried umount -f:

Sat Apr 13 13:38:04 CST 2013
nfs_direct_cache        0      0    200   19
nfs_commit_data         0      0    704   11
nfs_write_data      33119  33124    960    4
nfs_read_data           0      0    896    4
nfs_inode_cache        17     20   1008    4
nfs_page           209019 209100    128   30
Sat Apr 13 13:38:05 CST 2013
nfs_direct_cache        0      0    200   19
nfs_commit_data         0      0    704   11
nfs_write_data        314    568    960    4
nfs_read_data           0      0    896    4
nfs_inode_cache        17     20   1008    4
nfs_page            34959  35610    128   30
Sat Apr 13 13:38:06 CST 2013
nfs_direct_cache        0      0    200   19
nfs_commit_data         0      0    704   11
nfs_write_data        314    568    960    4
nfs_read_data           0      0    896    4
nfs_inode_cache        17     20   1008    4
nfs_page            34959  35610    128   30

And I made a table for the nfs_page item (how the delta MB column can be
derived is sketched after the data below):

Seconds   nfs_page   delta nfs_page   delta MB
1                0              N/A        N/A
2            91080            91080   355.7812
3           134220            43140   168.5156
4           167730            33510   130.8984
5           196650            28920   112.9688
6           209096            12446   48.61719
n           209019              -77    -0.3008
n+1          34959          -174060    -679.92
n+2          34959                0          0

Output from vmstat 1:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free    buff   cache   si   so    bi    bo    in     cs us sy id wa
 0  0      0 906256  11720  110764    0    0     0     4    215    623  0  0 100  0
 0  0      0 906504  11720  110764    0    0     0     4    185    513  0  0 100  0
 1  0      0 858888  11728  155728    0    0     0    32    239    524  0  1  99  0
 2  1      0 480316  10368  509768    0    0    40     4  55192 130021  0  2  76 21
 0  2      0 308452   9344  667840    0    0   124     4  36367  59147  0  0  78 21
 0  2      0 170548   8860  793952    0    0    32     4  27040  42421  0  1  77 22
 0  2      0  52020   8400  903524    0    0     0     4  22961  36318  0  0  77 22
 0  0      0  42160   4780  916328    0    0     4    48   7544  12293  0  0  90  9
 0  0      0  41796   4780  916160    0    0     0     4    141    398  0  0 100  0
 0  0      0  41292   4780  916160    0    0     0     4    132    463  0  0 100  0
 0  0      0  41168   4780  916164    0    0     0     4    146    461  0  0 100  0
 0  0      0  40796   4780  916156    0    0     0     4    142    447  0  0 100  0
 0  0      0  41176   4788  916160    0    0     0    36    170    536  0  0 100  0

Later, when I tried umount -f:

 0  1      0  44360   4748  911660    0    0     0    32    205    493  0  0  87 13
 0  1      0  44724   4748  911668    0    0     0     4    143    429  0  0  87 13
 0  1      0  44616   4748  911672    0    0     0     4    187    515  0  0  87 13
 0  0      0 103616   4828  911936    0    0   384     4   2572  41347  0  1  94  4
 0  0      0 103772   4828  911936    0    0     0     4    120    321  0  0 100  0
 0  0      0 104632   4836  911968    0    0     0    40    155    438  0  0 100  0

Finally, I attach a file that holds /proc/meminfo sampled at a 1-second
interval. This is from a different capture though, because I used a wrong
command at first.

Attachment: mtrack
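
For reference, the "delta MB" column in the table above looks like the change in active nfs_page objects multiplied by the 4 KiB page of file data each object tracks (e.g. 91080 * 4096 bytes is about 355.78 MB), rather than the 128-byte slab objects themselves. A small sketch of that conversion, assuming the per-second vmstat -m output was saved to a hypothetical file nfs_slab.log:

awk '/^nfs_page/ {
    if (seen) {
        delta = $2 - prev
        # one nfs_page object tracks one 4 KiB page of file data
        printf "delta %8d objects -> %9.2f MB\n", delta, delta * 4096 / (1024 * 1024)
    }
    prev = $2
    seen = 1
}' nfs_slab.log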
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel