Re: [Xen-devel] OOM problems

To:	Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Subject:	Re: [Xen-devel] OOM problems
From:	John Weekes <lists.xen@xxxxxxxxxxxxxxxxxx>
Date:	Wed, 17 Nov 2010 23:15:17 -0800
Cc:	Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>
In-reply-to:	<1290053337.18200.28.camel@xxxxxxxxxxxxxxxxxxxxxxx>
References:	<4CDE44E2.2060807@xxxxxxxxxxxxxxxxxx> <4FA716B1526C7C4DB0375C6DADBC4EA38D80702C25@xxxxxxxxxxxxxxxxxxxxxxxxx> <4CDE4C08.70309@xxxxxxxxxxxxxxxxxx> <4FA716B1526C7C4DB0375C6DADBC4EA38D80702C2E@xxxxxxxxxxxxxxxxxxxxxxxxx> <4CE1037402000078000222F0@xxxxxxxxxxxxxxxxxx> <1289814037.21694.22.camel@ramone> <4CE1751F.9020202@xxxxxxxxxxxxxxxxxx> <4CE2E163.2090809@xxxxxxxxxxxxxxxxxx> <4FA716B1526C7C4DB0375C6DADBC4EA38D80702E0E@xxxxxxxxxxxxxxxxxxxxxxxxx> <4CE450E7.9010508@xxxxxxxxxxxxxxxxxx> <1290043433.11102.1742.camel@xxxxxxxxxxxxxxxxxxxxxxx> <4CE49D98.2030402@xxxxxxxxxxxxxxxxxx> <1290053337.18200.28.camel@xxxxxxxxxxxxxxxxxxxxxxx>

I think [XCP blktap] should work fine, or wouldn't ask. If not, lemme know.

k.

In my last bit of troubleshooting, I took O_DIRECT out of the open call
in tools/blktap2/drivers/block-aio.c, and preliminary testing indicates
that this might have eliminated the problem with corruption. I'm testing
further now, but could there be an issue with alignment (since the
kernel is apparently very strict about it with direct I/O)?

Nope. It is, but they're 4k-aligned all over the place. You'd see syslog
yelling quite miserably in cases like that. Keeping an eye on syslog
(the daemon and kern facilities) is a generally good idea btw.
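
For reference, the alignment constraint being discussed can be sketched roughly as follows. This is only an illustration, not code from tools/blktap2: the 4096-byte unit is taken from the "4k-aligned" remark above, and the names `is_aligned` and `read_direct` are hypothetical. It assumes Linux and a filesystem that supports O_DIRECT.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment unit, per the "4k-aligned" remark


def is_aligned(offset, length, block=BLOCK):
    """True if offset and length are both multiples of block (length > 0)."""
    return length > 0 and offset % block == 0 and length % block == 0


def read_direct(path, offset=0, length=BLOCK):
    """Read length bytes at offset with O_DIRECT, bypassing the page cache.

    Hypothetical helper; Linux only. The kernel rejects misaligned direct
    I/O, so alignment is checked before the request is issued.
    """
    if not is_aligned(offset, length):
        raise ValueError("offset and length must be multiples of %d" % BLOCK)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        # Anonymous mmap regions are page-aligned, which gives the
        # buffer alignment that direct I/O requires.
        buf = mmap.mmap(-1, length)
        try:
            os.lseek(fd, offset, os.SEEK_SET)
            n = os.readv(fd, [buf])
            return buf[:n]
        finally:
            buf.close()
    finally:
        os.close(fd)
```

Dropping `os.O_DIRECT` from the flags corresponds to the experiment described above: the same read then goes through the page cache and the alignment checks become unnecessary.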

I've been doing that and haven't seen any unusual output so far, which I guess is good.

(Removing
this flag also brings back in use of the page cache, of course.)

I/O-wise it's not much different from the file:-path. Meaning it should
have carried you directly back into the Oom realm.

Does it make a difference that it's not using "loop" and instead the CPU usage (and presumably some blocking) occurs in user-space? There's not too much information on this out there, but it seems as though the OOM issue might be at least somewhat loop device-specific. One document that references loop OOM problems that I found is this one: http://sources.redhat.com/lvm2/wiki/DMLoop. My initial take on it was that it might be saying that it mattered when these things were being done in the kernel, but now I'm not so certain --

".. [their method and loop] submit[s] [I/O requests] via a kernel thread to the VFS layer using traditional I/O calls (read, write etc.). This has the advantage that it should work with any file system type supported by the Linux VFS (including networked file systems), but has some drawbacks that may affect performance and scalability. This is because it is hard to predict what a file system may attempt to do when an I/O request is submitted; for example, it may need to allocate memory to handle the request and the loopback driver has no control over this. Particularly under low-memory or intensive I/O scenarios this can lead to out of memory (OOM) problems or deadlocks as the kernel tries to make memory available to the VFS layer while satisfying a request from the block layer."

Would there be an advantage to using blktap/blktap2 over loop, if I leave off O_DIRECT? Would it be faster, or anything like that?

Just reducing the cpu count alone sounds like sth worth trying even on a
production box, if the current state of things already tends to take the
system down. Also, the dirty_ratio sysctl should be pretty safe to tweak
at runtime.


That's good to hear.

The default for dirty_ratio is 20. I tried halving that to 10, but it
didn't help.

Still too much. That's meant to be %/task. Try 2, with 1.5G that's still
a decent 30M write cache and should block all out of 24 disks after some
700M, worst case. Or so I think...

Ah, ok. I was thinking that it was global. With a small per-process cache like that, it becomes much closer to AIO for writes, but at least the leftover memory could still be used for the read cache.
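
The figures quoted above can be checked with a quick back-of-the-envelope sketch. It takes the per-task reading of dirty_ratio given in the reply at face value, and assumes "1.5G" means 1536 MiB and that each of the 24 disks is served by one writer task. The function names are hypothetical.

```python
def dirty_limit_mib(total_mib, dirty_ratio_pct):
    """Dirty-page budget in MiB for a given dirty_ratio percentage."""
    return total_mib * dirty_ratio_pct / 100.0


def worst_case_mib(total_mib, dirty_ratio_pct, writers):
    """Total dirty memory if every writer fills its budget (assumed model)."""
    return dirty_limit_mib(total_mib, dirty_ratio_pct) * writers


def read_dirty_ratio(path="/proc/sys/vm/dirty_ratio"):
    """Return the current vm.dirty_ratio setting (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


per_task = dirty_limit_mib(1536, 2)     # roughly the "30M write cache"
all_disks = worst_case_mib(1536, 2, 24)  # roughly the "some 700M" worst case
```

With the default of 20 the same model gives about 307 MiB per task, and with 10 about 154 MiB, which is consistent with halving the default not having helped.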


-John
