

Re: [Xen-devel] OOM problems

To: John Weekes <lists.xen@xxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] OOM problems
From: Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Date: Wed, 17 Nov 2010 20:08:57 -0800
Cc: Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>
On Wed, 2010-11-17 at 22:29 -0500, John Weekes wrote:
> Daniel:
>  > Which branch/revision does latest pvops mean?
> stable-2.6.32, using the latest pull as of today. (I also tried 
> next-2.6.37, but it wouldn't boot for me.)
> > Would you be willing to try and reproduce that again with the XCP blktap
> > (userspace, not kernel) sources? Just to further isolate the problem.
> > Those see a lot of testing. I certainly can't come up with a single fix
> > to the aio layer, in ages. But I'm never sure about other stuff
> > potentially broken in userland.
> I'll have to give it a try. Normal blktap still isn't working with 
> pv_ops, though, so I hope this is a drop-in for blktap2.

I think it should work fine, or I wouldn't ask. If not, let me know.

> In my last bit of troubleshooting, I took O_DIRECT out of the open call 
> in tools/blktap2/drivers/block-aio.c, and preliminary testing indicates 
> that this might have eliminated the problem with corruption. I'm testing 
> further now, but could there be an issue with alignment (since the 
> kernel is apparently very strict about it with direct I/O)? 

Nope. The kernel is strict about it, but those buffers are 4k-aligned
all over the place. You'd see syslog yelling quite miserably in a case
like that. Keeping an eye on syslog (the daemon and kern facilities) is
generally a good idea, btw.
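
For reference, the rules the kernel enforces with direct I/O boil down
to: buffer address, transfer length, and file offset must all be
multiples of the block size. A minimal standalone C sketch (mine, not
blktap2 code) that would fail with EINVAL on a misaligned buffer:

#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const size_t len = 4096;   /* length must be block-aligned, too */
    void *buf;
    int fd;
    ssize_t n;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* posix_memalign guarantees the 4k alignment; a plain malloc()ed
       buffer would typically make the pread() fail with EINVAL. */
    if (posix_memalign(&buf, 4096, len)) {
        fprintf(stderr, "posix_memalign failed\n");
        close(fd);
        return 1;
    }

    /* The file offset (0 here) must be block-aligned as well. */
    n = pread(fd, buf, len, 0);
    if (n < 0)
        perror("pread");       /* EINVAL usually means misalignment */
    else
        printf("read %zd bytes with O_DIRECT\n", n);

    free(buf);
    close(fd);
    return 0;
}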

> (Removing 
> this flag also brings back in use of the page cache, of course.)

I/O-wise it's then not much different from the file: path. Meaning it
should have carried you straight back into OOM territory.
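
If you want to watch the buffered path filling up dom0's memory before
it tips over, the Dirty/Writeback counters in /proc/meminfo are a cheap
indicator. A quick standalone sketch (again mine, not blktap2 code):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) { perror("fopen"); return 1; }

    while (fgets(line, sizeof(line), f)) {
        /* Dirty: pages waiting for writeback; Writeback: in flight. */
        if (!strncmp(line, "Dirty:", 6) ||
            !strncmp(line, "Writeback:", 10))
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}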

> > If dio is definitely not what you feel you need, let's get back your
> > original OOM problem. Did reducing dom0 vcpus help? 24 of them is quite
> > aggressive, to say the least.
> When I switched to aio, I reduced the vcpus to 2 (I needed to do this 
> with dom0_max_vcpus, rather than through xend-config.sxp -- the latter 
> wouldn't always boot). I haven't separately tried cached I/O with 
> reduced CPUs yet, except in the lab; and unfortunately I still can't get 
> the problem to happen in the lab, no matter what I try.

Just reducing the CPU count alone sounds like something worth trying,
even on a production box, if the current state of things already tends
to take the system down. Also, the dirty_ratio sysctl should be pretty
safe to tweak at runtime.
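
Tweaking it amounts to writing /proc/sys/vm/dirty_ratio, which is what
sysctl -w vm.dirty_ratio=N does under the hood; it takes effect
immediately. A trivial C sketch (the value 2 anticipates the
suggestion below):

#include <stdio.h>

int main(void)
{
    /* Equivalent to `sysctl -w vm.dirty_ratio=2`; applies at once,
       no reboot needed. Needs root, of course. */
    FILE *f = fopen("/proc/sys/vm/dirty_ratio", "w");
    if (!f) { perror("fopen"); return 1; }
    fprintf(f, "2\n");
    fclose(f);
    return 0;
}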

> > If that alone doesn't help, I'd definitely try and check vm.dirty_ratio.
> > There must be a tradeoff which doesn't imply scribbling the better half
> > of 1.5GB main memory.
> The default for dirty_ratio is 20. I tried halving that to 10, but it 
> didn't help. 

Still too much. That value is a percentage per task. Try 2: with 1.5G
that's still a decent 30M write cache (2% of 1.5G ~= 30M), and it
should block all of the 24 disks after some 700M of dirty data
(24 x 30M ~= 720M), worst case. Or so I think...

> I could try lower, but I like the thought of keeping this 
> in user space, if possible, so I've been pursuing the blktap2 path most 
> aggressively.

Okay. I'm sending you a tbz to try.


> Ian:
> >  That's disturbing. It might be worth trying to drop the number of VCPUs in 
> > dom0 to 1 and then try to repro.
> >  BTW: for production use I'd currently be strongly inclined to use the XCP 
> > 2.6.32 kernel.
> Interesting, ok.
> -John
