This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] OOM problems

To: Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Subject: Re: [Xen-devel] OOM problems
From: John Weekes <lists.xen@xxxxxxxxxxxxxxxxxx>
Date: Wed, 17 Nov 2010 19:29:28 -0800
Cc: Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>
Delivery-date: Wed, 17 Nov 2010 19:30:20 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1290043433.11102.1742.camel@xxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4CDE44E2.2060807@xxxxxxxxxxxxxxxxxx> <4FA716B1526C7C4DB0375C6DADBC4EA38D80702C25@xxxxxxxxxxxxxxxxxxxxxxxxx> <4CDE4C08.70309@xxxxxxxxxxxxxxxxxx> <4FA716B1526C7C4DB0375C6DADBC4EA38D80702C2E@xxxxxxxxxxxxxxxxxxxxxxxxx> <4CE1037402000078000222F0@xxxxxxxxxxxxxxxxxx> <1289814037.21694.22.camel@ramone> <4CE1751F.9020202@xxxxxxxxxxxxxxxxxx> <4CE2E163.2090809@xxxxxxxxxxxxxxxxxx> <4FA716B1526C7C4DB0375C6DADBC4EA38D80702E0E@xxxxxxxxxxxxxxxxxxxxxxxxx> <4CE450E7.9010508@xxxxxxxxxxxxxxxxxx> <1290043433.11102.1742.camel@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv: Gecko/20101027 Thunderbird/3.1.6

> Which branch/revision does latest pvops mean?

stable-2.6.32, using the latest pull as of today. (I also tried next-2.6.37, but it wouldn't boot for me.)
> Would you be willing to try to reproduce that again with the XCP blktap
> (userspace, not kernel) sources? Just to further isolate the problem.
> Those see a lot of testing. I can't recall a single fix to the aio
> layer in ages. But I'm never sure about other stuff potentially broken
> in userland.

I'll have to give it a try. Normal blktap still isn't working with pv_ops, though, so I hope this is a drop-in for blktap2.

In my last bit of troubleshooting, I took O_DIRECT out of the open call in tools/blktap2/drivers/block-aio.c, and preliminary testing indicates that this might have eliminated the problem with corruption. I'm testing further now, but could there be an issue with alignment (since the kernel is apparently very strict about it with direct I/O)? (Removing this flag also brings back in use of the page cache, of course.)

> If dio is definitely not what you feel you need, let's get back to your
> original OOM problem. Did reducing dom0 vcpus help? 24 of them is quite
> aggressive, to say the least.

When I switched to aio, I reduced the vcpus to 2 (I needed to do this with dom0_max_vcpus, rather than through xend-config.sxp -- the latter wouldn't always boot). I haven't separately tried cached I/O with reduced CPUs yet, except in the lab; and unfortunately I still can't get the problem to happen in the lab, no matter what I try.
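For reference, dom0_max_vcpus is a Xen hypervisor command-line option, so it has to go on the xen.gz line of the boot loader rather than into xend-config.sxp. A hypothetical GRUB (legacy) stanza might look like the following; the kernel paths, memory size, and root device are placeholders, not taken from this system:

```
# Illustrative GRUB stanza: dom0 capped at 2 vcpus on the Xen command
# line. All paths/values below are placeholders.
title Xen
    root (hd0,0)
    kernel /boot/xen.gz dom0_mem=1536M dom0_max_vcpus=2 dom0_vcpus_pin
    module /boot/vmlinuz-2.6.32 ro root=/dev/sda1
    module /boot/initrd-2.6.32.img
```

The xend-config.sxp route ((dom0-cpus N)) hot-unplugs vcpus after boot instead of limiting them up front, which may explain why it booted less reliably here.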

> If that alone doesn't help, I'd definitely try checking vm.dirty_ratio.
> There must be a tradeoff that doesn't imply scribbling over the better
> half of 1.5GB of main memory.

The default for dirty_ratio is 20. I tried halving that to 10, but it didn't help. I could try lower, but I like the thought of keeping this in user space, if possible, so I've been pursuing the blktap2 path most aggressively.
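For anyone following along, vm.dirty_ratio is the percentage of memory that may hold dirty pages before writers block to flush, and vm.dirty_background_ratio is the threshold at which background writeback kicks in. A sketch of tightening both, with illustrative values rather than recommendations:

```
# Flush dirty pages sooner so dom0 never buffers a large share of its
# 1.5GB; values here are illustrative only.
sysctl -w vm.dirty_ratio=5               # % of RAM before writers block
sysctl -w vm.dirty_background_ratio=2    # % where background flush starts

# Persist across reboots:
cat >> /etc/sysctl.conf <<'EOF'
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2
EOF
```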


> That's disturbing. It might be worth trying to drop the number of vcpus
> in dom0 to 1 and then try to repro.
> BTW: for production use I'd currently be strongly inclined to use the
> XCP 2.6.32 kernel.

Interesting, ok.


Xen-devel mailing list
