
Re: [Xen-devel] Xen unstability on HP Moonshot m400





On Tue, Mar 24, 2015 at 3:00 PM, Mark Salter <msalter@xxxxxxxxxx> wrote:
On Tue, 2015-03-24 at 09:54 -0400, Mark Salter wrote:
> On Mon, 2015-03-23 at 23:58 +0000, Stefano Stabellini wrote:
> > On Mon, 23 Mar 2015, Christoffer Dall wrote:
> > > On Mon, Mar 23, 2015 at 1:36 PM, Ian Campbell <ian.campbell@xxxxxxxxxx> wrote:
> > >     On Sat, 2015-03-21 at 13:34 +0100, Christoffer Dall wrote:
> > >     > Hi,
> > >     >
> > >     > I have been experiencing a problematic crash running Xen on m400 over
> > >     > the last few days. I already spoke to Ian and Stefano about this, but
> > >     > thought I'd summarize what I've seen so far and loop in a wider
> > >     > audience.
> > >     >
> > >     > The basic setup is this:
> > >     >  - Two m400 nodes, one running Linux bare-metal, the other running
> > >     > Xen.
> > >     >  - The Xen node runs Dom0 and 1 DomU
> > >     >  - The m400 has a Mellanox ConnectX-3 PCIe 10G ethernet card with two
> > >     > ports on it
> > >     >  - Dom0 uses NAT forwarding from Dom0's eth0 (which is connected to
> > >     > the internet) and regular bridging to eth1 which is connected to a
> > >     > private VLAN to the bare-metal node
> > >     >  - Dom0 and DomU are configured with 14GB of ram, 4 cpus each
> > >     >  - DomU runs apache2 serving the GCC manual (see
> > >     > https://github.com/chazy/kvmperf/blob/master/cmdline_tests/apache_install.sh)
> > >     >
> > >     > The bare-metal node runs apache bench, like this: "ab -n 100000 -c 100
> > >     > http://10.10.1.120/gcc/index.html"
> > >     >
> > >     > (10.10.1.120 is the DomU IP address of the bridged interface to eth1)
> > >     >
> > >     > What happens now is that the entire Xen node goes down. I see various
> > >     > errors in the kernel log, some examples:
> > >     > http://pastebin.ubuntu.com/10642148/
> > >     > http://pastebin.ubuntu.com/10642177/
> > >     > http://pastebin.ubuntu.com/10642181/
> > >     > http://pastebin.ubuntu.com/10635573/
> > >     >
> > >     >
> > >     > All Linux kernels are 3.18 plus some tweaks for the m400 cartridge:
> > >     > https://github.com/columbia/linux-kvm-arm/tree/columbia-armvirt-3.18
> > >
> > >     Is it worth adding
> > >     https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/commit/?id=285994a62c80f1d72c6924282bcb59608098d5ec
> > >     to your kernel? It isn't Xen specific but it's perhaps possible that Xen opens the window wider.
>
> You definitely want that one. Without it, the page table walker could
> end up using a stale pointer to a page being used for something other
> than page tables.
>
> > >
> > >     How confident are you in
> > >     https://github.com/columbia/linux-kvm-arm/commit/5e29cb0478f3d90e4f568d6bea6840960331bcbb ?
> > >     (although I suppose you aren't running in ACPI mode if you are running
> > >     Xen?)
> > >
> > >
> > > I'm not confident at all, but Linux (last I checked was v3.19) doesn't boot without it, so not sure if there's an
> > > alternative? Mark?
> >
> > This patch is key: it doesn't look like it is setting
> > dev->archdata.dma_coherent appropriately, see the implementation of
> > set_arch_dma_coherent_ops.
>
> You'd want this if booting with ACPI. You might also need it for
> enumerated PCI devices even if booting with devicetree.

There's an updated version of this patch for newer kernels in the
devel branch of git.fedorahosted.org/git/kernel-arm64.git

There is also this one in Linus' tree which may be of interest to you:

commit 7132813c384515c9dede1ae20e56f3895feb7f1e
Author: Suzuki K. Poulose <suzuki.poulose@xxxxxxx>
Date:   Thu Mar 19 18:17:09 2015 +0000

  arm64: Honor __GFP_ZERO in dma allocations

Thanks Mark!

I'll give both a try!

-Christoffer

 

