Re: [Xen-devel] [BUG] kernel panics with drbd
On Tue, 2015-08-04 at 14:52 +0100, Matthew Vernon wrote:
> Hi,

Hello,

> I'm getting dom0 kernel panics, relating to moderately heavy use of
> drbd. I think this is a Xen bug.

It is remarkably similar looking to
http://blog.chinewalking.com/drbd-kernel-oops-w-trim/ . Do you have trim?

Ian.

>
> My Xen hosts are Debian jessie amd64 boxes, on slightly elderly Intel
> kit.
>
> Linux ophon 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24) x86_64 GNU/Linux
> Linux opus 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u2 (2015-07-17) x86_64 GNU/Linux
>
> Both have the standard jessie versions of Xen - 4.4.1-9+deb8u1 and
> xen-tools - 4.5-1
>
> I have disable_sendpage enabled for drbd:
> root@opus:~# cat /etc/modprobe.d/drbd.conf
> options drbd disable_sendpage=1
> root@opus:~# cat /sys/module/drbd/parameters/disable_sendpage
> Y
> root@ophon:~# cat /etc/modprobe.d/drbd.conf
> options drbd disable_sendpage=1
> root@ophon:~# cat /sys/module/drbd/parameters/disable_sendpage
> Y
>
> I have a script running on "ophon" that sets up a drbd device (itself
> as primary, "opus" as secondary), makes an LVM pv+vg on top of that
> drbd device, and then calls xen-create-image[0]. "opus" typically kernel
> panics shortly after xen-create-image starts.
>
> I attach the relevant bit of kern.log from one such crash to this mail
> - you can see the drbd operations happening a second or so before the
> crash. I also attach the relevant drbd .res file
>
> The bug is not 100% repeatable, but still fairly reliable (for obvious
> reasons, extensive testing and hard-rebooting my kit is not a very
> joyous prospect). I did once achieve a similar result by running
> drbd-overview on opus, which said
> kernel:[ 1127.630208] BUG: soft lockup - CPU#2 stuck for 23s! [xenstored:864]
> on console and then panicked much as before.
>
> The "amusing" quirk is that similar code worked a couple of weeks ago
> when I last tried it; that code does now also produce kernel panics
> AFAICT (with a not-100%-repeatable bug and long reproduction
> timescales 'cos of having to power-cycle etc. it's hard to be
> completely certain).
>
> The two hosts are part of a pacemaker cluster, and "opus" is otherwise
> able to run guests fine.
>
> I hope that's sufficient information; I'm happy to supply other config
> files etc. if necessary.
>
> Regards,
>
> Matthew
>
> [0] The code in question is in fact a python script; running on ophon,
> it does the following (using ssh to run commands on opus):
> --both hosts--
> lvcreate -L 20G -nmwsig-mws-02474 guests
> drbdadm -- --force create-md
> drbdadm up mws-02474
> --ophon only--
> drbdadm wait-connect
> drbdadm new-current-uuid --clear-bitmap minor-4
> drbdadm primary mws-02474
> pvcreate /dev/drbd4
> vgcreate mws-02474-vg /dev/drbd4
> xen-create-image ... --lvm mws-02474-vg
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
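One way to answer the "Do you have trim?" question on a given host is to read the kernel's per-device discard limit from sysfs: /sys/block/<dev>/queue/discard_max_bytes is 0 when the block layer will not pass discard (TRIM) requests to that device. The following is a minimal sketch, not something from this thread; the helper name `discard_supported`, its `sysfs_root` parameter (added only so the check can be pointed at a test directory) and the script name are hypothetical. On opus it would be run for the DRBD device (e.g. drbd4) and for the disks underneath it.

```python
import os
import sys


def discard_supported(dev_name, sysfs_root="/sys/block"):
    """Return True if the kernel reports discard (TRIM) support for dev_name.

    Hypothetical helper: reads <sysfs_root>/<dev_name>/queue/discard_max_bytes,
    which is 0 for devices that do not accept discard requests.
    """
    path = os.path.join(sysfs_root, dev_name, "queue", "discard_max_bytes")
    with open(path) as f:
        return int(f.read().strip()) > 0


if __name__ == "__main__":
    # Example invocation (device names are illustrative):
    #   python3 check_discard.py drbd4 sda
    for dev in sys.argv[1:]:
        print(dev, "discard" if discard_supported(dev) else "no discard")
```

Checking each layer separately matters because the DRBD device, the LVM volume it sits on and the physical disk can each report a different discard limit.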