Xen project Mailing List

Re: [Xen-devel] ext3 directory corruption under Xen

To: "Christopher S. Aker" <caker@xxxxxxxxxxxx>

From: Kurt Hackel <kurt.hackel@xxxxxxxxxx>

Date: Mon, 23 Jun 2008 11:16:31 -0700

Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx

Delivery-date: Mon, 23 Jun 2008 11:25:28 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi,

Is your 32bit pae domU paravirt or hvm? We have seen similar ext3
corruptions on rhel3 and rhel4 32pae hvm guests, one of which appeared to
be triggered by a shadow optimization for pae.

thanks
kurt

On Mon, Jun 23, 2008 at 12:15:33PM -0400, Christopher S. Aker wrote:
> We've been seeing a rash of ext3 directory corruption occurring under Xen.
> All but one of the reports have been with filesystems formatted with 1024
> blocksize. We have one report, that's potentially the same bug, occurring
> on a filesystem with 4096 blocksize (either way, it was some type of
> corruption in that case). In all cases, the filesystems were mounted with
> ext3's default journaling mode. No quotas or anything else other than the
> default ext3 mount options.
>
> It's happened on a number of different hosts, all of the same hardware and
> software configuration (Xen 3.2 64bit, 32bit pae dom0, 32bit pae domUs,
> LVM backend with 3ware hardware RAID-1). Some of those hosts were
> previously running non-virtualized Linux and UML, using the identical guest
> images, and under that configuration never experienced this problem.
>
> This has occurred under both 2.6.18-xenbits and the more recent pv_ops
> based kernels (2.6.24, 2.6.25), which I presume are all using the same
> blkfront driver code.
>
> The common workloads from the reports seem to be active maildirs and
> rsync.
>
> The initial errors reported back are all from fs/ext3/dir.c, in
> ext3_check_dir_entry(). Most commonly hit is the "rec_len % 4 != 0" check.
> We've seen other checks trigger, but my assumption is that those happen
> after more stuff gets whacked out.
>
> Eventually the fs will go read-only. In extreme cases, the fs is chewed
> through enough that data is lost.
>
> It's tricky to track down the trigger because you can only detect the
> corruption after it's happened. Our attempts to reproduce this using
> various filesystem thrashing scripts haven't yielded a reliable way to
> trigger it; however, we have been successful in triggering it twice -- in
> two weeks :( .
>
> My hope is that this triggers an "a-hah" from someone in LKML or Xen land
> who has experience with this code, or that this is a known issue and a fix
> already exists.
>
> We're scared. Please help.
>
> Thanks,
> -Chris
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
