WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] [Iscsitarget-devel] tracking down cause of filesystem corruption
From: Steve Wray <steve.wray@xxxxxxxxx>
Date: Wed, 31 Jan 2007 12:58:07 +1300
Delivery-date: Tue, 30 Jan 2007 15:58:20 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 1.5.0.5 (X11/20060813)

Hi there,

This has just been posted to the following mailing lists:
open-iscsi@xxxxxxxxxxxxxxxx
iscsitarget-devel@xxxxxxxxxxxxxxxxxxxxx

I have been advised to send it to the xen mailing list as well, so here
we are!
:)


I've been testing iscsi for use in a XEN virtualisation environment and
have been getting pretty bad filesystem corruption after only about 30
minutes of use.

I don't have the time or resources to track down exactly what is causing
this without some help and guidance; for example, it's a mystery to me
whether the initiator end, the target end, or something else again is
causing the problems.


1. The initiator is on a xen3 dom0 host running Debian Etch.
2. The target is provided by another xen3 dom0 host running Debian Etch.
3. The domU (virtual machine) is running Debian Etch and sees the iscsi
target as a block device given to it by XEN; the domU knows nothing
about iscsi.
4. The filesystem being presented via iscsi is ext3 with default mount
options.
5. uname -a on all machines reads pretty much the same:
Linux fileserver 2.6.18-3-xen-686 #1 SMP Mon Dec 4 20:48:20 UTC 2006 i686 GNU/Linux


I get errors like this on the initiator:
<errors>
Jan 31 09:13:15 xen5 kernel: sd 3:0:0:0: SCSI error: return code = 0x00010000
Jan 31 09:13:15 xen5 kernel: end_request: I/O error, dev sde, sector 6996544
</errors>
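
For reference, the return code in the log above can be split into its component bytes. This is only a minimal sketch, assuming the usual Linux SCSI mid-layer layout of the result word (driver byte, host byte, message byte, status byte, from high to low) and the host byte names from the kernel's scsi.h. The function names are made up for illustration.

```python
# Sketch: split a Linux SCSI mid-layer result word into its four bytes.
# Assumed layout: driver byte (bits 24-31), host byte (bits 16-23),
# message byte (bits 8-15), status byte (bits 0-7).

# Host byte names as defined in the kernel's include/scsi/scsi.h.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}

SECTOR_SIZE = 512  # end_request reports sectors in 512-byte units


def decode_scsi_result(result):
    """Return the four bytes of a SCSI result word as a dict."""
    return {
        "driver": (result >> 24) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "status": result & 0xFF,
    }


def host_byte_name(result):
    """Return the symbolic name of the host byte, if it is a known one."""
    host = decode_scsi_result(result)["host"]
    return HOST_BYTE_NAMES.get(host, "unknown (0x%02x)" % host)


def sector_to_byte_offset(sector):
    """Convert a sector number from an end_request line to a byte offset."""
    return sector * SECTOR_SIZE


if __name__ == "__main__":
    code = 0x00010000  # value from the kernel log above
    print(decode_scsi_result(code))
    print(host_byte_name(code))
    print(sector_to_byte_offset(6996544))  # sector from the kernel log above
```

Under that layout, 0x00010000 has a host byte of 0x01 (DID_NO_CONNECT) and all other bytes zero. That would be consistent with the initiator having lost its path to the device rather than with a media error, though it does not by itself say which end is at fault.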

and the domU sees its root filesystem disappear and hangs.

I see no errors in logs on the machine providing the target.

Neither the target daemon nor the initiator daemon was restarted or
sent a HUP signal during these tests.

The network interfaces on the initiator and target show no errors, no
dropped packets and no overruns (according to ifconfig).
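
A one-off look at ifconfig can miss counters that only move during a test run. As a rough sketch, the same counters can be read from /proc/net/dev before and after a run and compared. It assumes the standard /proc/net/dev column order (eight receive fields followed by eight transmit fields per interface); the function names and the 30-minute interval are illustrative only.

```python
# Sketch: compare interface error counters from two /proc/net/dev snapshots.
# Assumed column order per interface:
#   RX: bytes packets errs drop fifo frame compressed multicast
#   TX: bytes packets errs drop fifo colls carrier compressed
# The "fifo" columns are what ifconfig reports as overruns.
import time


def parse_proc_net_dev(text):
    """Return {interface: {counter_name: value}} for the error counters."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        if len(fields) < 16:
            continue
        counters[name.strip()] = {
            "rx_errs": fields[2],
            "rx_drop": fields[3],
            "rx_fifo": fields[4],
            "tx_errs": fields[10],
            "tx_drop": fields[11],
            "tx_fifo": fields[12],
        }
    return counters


def counter_deltas(before, after):
    """Return only the counters whose value changed between two snapshots."""
    deltas = {}
    for iface, now in after.items():
        prev = before.get(iface, {})
        changed = {k: v - prev.get(k, 0)
                   for k, v in now.items() if v != prev.get(k, 0)}
        if changed:
            deltas[iface] = changed
    return deltas


def read_snapshot(path="/proc/net/dev"):
    with open(path) as handle:
        return parse_proc_net_dev(handle.read())


if __name__ == "__main__":
    before = read_snapshot()
    time.sleep(30 * 60)  # illustrative: roughly the time to corruption
    after = read_snapshot()
    print(counter_deltas(before, after) or "no error counters changed")
```

Run on both the initiator and the target dom0 during a test, an empty result on both would support the ifconfig observation above; any non-zero delta would point at the network path.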


My ietd.conf for the iscsi target used by the domU is attached
(passwords changed) as is the iscsid node config (again, passwords changed).


You will notice that this domU has been running spamassassin; I figured
that would give the iscsi layer a good workout since it was running
spamassassin for 3 separate mail servers, each with fairly high throughput.


I am hoping that there is some config parameter which I have set
inappropriately and that there is an easy fix as iscsi + xen would be
very useful!


If there is any further info that anyone needs please ask.

If there are any commands I can run to extract info or perform some
diagnostics to see where the problems are occurring, please send them to
me and I'll send back the results.



node.name = iqn.2006-12.fileserver:spamassassin
node.transport_name = tcp
node.tpgt = 1
node.active_conn = 1
node.startup = automatic
node.session.initial_cmdsn = 0
node.session.auth.authmethod = None
node.session.auth.username = spamassassin
node.session.auth.password = password
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 10
node.session.err_timeo.reset_timeout = 30
node.session.iscsi.InitialR2T = Yes
node.session.iscsi.ImmediateData = No
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 0
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 10.10.10.129
node.conn[0].port = 3260
node.conn[0].startup = automatic
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.active_timeout = 5
node.conn[0].timeo.idle_timeout = 60
node.conn[0].timeo.ping_timeout = 5
node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0
node.conn[0].iscsi.MaxRecvDataSegmentLength = 65536
node.conn[0].iscsi.HeaderDigest = CRC32C
node.conn[0].iscsi.DataDigest = CRC32C
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No

Target iqn.2006-12.fileserver:spamassassin
        Lun 0 Path=/dev/volumes/spamassassin,Type=fileio
        Lun 2 Path=/dev/volumes/spamassassin-swap,Type=fileio

        Alias spamassassin

        IncomingUser            spamassassin password

        InitialR2T              Yes
        ImmediateData           No
        MaxRecvDataSegmentLength 8192
        MaxXmitDataSegmentLength 8192
        MaxBurstLength          262144
        FirstBurstLength        65536
        DefaultTime2Wait        2
        DefaultTime2Retain      20
        MaxOutstandingR2T       8
        DataPDUInOrder          Yes
        DataSequenceInOrder     Yes
        ErrorRecoveryLevel      0
        HeaderDigest            CRC32C
        DataDigest              CRC32C
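
The initiator and target configs above differ in several session parameters. The sketch below works out what the two ends would be expected to settle on at login, assuming both implementations apply the result functions given in RFC 3720 (minimum, maximum, AND or OR, depending on the key). The dictionaries simply restate the values from the two configs; the names INITIATOR, TARGET and negotiate are illustrative and not part of open-iscsi or IET.

```python
# Sketch: expected result of iSCSI login negotiation for the two configs
# above, assuming both ends follow the RFC 3720 result functions.

# Values copied from the iscsid node config (initiator side).
INITIATOR = {
    "InitialR2T": True,
    "ImmediateData": False,
    "FirstBurstLength": 262144,
    "MaxBurstLength": 16776192,
    "DefaultTime2Wait": 0,
    "DefaultTime2Retain": 0,
    "MaxOutstandingR2T": 1,
    "ErrorRecoveryLevel": 0,
}

# Values copied from ietd.conf (target side).
TARGET = {
    "InitialR2T": True,
    "ImmediateData": False,
    "FirstBurstLength": 65536,
    "MaxBurstLength": 262144,
    "DefaultTime2Wait": 2,
    "DefaultTime2Retain": 20,
    "MaxOutstandingR2T": 8,
    "ErrorRecoveryLevel": 0,
}

# Result function for each key as given in RFC 3720.
RESULT_FUNCTIONS = {
    "InitialR2T": lambda a, b: a or b,       # OR
    "ImmediateData": lambda a, b: a and b,   # AND
    "FirstBurstLength": min,
    "MaxBurstLength": min,
    "DefaultTime2Wait": max,
    "DefaultTime2Retain": min,
    "MaxOutstandingR2T": min,
    "ErrorRecoveryLevel": min,
}


def negotiate(initiator, target):
    """Apply the per-key result function to both sides' offered values."""
    return {key: func(initiator[key], target[key])
            for key, func in RESULT_FUNCTIONS.items()}


def differing_keys(initiator, target):
    """List the keys where the two configs offer different values."""
    return sorted(k for k in RESULT_FUNCTIONS if initiator[k] != target[k])


if __name__ == "__main__":
    print("configured differently:", differing_keys(INITIATOR, TARGET))
    for key, value in sorted(negotiate(INITIATOR, TARGET).items()):
        print("%-20s %s" % (key, value))
```

MaxRecvDataSegmentLength is left out on purpose: in RFC 3720 each side declares its own value for the direction it receives, so 65536 on the initiator and 8192 on the target are not in conflict. Comparing the computed values with what the running session actually reports would show whether either end negotiated something unexpected.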

