WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 

xen-users

RE: [Xen-users] GPLPV (9.11pre20) in Win2003 x64 on XenServer Enterprise 5.0 (CD drive missing)

To: "Roel Broersma" <roel@xxxxxxxxxx>, <xen-users@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-users] GPLPV (9.11pre20) in Win2003 x64 onXenServerEnterprise 5.0 (CD drive missing)
From: "James Harper" <james.harper@xxxxxxxxxxxxxxxx>
Date: Sun, 16 Nov 2008 00:11:15 +1100
Cc:
Delivery-date: Sat, 15 Nov 2008 05:12:03 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20515016.post@xxxxxxxxxxxxxxx>
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
References: <20499705.post@xxxxxxxxxxxxxxx><AEC6C66638C05B468B556EA548C1A77D0154FBBA@trantor> <20515016.post@xxxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AclHHiAoaBWIFYzyRnyZXiZ8K/kOwgAAbyXQ
Thread-topic: [Xen-users] GPLPV (9.11pre20) in Win2003 x64 onXenServerEnterprise 5.0 (CD drive missing)
> James Harper wrote:
> >
> > . send me the output of 'xenstore-ls /local/domain/<id>/device'
> > (substitute <id> for the domain id of the domain in question)
> > . In device manager, you should see one 'Xen Block Device Driver'
> > adapter per device (disk or cdrom). For each one, can you tell me the
> > value of 'Device Instance Id' in the Properties -> Details tab?
> > . send me a copy of your DomU config
> > . if you know how to use the windows debugger, connect that to the DomU
> > and send me the output. If you don't know, then just the above stuff
> > might be sufficient to get started - it may be that the XenSource
> > version does things a little differently for CDROM's or something which
> > I might be able to tell immediately.
> >
> 
> I ran "xe vm-list params=dom-id,name-label" to see a list of VMs and
> their IDs.
> Then I ran "xenstore-ls /local/domain/28/device", which gave me:
> 
> [root@xensvr2 ~]# xenstore-ls /local/domain/28/device
> vbd = ""
>  832 = ""
>   backend = "/local/domain/0/backend/vbd/28/832"
>   state = "4"
>   backend-id = "0"
>   device-type = "disk"
>   virtual-device = "832"
>   event-channel = "6"
>   ring-ref = "16383"
>  768 = ""
>   backend = "/local/domain/0/backend/vbd/28/768"
>   state = "4"
>   backend-id = "0"
>   device-type = "disk"
>   virtual-device = "768"
>   event-channel = "7"
>   ring-ref = "16238"
>  5632 = ""
>   backend = "/local/domain/0/backend/vbd/28/5632"
>   state = "4"
>   backend-id = "0"
>   device-type = "disk"
>   virtual-device = "5632"
>   event-channel = "8"
>   ring-ref = "16093"
>  5696 = ""
>   backend = "/local/domain/0/backend/vbd/28/5696"
>   state = "4"
>   backend-id = "0"
>   device-type = "cdrom"
>   virtual-device = "5696"
>   event-channel = "9"
>   ring-ref = "15948"
> vif = ""
>  0 = ""
>   backend = "/local/domain/0/backend/vif/28/0"
>   backend-id = "0"
>   state = "4"
>   handle = "0"
>   mac = "1a:87:80:a6:b9:a2"
>   tx-ring-ref = "15947"
>   rx-ring-ref = "15946"
>   event-channel = "10"
>   feature-no-csum-offload = "0"
>   feature-sg = "1"
>   feature-gso-tcpv4 = "1"
>   request-rx-copy = "1"
>   feature-rx-notify = "1"
> [root@xensvr2 ~]#
> 
> I think that is the Xen equivalent of the XenServer API command
> "xe vbd-list params=all vm-name-label=mailsvr1", which gives me this
> (see attached file file1.txt):
> http://www.nabble.com/file/p20515016/file1.txt
> 
> Driver instance IDs:
> - XEN\VBD\4&32FE5319&1&5632
> - XEN\VBD\4&32FE5319&1&5696
> - XEN\VBD\4&32FE5319&1&768
> - XEN\VBD\4&32FE5319&1&832

Well there are 4 devices that the gplpv frontend is seeing, but
obviously something is going wrong and the cdrom device is never being
reported to windows properly.
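
For anyone decoding the numbers above: the last component of each Device
Instance Id matches the 'virtual-device' key in xenstore, which follows the
legacy Linux block device numbering, (major << 8) | minor. A minimal Python
sketch, assuming only the IDE majors 3 and 22 are of interest here; the
function name decode_virtual_device is made up for illustration:

```python
# Legacy Linux numbering used by the xenstore "virtual-device" key:
#   number = (major << 8) | minor
# IDE disks use major 3 (hda, hdb) and major 22 (hdc, hdd), 64 minors per disk.
IDE_MAJORS = {3: 0, 22: 2}  # major -> index of the first disk on that controller


def decode_virtual_device(number):
    """Map a virtual-device number to an IDE-style name such as 'hda'."""
    major, minor = number >> 8, number & 0xFF
    if major not in IDE_MAJORS:
        return "unknown (major %d, minor %d)" % (major, minor)
    disk = IDE_MAJORS[major] + (minor >> 6)   # 64 minors per disk
    partition = minor & 0x3F                  # 0 means the whole disk
    name = "hd" + "abcd"[disk]
    return name if partition == 0 else "%s%d" % (name, partition)


for n in (768, 832, 5632, 5696):
    print(n, decode_virtual_device(n))
```

With the four numbers from the listing this prints hda, hdb, hdc and hdd in
that order, so 5696 (the entry with device-type "cdrom") corresponds to hdd.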

See the 4 'backend="/local/domain/0/backend/vbd/<id>/<dev>"' lines
above? Can you do a xenstore-ls against each of those too. The frontend
xenstore stuff looks okay, including 'state=4' which means that the
frontend and backends are connected, but maybe the backend is giving
some wrong information or something.
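
To avoid copying the backend paths by hand, something like the following
Python sketch could be run over the saved xenstore-ls output. The helper
names (parse_devices, backend_commands, describe_state) and the SAMPLE
excerpt are illustrative only; the state names follow the XenbusState values
in Xen's public io/xenbus.h header.

```python
import re

# XenbusState values from Xen's public io/xenbus.h.
XENBUS_STATES = {
    0: "Unknown", 1: "Initialising", 2: "InitWait", 3: "Initialised",
    4: "Connected", 5: "Closing", 6: "Closed",
    7: "Reconfiguring", 8: "Reconfigured",
}

# Optional mail quote marker, then indentation (one space per nesting level),
# then key = "value".
LINE = re.compile(r'^(?:> ?)?( *)([\w-]+) = "(.*)"$')

SAMPLE = '''\
vbd = ""
 768 = ""
  backend = "/local/domain/0/backend/vbd/28/768"
  state = "4"
  device-type = "disk"
 5696 = ""
  backend = "/local/domain/0/backend/vbd/28/5696"
  state = "4"
  device-type = "cdrom"
vif = ""
 0 = ""
  backend = "/local/domain/0/backend/vif/28/0"
  state = "4"
'''


def parse_devices(text):
    """Return one dict per device node found in xenstore-ls output."""
    devices, dev_class = [], None
    for line in text.splitlines():
        m = LINE.match(line.rstrip())
        if not m:
            continue  # shell prompts and anything else are skipped
        depth, key, value = len(m.group(1)), m.group(2), m.group(3)
        if depth == 0:
            dev_class = key                      # "vbd" or "vif"
        elif depth == 1:
            devices.append({"class": dev_class, "id": key})
        elif devices:
            devices[-1][key] = value             # per-device key
    return devices


def backend_commands(devices):
    """Build one 'xenstore-ls <backend path>' command per block device."""
    return ["xenstore-ls " + d["backend"]
            for d in devices if d["class"] == "vbd" and "backend" in d]


def describe_state(value):
    """Translate a numeric state string into its XenbusState name."""
    return XENBUS_STATES.get(int(value), "state %s" % value)


devices = parse_devices(SAMPLE)
for dev in devices:
    print(dev["class"], dev["id"], describe_state(dev.get("state", "0")))
for cmd in backend_commands(devices):
    print(cmd)
```

The commands it prints are meant to be run in dom0, one per block device.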

> (btw: I now have 3 drives connected and should have 1 CD drive
> connected, which I can't see)
> 
> Behavior
> --------
> When I hot-plug a device from the XenServer, I cannot see it in the
> Windows 2003 VM (not even after a disk rescan or hardware detection).
> When I reboot the VM, it detects a new device while Windows starts. I
> click next, next, and it adds another "Xen Block Device Driver".

When I hot-add a network adapter it appears to work, although all the
network adapters then go into 'acquiring dhcp address'; once that is
done it all works again. Hot-removing a network adapter appears to work
too, although after I do it from Xen, I have to 'safely remove' the
device before it disappears from windows. Not sure exactly why that
would be the case but I suppose it can be fixed.

Block devices though aren't going to work... I deliberately fail any
attempt by Windows to recognise block devices added after system boot,
just in case one of them is the same as the qemu devices (eg because
you've just installed the drivers), with all the problems that that
entails. I may be able to fix that too, but I'll have to be careful.

> Other BAD behavior
> -------------------
> Most of our VMs are on the SAN and connected to the XenServer over
> iSCSI. When the shit hits the fan and the SAN goes down (broken switch,
> broken cable, or something else), all our Windows VMs give BSODs, which
> is quite normal behavior. 99% of the time we can reboot these VMs later
> without any problems; very occasionally we need to run a chkdsk
> (luckily NTFS is a journalling filesystem).
> BUT: with the GPLPV drivers we do NOT get a BSOD; I waited 10 minutes
> for one. First I see a popup, "Can't write to <filename> or <disk>",
> and then many, many more. Finally I did a force-shutdown from the
> XenServer. When I then rebooted this VM, the Master File Table (MFT)
> was corrupt and couldn't be repaired with chkdsk. There were lots of
> errors on the drive and I had to recover some files with
> "GetDataBack for NTFS"... a long night... :(
> I never had this with the XenServer PV tools. I think the GPLPV drivers
> have too large a disk cache (write cache?) or something? The best thing
> to do is to freeze the OS (I've seen that on Linux) or to give a BSOD
> within a short time (Windows); otherwise you're really screwing things
> up.
> Just test it: run 5 VMs, 4 with the XenServer PV tools and 1 with the
> GPLPV drivers, then pull off the storage. The 4 VMs are the first to
> give a BSOD, and the GPLPV one probably... never?
> (The thing I don't understand is that when storage is completely broken,
> it shouldn't matter whether the VM stays on for 10 seconds or 10
> minutes: it can't write through to the storage, so it can't corrupt
> things. This made me think about a too-big storage buffer, maybe? So a
> too-big piece is missing, or the journalling is not in sync?)

Now that is interesting... yes, you are right in saying that once the
'plug' is pulled to the storage it doesn't really matter (from a data
integrity point of view)  what happens thereafter... a BSoD may be the
correct thing to do. I wonder what the backend will tell me... will it
report a fail on the block request, or will it in turn wait for ages
relying on me to fail the request instead? I'm also not sure if my
drivers should be invoking the BSoD directly... I suspect that they
should fail the request in such a way that Windows knows that all hope
is lost and so Windows should instigate the BSoD.

Either way, it does sound like I'm doing something a bit strange that is
causing problems. This may happen with requests that aren't aligned to a
512 byte boundary - requests larger than 4096 bytes may be written out
of order (wrt other write requests), but those are seldom (never?) seen
during normal use, just at boot time and during a few infrequent
operations like formats.

I have definitely seen filesystem corruption after a crash that I didn't
expect (hanging the windows domu 'hard' should have the same effect as
you were seeing - data not getting committed to the disk). I put it
down to the circumstances of the crash but maybe there is more to it.

I am definitely not doing any write caching though - I don't tell
Windows that the write is completed until the backend has finished with
the write. The backend may, in turn, be doing write caching, but that
should be the same as with the xensource drivers too.
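
To make the "no write caching" point concrete, here is a toy Python model of
that completion rule. It is not the GPLPV driver code, only a sketch under
the assumption that each write is tracked by a request id until the backend
answers; the class and method names are invented for illustration.

```python
class WriteThroughFrontend:
    """Toy model: a write is reported upward only after the backend replies."""

    def __init__(self):
        self.pending = {}   # request id -> completion callback
        self.next_id = 0

    def submit_write(self, on_complete):
        """Queue a write; nothing is reported to the caller yet."""
        req_id = self.next_id
        self.next_id += 1
        self.pending[req_id] = on_complete
        return req_id

    def backend_response(self, req_id, ok):
        """Called when the backend finishes (or fails) the request."""
        on_complete = self.pending.pop(req_id)
        on_complete(ok)


results = []
frontend = WriteThroughFrontend()
first = frontend.submit_write(results.append)
second = frontend.submit_write(results.append)

# The backend answers only the first request; the second stays outstanding,
# which is roughly the situation while the SAN is unreachable.
frontend.backend_response(first, ok=True)
print("completed:", results)
print("still pending:", sorted(frontend.pending))
```

In this model a backend that never answers leaves the request pending
forever, so whether the guest ever sees an error depends on the backend
failing the request or on the frontend timing it out itself.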

Can you tell me, during the time the DomU is 'hung' because the SAN has
disconnected, does the SAN come back online before the reboot? If I'm
not managing read or write failures correctly, and suddenly the SAN
comes back online again, then that could be causing problems. If you
think that's the case I can look at the failure paths a bit closer.

> 
> I'm still using 9.11pre20. I will try to figure out the Windows debugger
> stuff.

Just give me the xenstore-ls of the backend for now. That may be enough
to figure out what is going on.

James

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users