
RE: [Xen-devel] RE: blue screen in windows balloon driver



So far I have tested with GPLPV only.
I agree that the problem is caused by heavy IO,
since machines 212 and 23 work fine: they never hit the crash and see fewer reset events.
 
The frequency is not high. Attached is the log of a VM that ran for 4 hours and did *not* crash;
it shows only 9 "XenVbd <-- XenVbd_HwScsiResetBus" lines.
 
It looks like our test puts too much stress on machine 25, which causes the reset events;
these in turn seem to produce more reset events and eventually crash the VM.
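
To compare machines, the per-log reset counts could be collected with a short script. The following is only a minimal sketch in Python, not part of GPLPV: the function names count_reset_events and total_reset_events, the log directory argument and the default file pattern are assumptions for illustration (the pattern is meant to match the qemu-dm-w3.MR_cp* logs used in the grep quoted further down). It counts lines containing XenVbd_HwScsiResetBus, as grep piped into wc -l does, but reports one number per log file.

```python
import sys
from pathlib import Path

# Marker string taken from the driver log lines discussed in this thread.
RESET_MARKER = "XenVbd_HwScsiResetBus"


def count_reset_events(log_dir, pattern="qemu-dm-*.log"):
    """Return {log file name: number of lines containing RESET_MARKER}.

    log_dir and pattern are assumptions; adjust them to where the
    qemu-dm logs are actually stored on the host.
    """
    counts = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes.
        with path.open(errors="replace") as fh:
            counts[path.name] = sum(1 for line in fh if RESET_MARKER in line)
    return counts


def total_reset_events(counts):
    """Sum the per-file counts (the figure that grep | wc -l reports)."""
    return sum(counts.values())


if __name__ == "__main__":
    # Usage: python count_resets.py <log directory>
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    per_file = count_reset_events(log_dir)
    for name, n in per_file.items():
        print(f"{name}: {n}")
    print(f"total: {total_reset_events(per_file)}")
```

Running this on each of the three machines would show whether the resets on machine 25 are spread across all VMs or concentrated in a few of them.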
 
Is it difficult for XenVbd_HwScsiResetBus to handle this properly?
 
many thanks.
 
> Subject: RE: [Xen-devel] RE: blue screen in windows balloon driver
> Date: Wed, 2 Mar 2011 17:07:03 +1100
> From: james.harper@xxxxxxxxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: xen-devel@xxxxxxxxxxxxxxxxxxx
>
> That assertion is a bit misleading as it occurs during dump mode when
> the crash has actually already occurred. It still shouldn't occur but
> it's not the problem we are looking for.
>
> Does this problem occur when not using GPLPV?
>
> When you are running GPLPV, can you do a tail -f on the logfile and see
> how quickly the log messages are coming out? If they are printing out
> slowly then I think your physical machine is just overloaded with IO.
>
> James
>
>
>
> > -----Original Message-----
> > From: MaoXiaoyun [mailto:tinnycloud@xxxxxxxxxxx]
> > Sent: Wednesday, 2 March 2011 14:02
> > To: James Harper
> > Cc: xen devel
> > Subject: RE: [Xen-devel] RE: blue screen in windows balloon driver
> >
> >
> > Attached are the three logs for the crashes.
> > cp17 and cp21 crash on
> > Assertion failed: srb != NULL
> >
> > thanks.
> >
> > > Subject: RE: [Xen-devel] RE: blue screen in windows balloon driver
> > > Date: Tue, 1 Mar 2011 23:48:04 +1100
> > > From: james.harper@xxxxxxxxxxxxxxxx
> > > To: tinnycloud@xxxxxxxxxxx
> > > CC: xen-devel@xxxxxxxxxxxxxxxxxxx
> > >
> > > I've pushed a possible fix for the reset code for Windows 2000, XP and
> > > 2003. I haven't fixed the Vista/2008/7/2008R2 storport driver yet.
> > >
> > > I'll see what I can do tomorrow to actually test a scsi reset but I
> > > can't reproduce the problem you are seeing on my system. You'll still
> > > see the reset messages in the logs which I think simply indicates that
> > > your system is too loaded to complete the requests in time and Windows
> > > thinks the scsi bus is hung, but this way it might pick itself up again
> > > afterwards. On the other hand it may be that too many timeouts and
> > > resets will cause windows to throw its hands in the air and give up and
> > > declare the scsi device offline, in which case there might not be much
> > > we can do.
> > >
> > > James
> > >
> > > > -----Original Message-----
> > > > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of James Harper
> > > > Sent: Tuesday, 1 March 2011 23:36
> > > > To: MaoXiaoyun
> > > > Cc: xen devel
> > > > Subject: [Xen-devel] RE: blue screen in windows balloon driver
> > > >
> > > > Hold off on testing. I'm fixing up the reset code so that it does what
> > > > Windows wants. I'll post something soon if it doesn't take too long.
> > > >
> > > > James
> > > >
> > > > > -----Original Message-----
> > > > > From: MaoXiaoyun [mailto:tinnycloud@xxxxxxxxxxx]
> > > > > Sent: Tuesday, 1 March 2011 23:34
> > > > > To: James Harper
> > > > > Cc: xen devel
> > > > > Subject: RE: blue screen in windows balloon driver
> > > > >
> > > > > I will test the new driver.
> > > > > Attached is the xentop snapshot.
> > > > >
> > > > > thanks.
> > > > >
> > > > > > Subject: RE: blue screen in windows balloon driver
> > > > > > Date: Tue, 1 Mar 2011 23:11:14 +1100
> > > > > > From: james.harper@xxxxxxxxxxxxxxxx
> > > > > > To: tinnycloud@xxxxxxxxxxx
> > > > > >
> > > > > > >
> > > > > > > exe attached, thanks.
> > > > > > >
> > > > > > > I have three machines, and on each one I summed the
> > > > > > > *XenVbd_HwScsiResetBus* events across the 24 VMs with:
> > > > > > > grep XenVbd_HwScsiResetBus qemu-dm-w3.MR_cp* | wc -l
> > > > > > >
> > > > > > > machine 25: VMs crash easily, the sum is 200
> > > > > > > machine 23: VMs never crash, the sum is 10
> > > > > > > machine 212: VMs never crash, the sum is 16
> > > > > > >
> > > > > > > It seems that machine 25 has many more XenVbd_HwScsiResetBus
> > > > > > > events than the other two machines.
> > > > > > >
> > > > > > > BTW, when starting 24 VMs concurrently, the startup is quite
> > > > > > > slow; it takes about 20 minutes or more for all of them to start.
> > > > > > >
> > > > > > > I commented out line 505 in xenpci_pdo.c to avoid the timeout.
> > > > > > >
> > > > > > > 505 //remaining -= thiswait;
> > > > > > >
> > > > > >
> > > > > > It sounds like you are overloading your disk IO bandwidth. With many
> > > > > > DomU's swapping heavily, Dom0 may simply not be able to keep up with the
> > > > > > IO throughput required resulting in windows thinking that the scsi
> > > > > > device isn't responding. Can you check xentop and see what sort of IO
> > > > > > operations per second you are getting?
> > > > > >
> > > > > > I have just pushed a change to dump out the in-flight scsi requests
> > > > > > (srb) when HwScsiResetBus is called. Please apply the patch and send me
> > > > > > the next crash.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > James
> > > >
> > > >
> > > > _______________________________________________
> > > > Xen-devel mailing list
> > > > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > > > http://lists.xensource.com/xen-devel
>

Attachment: qemu-dm-w3.MR_cp0.vhd.log
Description: Text document


 

