This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] Re: VM hung after running sometime

The interrupts file is attached. The server has had 24 HVM domains running for about 40 hours.
Well, we may upgrade to the new kernel in the future, but for now we prefer the fix that has the least impact on our present server.
So it would be really nice of you if you could offer the set of patches; that would be our first choice.
Later I will kick off the test with irqbalance disabled on different servers, and will keep you posted.
Thanks for your kind assistance.
> Date: Wed, 22 Sep 2010 11:31:22 -0700
> From: jeremy@xxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: keir.fraser@xxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Re: VM hung after running sometime
> On 09/21/2010 06:19 PM, MaoXiaoyun wrote:
> > Thanks for the details.
> >
> > Currently the guest VMs hang in our heavy IO stress test. (In detail, we
> > have created more than 12 HVMs on our 16-core physical server,
> > and inside each HVM, iometer and ab run periodically as the heavy IO
> > load.) The guest hang shows up in 1 or 2 days. So the IO is very
> > heavy, and so are the interrupts, I think.
> What does /proc/interrupts look like?
> >
> > According to the hang log, the domain is blocked in _VPF_blocked_in_xen,
> > as indicated by "x=1" in the log file below, and that is on ports 1 and 2.
> > All our HVMs have PV drivers installed. One thing I am not clear about
> > right now is the IO events on these two ports. Do they only include
> > "mouse, vga" events, or do they also include hard disk events? (If hard
> > disk events are included, the interrupts would be very heavy, right?
> > And right now we have 4 physical CPUs allocated to domain 0; is that
> > appropriate?)
> I'm not sure of the details of how qemu<->hvm interaction works, but it
> was hangs in blkfront in PV domains which brought the lost event problem
> to light. At the basic event channel level, they will both look the
> same, and suffer from the same problems.
> >
> > Anyway, I think I can have irqbalance disabled for a quick test.
> Thanks; that should confirm the diagnosis.
> > Meanwhile, I will spend some time on the patch merge.
> If you're not willing to go to the current kernel, I can help you with
> the minimal set of patches to backport.
> J

Attachment: interrupts.txt
Description: Text document
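
For readers comparing a /proc/interrupts dump such as the attached one before and after disabling irqbalance, the following is a minimal sketch of how the per-CPU counts could be summed. It is not part of the original thread. The function names and the sample text (including the "chip-a"/"dev-a" descriptions and all counts) are made up for illustration; only the general layout of /proc/interrupts (a CPU header row, then "label: count ... description" rows) is assumed.

```python
# Sketch only: summarize per-CPU interrupt counts from /proc/interrupts text.
# SAMPLE is invented data, not the contents of the attached interrupts.txt.

SAMPLE = """\
           CPU0       CPU1
  1:        100         20   chip-a  dev-a
 16:          5        300   chip-b  dev-b
ERR:          0
"""


def parse_interrupts(text):
    """Return (cpu_names, rows); rows maps label -> (counts, description)."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()          # header row: CPU0 CPU1 ...
    rows = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        # Leading numeric fields are per-CPU counts (at most one per CPU).
        for field in fields[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        rows[label.strip()] = (counts, description)
    return cpus, rows


def per_cpu_totals(cpus, rows):
    """Sum counts per CPU, skipping rows without one count per CPU (e.g. ERR)."""
    totals = [0] * len(cpus)
    for counts, _description in rows.values():
        if len(counts) != len(cpus):
            continue
        for i, count in enumerate(counts):
            totals[i] += count
    return dict(zip(cpus, totals))


cpus, rows = parse_interrupts(SAMPLE)
totals = per_cpu_totals(cpus, rows)
```

To use it on a real system, pass the contents of /proc/interrupts (or a saved copy) to parse_interrupts instead of SAMPLE. Comparing the totals from two snapshots taken some time apart shows which CPUs are actually receiving the interrupts.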
