
Re: [Xen-devel] [qemu-upstream-unstable test] 21375: regressions - FAIL



On Wed, Nov 06, 2013 at 05:22:29PM +0000, Anthony PERARD wrote:
> On Fri, Nov 01, 2013 at 03:46:36PM +0000, Anthony PERARD wrote:
> > On Fri, Nov 01, 2013 at 12:06:51PM +0000, Ian Campbell wrote:
> > > On Fri, 2013-11-01 at 11:58 +0000, Anthony PERARD wrote:
> > > > On Fri, Nov 01, 2013 at 10:43:16AM +0000, Ian Campbell wrote:
> > > > > On Fri, 2013-11-01 at 10:38 +0000, xen.org wrote:
> > > > > > flight 21375 qemu-upstream-unstable real [real]
> > > > > > http://www.chiark.greenend.org.uk/~xensrcts/logs/21375/
> > > > > > 
> > > > > > Regressions :-(
> > > > > > 
> > > > > > Tests which did not succeed and are blocking,
> > > > > > including tests which could not be run:
> > > > > >  test-amd64-i386-qemuu-rhel6hvm-intel  7 redhat-install    fail 
> > > > > > REGR. vs. 20054
> > > > > 
> > > > > Anthony, have you made any progress on this? It's been failing for
> > > > > ages now...
> > > > 
> > > > Yes, it looks like the bug is triggered during a VESA resolution change. I
> > > > have tried using the vgabios blob that we use for qemu-traditional and
> > > > it works fine. But with the vgabios blob provided by QEMU, it does not
> > > > work... I'm still not sure what the bug is, but I'm getting closer to
> > > > it.
> > > 
> > > Yay!
> > > 
> > > > Also, this happens only on an Intel machine; on an AMD machine,
> > > > everything works like a charm.
> > > > 
> > > > More detail, if anyone wants to know:
> > > > It looks like syslinux is doing an int 10h call to set the video mode,
> > > > and that call never returns:
> > > > Int 0x10, with AX=0x4F02
> > > 
> > > This looks like it might be handled by SeaBIOS vgasrc/vbe.c:vbe_104f00 ?
> > > There seem to be a few changes in upstream seabios since the version
> > > referenced in xen.git:Config.mk. Many of them are cleanups/code motion
> > > but a few look worth investigating. 
> > 
> > I've been able to get things working by applying a patch to vgabios
> > that is in the xen tree: a0e7ccf6864c196906d58b54cd0996b4dbc1b022
> > This patch allows the framebuffer to be cleared much faster.
> > 
> > But it does not really help me understand why the guest freezes. A
> > couple more printfs might.
> 
> I finally managed to get a better understanding of the issue.
> 
> So, the vgabios blob provided by QEMU has a routine to clear the video
> RAM that takes a few seconds to run. That gives QEMU enough time to try
> to refresh its display, which means there will be a call to
> xc_hvm_track_dirty_vram(). If the function is called while the vgabios
> routine is running, then the guest is lost.
> 
> The issue appears only on an Intel machine, with an HVM guest using EPT.
> Having the guest use shadow works fine. So I'm going to investigate
> the track_dirty code in Xen.
> 
> The vgabios routine is called by syslinux with an Int 0x10. I tried to
> get some debug prints after the call, either from the guest serial or
> by using the Xen debug ioport, but nothing ever appears, and gdbsx only gave
> me some weird IP which does not appear to point to any useful code
> (it's all zeros).

Another update:

We had the idea of trying this on an earlier version of Xen, and it turns
out that Xen 4.3 works fine. One bisect later, a commit turned up:

commit 86781624f8df1d50eb4185cfc2ddce926798f7aa
x86_emulate: PUSH <mem> must read source operand just once
... for the case of accessing MMIO.
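
For anyone who wants to repeat this kind of search, the bisection idea is
roughly the following. This is only a minimal sketch, not the actual
procedure used here: the commit list and the is_bad() test are hypothetical
placeholders, and in practice "git bisect" does this bookkeeping for you.

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes (hypothetically) that commits are ordered oldest to newest,
    that commits[0] is known good and commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # the regression is at mid or earlier
        else:
            good = mid   # the regression is after mid
    return commits[bad]


# Hypothetical history: ten commits, the regression enters at "c6".
history = ["c%d" % i for i in range(10)]
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 6)
```

Each step halves the range, so even a few hundred commits between two
releases need fewer than ten guest boots to test.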

So after this commit, syslinux stops working correctly with the latest
version of QEMU. This happens if QEMU calls track_dirty_vram.

I have also used xentrace/xenalyze to try to grab more information about
the issue. It did not really help, but it tells me that the guest is
stuck on a specific instruction (it results in a vmexit EPT_VIOLATION over
and over in xentrace). This is where the guest is stuck:

   0xa126:  mov    %eax,%cr0
   0xa129:  ljmp   $0xf2e,$0xa12e
   0xa130:  mov    $0x26,%dl
   0xa132:  or     %bh,(%eax)
   0xa134:  movzww %sp,%sp
   0xa138:  mov    %edx,%ds
   0xa13a:  mov    %edx,%es
   0xa13c:  mov    %edx,%fs
   0xa13e:  mov    %edx,%gs
   0xa140:  jmp    *%ebx
   0xa142:  pushf  
=> 0xa143:  lcall  *%cs:(%si)
   0xa147:  mov    $0x0,%ch
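
As a reminder of what the marked instruction does: assuming the guest is
running 16-bit real-mode code at that point (an assumption; the trace does
not record the CPU mode), "lcall *%cs:(%si)" reads a 4-byte far pointer
from memory at CS:SI and calls through it. A minimal sketch of the address
arithmetic, with made-up register and memory values:

```python
def real_mode_linear(segment, offset):
    """Linear address of segment:offset in x86 real mode (segment * 16 + offset)."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def read_far_pointer(memory, segment, offset):
    """Read the (segment, offset) far pointer stored at segment:offset.

    A 16:16 far pointer is stored little-endian, offset word first.
    """
    base = real_mode_linear(segment, offset)
    target_offset = int.from_bytes(memory[base:base + 2], "little")
    target_segment = int.from_bytes(memory[base + 2:base + 4], "little")
    return target_segment, target_offset


# Hypothetical values, for illustration only: CS = 0, SI = 0x40.
memory = bytearray(0x100)
memory[0x40:0x44] = bytes([0x34, 0x12, 0x00, 0xC0])  # offset 0x1234, segment 0xC000
target = read_far_pointer(bytes(memory), 0x0000, 0x0040)
```

So the instruction has a memory source operand and also pushes a return
address onto the stack.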

Before trying earlier versions of Xen, I tried to understand what went
wrong on the Xen side. It turns out that, in the track_dirty_vram
hypercall, a call to hap_enable_log_dirty() is all that is needed to break
the guest.

Jan, any idea what the issue is?

Regards,

-- 
Anthony PERARD
