
Re: [Xen-devel] Regression between Xen 4.6.0 and 4.7.0, Direct kernel boot on a qemu-xen and seabios HVM guest doesn't work anymore.



On Tue, Oct 25, 2016 at 07:25:06PM +0200, Sander Eikelenboom wrote:
> On 2016-10-25 16:49, Wei Liu wrote:
> >On Tue, Oct 25, 2016 at 01:37:45PM +0200, Sander Eikelenboom wrote:
> >>
> >>Tuesday, October 25, 2016, 1:24:12 PM, you wrote:
> >>
> >>> On Tue, Oct 18, 2016 at 01:48:23PM +0100, Wei Liu wrote:
> >>>> On Mon, Oct 17, 2016 at 05:28:17PM +0200, Sander Eikelenboom wrote:
> >>>> > Thursday, October 13, 2016, 4:43:31 PM, you wrote:
> >>>> >
> >>>> > > Hi Jan / Wei,
> >>>> >
> >>>> > > Took a while before i had the chance to fiddle some more to find the
> >>>> > > actual culprit.
> >>>> > > After analyzing the output of xl -vvvvv create somewhat more, i came
> >>>> > > to the insight that it was probably Qemu and not Xen causing the fault.
> >>>> >
> >>>> > > As a test I just used a qemu-xen binary built with xen-4.6.0 to boot
> >>>> > > a guest in direct kernel boot mode on xen-unstable, and that old qemu
> >>>> > > binary works fine.
> >>>> >
> >>>> > > After testing i can conclude that Jan was right: the bisection was a
> >>>> > > red herring, and the problem is caused by some change in Qemu and not
> >>>> > > by something in the Xen tree.
> >>>> > > (The strange thing is that, as far as i know, i did a "make distclean"
> >>>> > > between every build (taking a lot of time), which should have pulled a
> >>>> > > fresh qemu-xen tree, and therefore the bisection should have led to a
> >>>> > > commit with a Config.mk hash change for the qemu-xen version.)
> >>>> >
> >>>> > > Will see if i can find some more time to bisect qemu and find the
> >>>> > > culprit.
> >>>> >
> >>>> > > --
> >>>> > > Sander
> >>>> >
> >>>> >
> >>>> > Unfortunately i have to give up on this issue; for me it's impossible
> >>>> > to bisect it with my present git-foo.
> >>>> >
> >>>> > The first try, bisecting the whole xen-tree, seems to have hit the
> >>>> > issue that the qemu-revision that gets pulled on a fresh build is
> >>>> > "master" during the whole dev period. That creates havoc when trying
> >>>> > to bisect, since you are testing combinations that were never
> >>>> > developed (nor auto tested) together (especially when a xen-tree and
> >>>> > a qemu-tree change have a dependency, like Roger's
> >>>> > "xen: fix usage of xc_domain_create in domain builder").
> >>>> >
> >>>> > While trying to bisect only qemu (keeping xen itself on RELEASE-4.6.0
> >>>> > and seabios on rel-1.8.2) it gets stuck on issues with that tree.
> >>>> > Between 4.6.0 and 4.7.0 the qemu tree switched from
> >>>> > git://xenbits.xen.org/qemu-upstream-4.6-testing.git
> >>>> > to git://xenbits.xen.org/qemu-xen.git; after that there seem to have
> >>>> > been a lot of merges going back and forth and to me it seems a mess
> >>>> > (but as i said it could also be a lack of git-foo). I tried manual
> >>>> > bisecting, removing and cloning trees again etc., but that doesn't
> >>>> > suffice; it's all going nowhere.
> >>>> > (The known good build (plain RELEASE-4.6.0) always works, so it
> >>>> > doesn't seem to be some random problem.)
> >>>> >
> >>>>
> >>>> Thanks for trying.
> >>>>
> >>>> > So perhaps some dev can at least verify that the issue is there (since
> >>>> > 4.7.0) and put it on the "known broken" list of things.
> >>>> >
> >>>>
> >>>> I will put this into the list of things I need to look at.
> >>>>
> >>
> >>> I investigated this a bit. The root cause is that the memory accounting
> >>> in QEMU is wrong: it tries to allocate more RAM than allowed. I haven't
> >>> tried to figure out exactly what is wrong, though.
> >>
> >>That confirms what i was thinking in the end, but bisecting the qemu-tree
> >>changes between the xen-4.6.0 and xen-4.7.0 releases proved to be pretty
> >>difficult, as i explained. So if you have a hunch as to what code it
> >>resides in, debugging instead of bisecting would probably be better.
> >>(So one of the questions is what changes in the memory accounting when you
> >>supply the kernel from the host instead of the guest, since booting a
> >>kernel with grub from within the guest doesn't give any memory accounting
> >>issues.)
> >>
> >>Thanks for investigating !
> >
> >I think I hunted down the offending function.
> >
> >Mind trying this patch for me?
> 
> Hi Wei,
> 
> This seems to help :)
> 
> With a linux 4.8 kernel the HVM guest now boots fine with direct kernel boot!
> 
> But there seems to be a gotcha which i think is not in the Xen docs/wiki:
> when trying a linux 4.3 kernel the guest still didn't boot and i got a:
> "qemu: linux kernel too old to load a ram disk" in the qemu log.
> I don't know what qemu regards as "old" in this case.
> 

QEMU checks for a signature and a version number in the kernel image
header. I can't tell why that specific version threshold was chosen, though.
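
For reference, here is a rough sketch of that kind of check, assuming
it follows the documented Linux/x86 boot protocol: the setup header
carries the signature "HdrS" at offset 0x202 and a 16-bit little-endian
protocol version at offset 0x206, and the ramdisk fields only exist from
protocol 2.00 onwards. The function names below are made up for
illustration, and this is not QEMU's actual code:

```python
import struct

MAGIC_OFFSET = 0x202     # "HdrS" signature in the x86 setup header
VERSION_OFFSET = 0x206   # 16-bit little-endian, e.g. 0x020c means 2.12
HDRS_MAGIC = b"HdrS"


def boot_protocol_version(header: bytes) -> int:
    """Return the boot protocol version, or 0 if there is no HdrS signature.

    Treating a missing signature as version 0 is an assumption of this
    sketch (it models a pre-2.00 or non-bzImage file).
    """
    if len(header) < VERSION_OFFSET + 2:
        return 0
    if header[MAGIC_OFFSET:MAGIC_OFFSET + 4] != HDRS_MAGIC:
        return 0
    (version,) = struct.unpack_from("<H", header, VERSION_OFFSET)
    return version


def can_load_ramdisk(header: bytes) -> bool:
    """The ramdisk fields were introduced with boot protocol 2.00."""
    return boot_protocol_version(header) >= 0x0200


def describe_kernel(path: str) -> str:
    """Read the start of a kernel image and report what the header says."""
    with open(path, "rb") as f:
        header = f.read(1024)
    version = boot_protocol_version(header)
    if version == 0:
        return "no HdrS signature found"
    return "boot protocol %d.%02d" % (version >> 8, version & 0xFF)
```

Running describe_kernel() on the image passed via kernel= in the guest
config (whatever path that is on your host) should show whether the file
has the header at all. If it reports no signature, the file may not be a
bzImage in the first place, e.g. an uncompressed vmlinux.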

> Another consideration: would it be worthwhile to add an OSStest test for
> direct kernel boot?
> (Under the assumption that the host kernel that gets built can also boot as
> an HVM guest, it's probably a very cheap test not requiring any additional
> builds.)

Yes, definitely. The more tests, the merrier.

Wei.
