Hi. Good catch. Some comments.
I attached two patches to fix these; could you try them?
- bss.page_aligned.
Where is the section used?
grep didn't tell me. x86 certainly uses .bss.page_aligned in
linux/arch/[i386, x86_64]/kernel/head[-xen].S,
but no files under linux/arch/ia64/ use it.
- ia64_fast_eoi.
I suppose ia64_fast_eoi is used as an optimization instead of
PHYSDEVOP_eoi. I'm not sure how much improvement it provides, though.
Anyway, the ia64_fast_eoi hypercall implementation should also be
updated, which I overlooked when I added PHYSDEVOP_pirq_eoi_gmfn support.
thanks,
On Sun, Jan 04, 2009 at 06:05:07PM +0800, Zhang, Xiantao wrote:
> Hi, Isaku & All
> The attached patch should fix the weird issue. In upstream, we also found
> some other weird issues; for example, we can't boot dom0 on some platforms,
> and dom0 may behave differently with different initrds. After debugging, I
> found they should be caused by an incorrect setting for the pirq_needs_eoi
> page. There are two main issues found during the debugging:
> 1. the two related hypercalls are not enabled in the correct way, so dom0
> and the hypervisor don't agree on which pirq needs EOI.
> 2. the page is not really linked into the bss section even though it must
> be, so the kernel deems it memory cache and uses it in many ways, which
> finally leads to various issues.
> Thanks
> Xiantao
>
>
>
> You, Yongkang wrote:
> >> I tried 2048M (and other values), but I couldn't reproduce it.
> >> Hmm, does it reproduce with "dom0_mem=2048M" on all boxes which you
> >> tested?
> >
> > Isaku/All,
> >
> > This issue is really very hard to locate. Now I somewhat suspect
> > it is related to the build process, since changing the build
> > method makes the issue go away too.
> >
> > 1. It doesn't happen on all machines. But it can be stably
> >    reproduced on our nightly test machine with the same binary.
> > 2. When the system crashes, dom0_mem is set to 2048M. With other
> >    memory sizes, the issue disappears too.
> > 3. It seems to have appeared between dom0 changesets 743~753, as
> >    it works if we use an old built Dom0 kernel + new Xen. And the
> >    old nightly testing doesn't have the issue.
> > 4. When I tried regression testing between 743~753, I found that
> >    different build methods might cause crashing and non-crashing.
> >
> > In our default build process, as stubdomain is not supported on
> > IA64, we removed install-stubdom and dist-stubdom from the "install:"
> > and "dist:" lines in the main Makefile. It was changed more than 2
> > months ago. The real compile command is "make -j3 >xyz_file". And
> > the crashing issue is related to the build process.
> >
> > When I did regression testing, sometimes I didn't change the Makefile,
> > but still used "make -j3". Then the crashing was gone.
> >
> > I am not sure if my suspicion is right, as it still needs more
> > trying. Compiling Dom0 is not as easy as Xen. It is costly. I will
> > try to do more, but maybe not so quickly, as many other things need
> > to be done at the same time. If the default compilation is okay, do
> > you think it is worth investigating further?
> >
> > Any suggestion will be much appreciated.
> >
> > Best Regards,
> > Yongkang You
> >
> > On Tuesday, December 16, 2008 10:22 AM, "Isaku Yamahata" wrote:
> >
> >> On Tue, Dec 09, 2008 at 05:56:25PM +0800, You, Yongkang wrote:
> >>> On Monday, December 08, 2008 2:10 PM, "Isaku Yamahata" wrote:
> >>>
> >>>> On Mon, Dec 08, 2008 at 01:52:38PM +0800, Zhang, Jingke wrote:
> >>>>> Isaku Yamahata wrote:
> >>>>>> On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
> >>>>>>> Hi Isaku,
> >>>>>>> We re-collected the detailed information from the serial
> >>>>>>> port, please see below. Two additional comments:
> >>>>>>
> >>>>>> Thank you.
> >>>>>>
> >>>>>>
> >>>>>>> 1. We can be sure the Cset#18832 works well on the same
> >>>>>>> tiger4 machine. But we did not do regression test between 18832
> >>>>>>> and this 18860.
> >>>>>>> 2. It is strange that on another Tiger4 box, dom0 will NOT
> >>>>>>> crash. Do you have any idea from the serial log? Thanks!
> >>>>>>
> >>>>>> I haven't hit this crash. And from Kuwamura-san's test it seems
> >>>>>> that he hasn't hit it either. Kuwamura-san, is that correct?
> >>>>>> Hmm... it seems to depend on hw configuration?
> >>>>>> I'm inclined to suspect an interrupt masking/unmasking race.
> >>>>>> Event channel issues? But that's only my very vague guess.
> >>>>>>
> >>>>>> The difference between 18832 and 18860 is the merge of
> >>>>>> xen-unstable into xen-ia64-unstable. Looking at the log, I
> >>>>>> suspect linux-2.6.18-xen instead of xen.
> >>>>>> Could you provide the linux c/s which corresponds to 18832 and
> >>>>>> 18860?
> >>>>>
> >>>>>
> >>>>> Hi Isaku,
> >>>>> Yes, some of our machines do not crash. I am afraid there may
> >>>>> be some potential issue. For testing 18832, we used linux#742,
> >>>>> while 18860 uses linux#753. Thanks!
> >>>>
> >>>> Thank you. Taking a rough look at them, those changesets don't
> >>>> seem to be the culprit. I agree with you that this may indicate
> >>>> some potential bugs...
> >>>
> >>> Hi All,
> >>>
> >>> This bug is stably reproduced if "dom0_mem=2048M" is given in the
> >>> append option. And if dom0_mem is set to 1024M or 4096M, the
> >>> crash doesn't happen.
> >>>
> >>> We tried #18869 Xen + #742 Dom0, system is okay. So the problem
> >>> might be in Linux tree between #742~#753
> >>
> >> I tried 2048M (and other values), but I couldn't reproduce it.
> >> Hmm, does it reproduce with "dom0_mem=2048M" on all boxes which you
> >> tested?
> >>
> >> thanks,
> >
> > _______________________________________________
> > Xen-ia64-devel mailing list
> > Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-ia64-devel
>
--
yamahata
ia64-fast-eoi.patch
Description: Text Data
fix_pirq_eoi_page.patch
Description: Text Data