Hi Anthony & Dan,
We have finished 125 times KB with your latest patch, and didn't found
the segmentation fault again. Due to May holiday coming, I couldn't continue to
run more times to verify this patch in next week. :)
Thanks
-Xiantao
> -----Original Message-----
> From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Magenheimer,
> Dan (HP Labs Fort Collins)
> Sent: 2006年4月29日 21:58
> To: Xu, Anthony; Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx;
> Williamson, Alex (Linux Kernel Dev)
> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
>
> Hi Anthony --
>
> With both Tristan's stability patch and your earlier patch,
> I have completed 103 linux compiles now with no segfaults
> yet. Did you see your segfault with Tristan's patch
> included?
>
> I'll continue running over the weekend with the bits I
> have but if I see a segfault I will add in the additional
> store in Xen entry (minstate.h) from your newer patch.
>
> Dan
>
> > -----Original Message-----
> > From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> > Sent: Saturday, April 29, 2006 12:03 AM
> > To: Magenheimer, Dan (HP Labs Fort Collins); Tristan Gingold;
> > xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux
> > Kernel Dev)
> > Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> >
> > Hi Dan,
> >
> > Yes, we also got a segmentation fault in 1 run out of 30.
> >
> > Could you please try this new patch?
> >
> > Thanks,
> > -Anthony
> >
> > >-----Original Message-----
> > >From: Magenheimer, Dan (HP Labs Fort Collins)
> > [mailto:dan.magenheimer@xxxxxx]
> > >Sent: 2006年4月28日 22:49
> > >To: Xu, Anthony; Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx;
> > >Williamson, Alex (Linux Kernel Dev)
> > >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> > >
> > >Hi Anthony --
> > >
> > >I tried your patch overnight and still got a segmentation
> > >fault in 1 run out of 50. I didn't try Tristan's patch yet,
> > >so will try both at the same time next... perhaps there
> > >are two different problems that show up as the segmentation
> > >fault.
> > >
> > >Dan
> > >
> > >> -----Original Message-----
> > >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> > >> Sent: Thursday, April 27, 2006 9:19 PM
> > >> To: Xu, Anthony; Tristan Gingold;
> > >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs
> > >> Fort Collins); Williamson, Alex (Linux Kernel Dev)
> > >> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> > >>
> > >> Hi Tristan,
> > >> Could you please check whether this patch address RSE issue?
> > >>
> > >> Yes, Intel QA team is doing the test in the meantime.
> > >>
> > >>
> > >> Thanks,
> > >> -Anthony
> > >>
> > >> >-----Original Message-----
> > >> >From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > >> >[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On
> > >> Behalf Of Xu, Anthony
> > >> >Sent: 2006?4?28? 9:48
> > >> >To: Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx;
> > >> Magenheimer, Dan (HP
> > >> >Labs Fort Collins); Alex Williamson
> > >> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> > >> >
> > >> >>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > >> >>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On
> > >> Behalf Of Tristan
> > >> >>Gingold
> > >> >>Sent: 2006?4?27? 23:14
> > >> >>To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan
> > >> (HP Labs Fort
> > >> >>Collins); Alex Williamson
> > >> >>Subject: [Xen-ia64-devel] PATCH: slightly improve stability
> > >> >>
> > >> >>Hi,
> > >> >>
> > >> >>as reported earlier, this patch seems to improve stability:
> > >> crashes are at
> > >> >>least more coherent and maybe less frequent.
> > >> >>
> > >> >>RSE handling seems to have a bug: crahes are now due to
> > >> either a bad value in
> > >> >>a stacked register or a use of an invalid stacked register
> > >> (although cfm
> > >> >>seems correct in gdb!)
> > >> >
> > >> >I'm looking at this too,
> > >> >Yes there is a bug about handle_lazy_cover.
> > >> >
> > >> >void ia64_do_page_fault (unsigned long address, unsigned
> > >> long isr, struct
> > >> >pt_regs *regs, unsigned long itir)
> > >> >{
> > >> > unsigned long iip = regs->cr_iip, iha;
> > >> > // FIXME should validate address here
> > >> > unsigned long pteval;
> > >> > unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL);
> > >> > IA64FAULT fault;
> > >> >
> > >> > if ((isr & IA64_ISR_IR) && handle_lazy_cover(current,
> > >> isr, regs)) return;
> > >> >
> > >> >This code sequence is intended to handle following scenario.
> > >> >
> > >> >1. Guest executes br.ret, this may cause mandatory RSE load,
> > >> and this load may
> > >> >cause TLB miss.
> > >> >2. VMM gets control, but VMM can't handle this TLB miss
> > >> itself, then VMM injects
> > >> >TLB miss to Guest TLB miss handler, when VMM executing "rfi"
> > >> to jump to Guest
> > >> >TLB miss handler, this TLB miss happens again.
> > >> >3. At this time, interrupt_collection_enabled is 0, so
> > >> handle_lazy_cover
> > >> >executes "cover" on behalf of Guest, and return to Guest TLB
> > >> miss handler again,
> > >> >this time there is no TLB miss.
> > >> >
> > >> >
> > >> >Following code sequence is in ia64_leave_kernel path with
> > >> psr.ic and psr.i off.
> > >> >When br.ret.dptk.many b0 is executed, there may be a
> > >> mandatory load, thus
> > >> >There may be a tlb miss, according to above description
> > >> handle_lazy_cover
> > >> >executes "cover" on behalf of Guest and return to Guest,
> > >> this is no correct
> > >> >in this scenario.
> > >> >
> > >> >I didn't find an easy way to fix this bug.
> > >> >
> > >> >
> > >> > mov loc6=0
> > >> > mov loc7=0
> > >> >(pRecurse) br.call.dptk.few b0=rse_clear_invalid
> > >> > ;;
> > >> > mov loc8=0
> > >> > mov loc9=0
> > >> > cmp.ne pReturn,p0=r0,in1 // if recursion count
> > >> != 0, we need to do a
> > >> >br.ret
> > >> > mov loc10=0
> > >> > mov loc11=0
> > >> >(pReturn) br.ret.dptk.many b0
> > >> >#endif /* !CONFIG_ITANIUM */
> > >> ># undef pRecurse
> > >> ># undef pReturn
> > >> > ;;
> > >> > alloc r17=ar.pfs,0,0,0,0 // drop current register frame
> > >> > ;;
> > >> > loadrs
> > >> >
> > >> >Thanks,
> > >> >Anthony
> > >> >
> > >> >
> > >> >>
> > >> >>Tested by doing many linux kernel compilation in SMP
> > domU (> 100).
> > >> >>
> > >> >>Tristan.
> > >> >
> > >> >_______________________________________________
> > >> >Xen-ia64-devel mailing list
> > >> >Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> > >> >http://lists.xensource.com/xen-ia64-devel
> > >>
> >
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
|