It appears that the patch below has created some instability
in domain0. I regularly see a crash now in domain0 when
compiling linux. I changed back to the old code and the
crash seems to go away. Since it is unpredictable, I
changed back to the new code AND added printfs around
the new code in vcpu_translate and domain0 fails immediately after
the printf (but ONLY when it is called from ia64_do_page_fault;
it's OK when called from vcpu_tpa).
The attached patch returns stability to the system. It
is definitely not a final patch (for example it's not
SMP-safe), but I thought I would
post it if anybody is trying to get some work done and
domain0 keeps crashing intermittently.
Kevin, John, I still haven't successfully reproduced your
multi-domain success, so please try this patch with
the second domain.
Thanks,
Dan
> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> Sent: Friday, September 02, 2005 8:18 AM
> To: Magenheimer, Dan (HP Labs Fort Collins); Byrne, John (HP Labs)
> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: [PATCH] Patch to make latest hg multi-domain back to work
>
> I saw some intermittent, weird behavior on the latest xen-ia64-unstable.hg
> (Rev 6461): sometimes I can log in to the xenU shell, sometimes it hangs
> after "Mounting root fs...", and sometimes the whole system breaks
> as follows:
>
> (XEN) ia64_fault: General Exception: IA-64 Reserved Register/Field fault (data access): reflecting
> (XEN) $$$$$ PANIC in domain 1 (k6=f000000007fd8000): psr.ic off, delivering fault=5300,ipsr=0000121208026010,iip=a00000010000cd00,ifa=f000000007fdfd60,isr=00000a0c00000004,PSCB.iip*** ADD REGISTER DUMP HERE FOR DEBUGGING
> (XEN) BUG at domain.c:311
> (XEN) priv_emulate: priv_handle_op fails, isr=0000000000000000
> (XEN)
>
> Finally I found the root cause: match_dtlb should return the guest
> pte instead of the machine pte, because translate_machine_pte is
> always invoked after vcpu_translate. translate_machine_pte expects
> a guest pte and walks the 3-level tables to get the machine frame
> number. Why does it happen so rarely?
> - For xen0, guest pfn == machine pfn, so nothing happens.
> - For xenU, there is currently only one vtlb entry, which caches the
> most recently inserted TC entry. Say the current vtlb entry for VA1 has
> been inserted into the machine TLB. Normally many itc's are issued before
> the machine TC for VA1 is purged, and each insertion overwrites the single
> vtlb entry. So in 99.99% of cases, once the guest va is purged from the
> machine TLB/vhpt and triggers a TLB miss again, match_dtlb will fail.
>
> But there's also a corner case where the vtlb entry has not been updated
> but the machine TC entry for VA1 has been purged. In this case, an
> immediate access to VA1 makes match_dtlb return true, and then the
> problematic code becomes the murderer.
>
> For example, sometimes I saw:
> (XEN) translate_domain_pte: bad mpa=000000007f170080 (> 0000000010004000),vadr=5fffff0000000080,pteval=000000007f170561,itir=0000000000000038
> (XEN) lookup_domain_mpa: bad mpa 000000007f170080 (> 0000000010004000
> The above access happens when vcpu_translate tries to access the guest SVHPT.
> You can see that 0x7f170080 is actually a machine pfn. When 0x7f170080 is
> passed into translate_machine_pte, the warning shows and it is finally
> mapped to machine pfn 0. (Maybe we can change this error condition to a
> panic, instead of returning an incorrect pfn.)
>
> Then things all went weird:
> (XEN) translate_domain_pte: bad mpa=0000eef3f000e738 (> 0000000010004000),vadr=4000000000042738,pteval=f000eef3f000eef3,itir=0000000000026238
> (XEN) lookup_domain_mpa: bad mpa 0000eef3f000e738 (> 0000000010004000
>
> And finally a GP fault happens. This bug has actually been hidden for
> a long time, but was seldom triggered.
>
> John, please test on your side with all the patches I sent out
> today (including the max_page one). I believe we can call it
> an end now. ;-)
>
> BTW, Dan, there are two heads on the current xen-ia64-unstable.hg.
> Please do a merge.
>
> Signed-off-by: Kevin Tian <Kevin.tian@xxxxxxxxx>
>
> diff -r 68d8a0a1aeb7 xen/arch/ia64/xen/vcpu.c
> --- a/xen/arch/ia64/xen/vcpu.c Thu Sep 1 21:51:57 2005
> +++ b/xen/arch/ia64/xen/vcpu.c Fri Sep 2 21:30:01 2005
> @@ -1315,7 +1315,8 @@
>      /* check 1-entry TLB */
>      if ((trp = match_dtlb(vcpu,address))) {
>          dtlb_translate_count++;
> -        *pteval = trp->page_flags;
> +        //*pteval = trp->page_flags;
> +        *pteval = vcpu->arch.dtlb_pte;
>          *itir = trp->itir;
>          return IA64_NO_FAULT;
>      }
>
> Thanks,
> Kevin
>
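The failure mode described in the quoted message can be illustrated with a small stand-alone toy model. This is only a sketch, not the Xen code: the page size, the p2m table, and the names guest_to_machine_pte, one_entry_dtlb, dtlb_insert and dtlb_lookup are all hypothetical and exist only for this example. The one-entry cache keeps both the guest pte and the already-translated machine pte; if a hit hands back the machine pte, the later guest-to-machine translation step is applied a second time and sees an out-of-range frame number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  14                          /* assumed 16KB pages (toy value) */
#define FLAGS_MASK  ((1ULL << PAGE_SHIFT) - 1)  /* low bits carry pte flags */
#define GUEST_PAGES 4                           /* toy guest with 4 pages */

/* Hypothetical guest pfn -> machine pfn table for a xenU-like domain. */
static const uint64_t p2m[GUEST_PAGES] = { 0x100, 0x101, 0x102, 0x103 };

/* Toy stand-in for the translation that runs after vcpu_translate.
 * Expects a GUEST pte. Returns 0 and fills *machine_pte on success,
 * or -1 if the frame number is outside the guest's range (instead of
 * silently mapping to frame 0). */
static int guest_to_machine_pte(uint64_t guest_pte, uint64_t *machine_pte)
{
    uint64_t gpfn = guest_pte >> PAGE_SHIFT;

    if (gpfn >= GUEST_PAGES)
        return -1;
    *machine_pte = (p2m[gpfn] << PAGE_SHIFT) | (guest_pte & FLAGS_MASK);
    return 0;
}

/* One-entry software TLB that remembers both forms of the pte. */
struct one_entry_dtlb {
    int      valid;
    uint64_t vaddr;
    uint64_t guest_pte;    /* what the guest inserted */
    uint64_t machine_pte;  /* what went into the hardware TLB */
};

/* Record the most recent insertion; it overwrites the previous entry. */
static void dtlb_insert(struct one_entry_dtlb *t, uint64_t vaddr,
                        uint64_t guest_pte)
{
    assert(t != NULL);
    t->valid = 0;
    if (guest_to_machine_pte(guest_pte, &t->machine_pte) == 0) {
        t->vaddr = vaddr;
        t->guest_pte = guest_pte;
        t->valid = 1;
    }
}

/* Returns 1 on a hit. return_guest selects which pte the hit path
 * hands back: 1 = guest pte (the fixed behavior), 0 = machine pte
 * (the behavior the quoted message identifies as the bug). */
static int dtlb_lookup(const struct one_entry_dtlb *t, uint64_t vaddr,
                       int return_guest, uint64_t *pte)
{
    if (!t->valid || (t->vaddr >> PAGE_SHIFT) != (vaddr >> PAGE_SHIFT))
        return 0;
    *pte = return_guest ? t->guest_pte : t->machine_pte;
    return 1;
}
```

In this model, a hit that returns the machine pte makes the follow-up guest_to_machine_pte call fail, because the machine frame number is not a valid guest frame number, while a hit that returns the guest pte translates cleanly. With an identity p2m (the xen0 case, guest pfn == machine pfn) both paths would give the same answer, which matches the observation that only xenU is affected.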
Attachment: match_dtlb_take2
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel