Still works for me.
Thanks,
Kevin
>-----Original Message-----
>From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>Sent: September 8, 2005 4:57
>To: Tian, Kevin; Byrne, John (HP Labs)
>Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>Subject: RE: [PATCH] Patch to make latest hg multi-domain back to work
>
>It appears that the patch below has created some instability
>in domain0. I regularly see a crash now in domain0 when
>compiling linux. I changed back to the old code and the
>crash seems to go away. Since it is unpredictable, I
>changed back to the new code AND added printfs around
>the new code in vcpu_translate and domain0 fails immediately after
>the printf (but ONLY when it is called from ia64_do_page_fault;
>it's OK when called from vcpu_tpa).
>
>The attached patch returns stability to the system. It
>is definitely not a final patch (for example, it's not
>SMP-safe), but I thought I would post it in case anybody
>is trying to get some work done and domain0 keeps
>crashing intermittently.
>
>Kevin, John, I still haven't successfully reproduced your
>multi-domain success, so please try this patch with
>the second domain.
>
>Thanks,
>Dan
>
>> -----Original Message-----
>> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
>> Sent: Friday, September 02, 2005 8:18 AM
>> To: Magenheimer, Dan (HP Labs Fort Collins); Byrne, John (HP Labs)
>> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: [PATCH] Patch to make latest hg multi-domain back to work
>>
>> I saw some intermittent, weird behavior on the latest
>> xen-ia64-unstable.hg (Rev 6461): sometimes I can log in to the xenU
>> shell, sometimes it hangs after "Mounting root fs...", and sometimes
>> the whole system even breaks as follows:
>>
>> (XEN) ia64_fault: General Exception: IA-64 Reserved Register/Field fault (data access): reflecting
>> (XEN) $$$$$ PANIC in domain 1 (k6=f000000007fd8000): psr.ic off, delivering fault=5300,ipsr=0000121208026010,iip=a00000010000cd00,ifa=f000000007fdfd60,isr=00000a0c00000004,PSCB.iip*** ADD REGISTER DUMP HERE FOR DEBUGGING
>> (XEN) BUG at domain.c:311
>> (XEN) priv_emulate: priv_handle_op fails, isr=0000000000000000
>> (XEN)
>>
>> Finally I found the root cause: match_dtlb should return the guest
>> pte instead of the machine pte, because translate_machine_pte is
>> always invoked after vcpu_translate. translate_machine_pte expects
>> a guest pte and walks the 3-level tables to get the machine frame
>> number. Why does this happen so rarely?
>> - For xen0, guest pfn == machine pfn, so nothing happens.
>> - For xenU, there is currently only one vtlb entry, which caches the
>> most recently inserted TC entry. Say the current vtlb entry, for VA1,
>> has been inserted into the machine TLB. Normally many itc instructions
>> are issued before the machine TC entry for VA1 is purged, and those
>> insertions overwrite the single vtlb entry. So in 99.99% of cases,
>> once the guest va is purged from the machine TLB/vhpt and triggers a
>> TLB miss again, match_dtlb will fail.
>>
>> But there is also a corner case where the vtlb entry has not been
>> updated but the machine TC entry for VA1 has been purged. In this
>> case, if VA1 is accessed immediately, match_dtlb returns true and
>> the problematic code becomes the culprit.
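
The corner case described above can be modeled in a few lines of C. This is a minimal sketch under stated assumptions, not the Xen code: the names vtlb_entry, vtlb_insert, vtlb_lookup, translate_guest_pfn and handle_miss are hypothetical, ptes are reduced to bare frame numbers, and the guest-to-machine table is made up for illustration. It shows why handing back the machine frame on a one-entry vtlb hit goes wrong when the caller always translates the result again, and why handing back the guest frame (as the patch does through dtlb_pte) avoids that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a one-entry vtlb; none of these names are Xen's. */

#define GUEST_PAGES 4 /* tiny xenU-like guest: valid guest pfns are 0..3 */

/* Made-up guest pfn -> machine pfn table (guest pfn != machine pfn). */
static const uint64_t p2m[GUEST_PAGES] = { 0x7f170, 0x7f171, 0x7f172, 0x7f173 };

struct vtlb_entry {
    int valid;
    uint64_t vaddr;
    uint64_t guest_pfn;   /* frame number as the guest inserted it */
    uint64_t machine_pfn; /* frame number after guest -> machine translation */
};

/* Translate a guest pfn. On an out-of-range input, set *bad and fall back
 * to frame 0, mirroring the "bad mpa" behavior described in the mail. */
uint64_t translate_guest_pfn(uint64_t gpfn, int *bad)
{
    if (gpfn >= GUEST_PAGES) {
        *bad = 1;
        return 0;
    }
    *bad = 0;
    return p2m[gpfn];
}

/* Record the latest insertion; both forms of the frame number are kept. */
void vtlb_insert(struct vtlb_entry *e, uint64_t vaddr, uint64_t gpfn)
{
    int bad;
    e->valid = 1;
    e->vaddr = vaddr;
    e->guest_pfn = gpfn;
    e->machine_pfn = translate_guest_pfn(gpfn, &bad);
    assert(!bad); /* the sketch only inserts valid guest pfns */
}

/* Hit path: use_guest = 0 models the old code, use_guest = 1 the patch. */
int vtlb_lookup(const struct vtlb_entry *e, uint64_t vaddr, int use_guest,
                uint64_t *pfn_out)
{
    if (!e->valid || e->vaddr != vaddr)
        return 0;
    *pfn_out = use_guest ? e->guest_pfn : e->machine_pfn;
    return 1;
}

/* TLB-miss path: whatever the lookup returns is translated once more,
 * because the caller always assumes it received a guest frame. */
uint64_t handle_miss(const struct vtlb_entry *e, uint64_t vaddr, int use_guest,
                     int *bad)
{
    uint64_t pfn;
    if (!vtlb_lookup(e, vaddr, use_guest, &pfn)) {
        *bad = 1; /* no vtlb hit; the real slow path is out of scope here */
        return 0;
    }
    return translate_guest_pfn(pfn, bad);
}
```

In this model, after vtlb_insert(&e, va, 2), calling handle_miss with use_guest = 0 feeds machine frame 0x7f172 into the second translation, which rejects it and falls back to frame 0. With use_guest = 1 the same hit resolves to 0x7f172 as intended. For a xen0-like domain, where guest pfn == machine pfn, the two choices would coincide, which is consistent with only xenU being affected.
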
>>
>> For example, sometimes I saw:
>> (XEN) translate_domain_pte: bad mpa=000000007f170080 (> 0000000010004000),vadr=5fffff0000000080,pteval=000000007f170561,itir=0000000000000038
>> (XEN) lookup_domain_mpa: bad mpa 000000007f170080 (> 0000000010004000
>> The access above happens when vcpu_translate tries to access the
>> guest SVHPT. You can see that 0x7f170080 is actually a machine pfn.
>> When 0x7f170080 is passed into translate_machine_pte, the warning is
>> shown and it is finally mapped to machine pfn 0. (Maybe we can change
>> this error condition to a panic instead of returning an incorrect
>> pfn.)
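
The suggestion in parentheses, failing loudly instead of mapping to frame 0, could look roughly like the following. This is only a sketch with hypothetical names (checked_addr_to_pfn, addr_to_pfn_or_die); it is not the Xen lookup code, the 16KB page size is an assumption, and abort() stands in for whatever panic mechanism the hypervisor would actually use. The range test uses ">" to match the "(> ...)" in the log lines above.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SHIFT 14 /* assumption: 16KB pages */

/* Hypothetical checked lookup: returns 0 and stores the frame number on
 * success; returns -1 and leaves *pfn untouched if addr is out of range,
 * so a bogus frame can never leak to the caller. */
int checked_addr_to_pfn(uint64_t addr, uint64_t limit, uint64_t *pfn)
{
    if (addr > limit)
        return -1;
    *pfn = addr >> PAGE_SHIFT;
    return 0;
}

/* Hypothetical strict wrapper: stop immediately on a bad address rather
 * than continuing with frame 0. */
uint64_t addr_to_pfn_or_die(uint64_t addr, uint64_t limit)
{
    uint64_t pfn;
    if (checked_addr_to_pfn(addr, limit, &pfn) != 0) {
        fprintf(stderr, "bad mpa %016llx (> %016llx), stopping\n",
                (unsigned long long)addr, (unsigned long long)limit);
        abort();
    }
    return pfn;
}
```

With the values from the log, checked_addr_to_pfn(0x7f170080, 0x10004000, &pfn) reports failure, so the first bad translation would stop the domain at the point of the error instead of surfacing later as an unrelated GP fault.
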
>>
>> Then things all went weird:
>> (XEN) translate_domain_pte: bad mpa=0000eef3f000e738 (> 0000000010004000),vadr=4000000000042738,pteval=f000eef3f000eef3,itir=0000000000026238
>> (XEN) lookup_domain_mpa: bad mpa 0000eef3f000e738 (> 0000000010004000
>>
>> And finally a GP fault happens. This bug has actually been lurking
>> for a long time, but was seldom triggered.
>>
>> John, please test on your side with all the patches I sent out
>> today (including the max_page one). I believe we can call it
>> done now. ;-)
>>
>> BTW, Dan, there are two heads on the current xen-ia64-unstable.hg.
>> Please do a merge.
>>
>> Signed-off-by: Kevin Tian <Kevin.tian@xxxxxxxxx>
>>
>> diff -r 68d8a0a1aeb7 xen/arch/ia64/xen/vcpu.c
>> --- a/xen/arch/ia64/xen/vcpu.c Thu Sep 1 21:51:57 2005
>> +++ b/xen/arch/ia64/xen/vcpu.c Fri Sep 2 21:30:01 2005
>> @@ -1315,7 +1315,8 @@
>> /* check 1-entry TLB */
>> if ((trp = match_dtlb(vcpu,address))) {
>> dtlb_translate_count++;
>> - *pteval = trp->page_flags;
>> + //*pteval = trp->page_flags;
>> + *pteval = vcpu->arch.dtlb_pte;
>> *itir = trp->itir;
>> return IA64_NO_FAULT;
>> }
>>
>> Thanks,
>> Kevin
>>
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel