This patch enables the hash vtlb on paravirtualized domains.
Without this patch, a kernel build takes about 2040s.
With this patch, a kernel build takes about 2085s.
So this patch costs about 2% in performance.
That loss has the two causes below.
1. This patch lets dom0 support non-contiguous memory, even though
the memory currently allocated to dom0 is contiguous. As a result, no
TLB entries with page size > 16K are inserted into the machine TLB.
2. It fully emulates the itc instruction to fix a potential issue.
Previously, the emulation of itc inserted only one 16K TLB entry
into the VHPT and did not purge the VHPT. The issue arises when the guest
inserts a mapping larger than 16K: the old TLB mappings it covers are not
purged. The issue has not shown up so far because all TLB mappings
with page size > 16K are identity mappings, so neither the mappings
nor their attributes ever change.
Once hugetlb is considered, however, the issue does show up. See the scenario below.
1. A process uses hugetlb to map a file and creates a child process
that shares this memory block.
2. The Linux kernel uses copy-on-write to handle the sharing, which means
that at this point the hugetlb page is read-only for the child process.
3. The child process may read this memory block, which causes many 16K
TLB entries with the read-only attribute to be inserted into the VHPT.
4. Then one of the processes may write to this memory block, which causes a
hugetlb mapping with the r/w attribute to be inserted. With the old emulation
of itc, only one 16K TLB entry is inserted into the VHPT and the VHPT is not
purged, so many old entries with the read-only attribute still reside in the
VHPT. When the child process accesses memory through one of those read-only
entries, an ACCESS_RIGHT fault is delivered to the Linux kernel.
The Linux kernel gets confused: this area already has the r/w attribute, so
why is an ACCESS_RIGHT fault happening on it?
I don't know the exact result, but this is definitely not correct.
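A toy model of steps 3 and 4 may make the stale entries easier to see. This is
only a sketch and not code from the patch: the direct-mapped vhpt[] array, the
AR_* values, and the helpers slot_of(), purge_range() and replay() are all
hypothetical names, and a 256K region stands in for the huge page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_16K  (1ULL << 14)
#define NSLOTS    64                 /* toy VHPT: one slot per 16K page */

enum { AR_NONE = 0, AR_RO, AR_RW };  /* access rights cached in a slot */

static int vhpt[NSLOTS];

static unsigned slot_of(uint64_t va)
{
    return (unsigned)((va / PAGE_16K) % NSLOTS);
}

/* Drop every 16K entry covered by [base, base + size). */
static void purge_range(uint64_t base, uint64_t size)
{
    assert(size % PAGE_16K == 0);
    for (uint64_t va = base; va < base + size; va += PAGE_16K)
        vhpt[slot_of(va)] = AR_NONE;
}

/* Count read-only entries still cached inside [base, base + size). */
static int count_stale_ro(uint64_t base, uint64_t size)
{
    int n = 0;
    for (uint64_t va = base; va < base + size; va += PAGE_16K)
        if (vhpt[slot_of(va)] == AR_RO)
            n++;
    return n;
}

/*
 * Replay steps 3 and 4 and return how many read-only entries survive.
 * purge_first == 0 models the old emulation, 1 models purge-then-insert.
 */
static int replay(int purge_first)
{
    const uint64_t base = 0, size = 16 * PAGE_16K;  /* stand-in huge page */

    memset(vhpt, 0, sizeof vhpt);

    /* Step 3: reads fill the VHPT with read-only 16K entries. */
    for (uint64_t va = base; va < base + size; va += PAGE_16K)
        vhpt[slot_of(va)] = AR_RO;

    /* Step 4: a write makes the guest insert one r/w mapping for the range. */
    if (purge_first)
        purge_range(base, size);
    vhpt[slot_of(base)] = AR_RW;     /* only the faulting 16K piece is cached */

    return count_stale_ro(base, size);
}
```

In this model replay(0) leaves 15 stale read-only entries behind, while
replay(1) leaves none.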
Another issue with the emulation of itc is that the hypervisor should check
whether any guest TRs overlap the new mapping. If one does, an MCA is raised
in the guest OS.
With the handling above added, the hypervisor can fully virtualize the itc instruction.
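The TR overlap check described above could look roughly like the sketch below.
It is not taken from the patch: struct guest_tr, ranges_overlap() and
itc_overlaps_tr() are hypothetical names, and the sketch assumes that overlap
only matters between mappings with the same rid.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one guest translation register. */
struct guest_tr {
    bool     valid;
    uint64_t rid;    /* region id the TR belongs to */
    uint64_t va;     /* start of the mapped range, aligned to size */
    uint64_t size;   /* bytes covered by the TR */
};

/* Half-open ranges [a, a + asz) and [b, b + bsz) share at least one byte. */
static bool ranges_overlap(uint64_t a, uint64_t asz, uint64_t b, uint64_t bsz)
{
    return a < b + bsz && b < a + asz;
}

/*
 * Return true when an itc mapping of 2^ps bytes containing va overlaps
 * any valid guest TR with the same rid. The caller would then raise an
 * MCA in the guest instead of inserting the mapping.
 */
static bool itc_overlaps_tr(const struct guest_tr *trs, size_t n,
                            uint64_t rid, uint64_t va, unsigned ps)
{
    uint64_t size = 1ULL << ps;
    uint64_t base = va & ~(size - 1);   /* align down to the page size */

    for (size_t i = 0; i < n; i++) {
        if (trs[i].valid && trs[i].rid == rid &&
            ranges_overlap(base, size, trs[i].va, trs[i].size))
            return true;
    }
    return false;
}
```

A full emulation would run this check first, then purge the covered range from
the VHPT, and only then insert the new entry.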
Moreover, this patch implements the collision chain of the long VHPT, and
there is a lot of room to tune the performance.
1. What should the ratio be between the memory used for the hash table and
the memory used for the collision chain?
Currently it is 1:1.
2. What should the maximum collision chain length be?
Currently it is 15.
3. How should collision chains be recycled?
The current implementation recycles all collision chains.
4. What is the best way of mangling the rid?
Currently we exchange bytes 1 and 3.
.....
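To make the four tuning knobs concrete, here is a small sketch of how they
could fit together. It is not the code in the attached diff. The table size,
struct entry, and all function names are hypothetical. Counting bytes from
zero at the least significant end of a 32-bit value is an assumption about
what "bytes 1 and 3" means. Dropping a single chain when it reaches the
maximum length, and recycling every chain only when the pool runs dry, is one
possible reading of the current policy.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS   16          /* toy hash table size */
#define NCHAIN     NBUCKETS    /* chain pool the same size: the 1:1 ratio */
#define MAX_CHAIN  15          /* longest chain allowed behind one bucket */

struct entry {
    uint64_t      tag;
    int           used;
    struct entry *next;
};

static struct entry  bucket[NBUCKETS];  /* hash table heads */
static struct entry  pool[NCHAIN];      /* collision chain nodes */
static struct entry *free_list;

/* Exchange bytes 1 and 3 (zero-based, little end first) of a 32-bit value. */
static uint32_t mangle_rid(uint32_t v)
{
    uint32_t b1 = (v >> 8) & 0xff, b3 = (v >> 24) & 0xff;
    return (v & 0x00ff00ffu) | (b1 << 24) | (b3 << 8);
}

/* Number of chain nodes hanging behind a bucket head. */
static size_t chain_len(const struct entry *head)
{
    size_t n = 0;
    for (const struct entry *e = head->next; e; e = e->next)
        n++;
    return n;
}

/* Return every chain node behind one head to the free list. */
static void release_chain(struct entry *head)
{
    while (head->next) {
        struct entry *n = head->next;
        head->next = n->next;
        n->used = 0;
        n->next = free_list;
        free_list = n;
    }
}

/* Recycle all collision chains; the bucket heads keep their entries. */
static void recycle_all(void)
{
    for (size_t i = 0; i < NBUCKETS; i++)
        release_chain(&bucket[i]);
}

static void table_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < NBUCKETS; i++) {
        bucket[i].used = 0;
        bucket[i].next = NULL;
    }
    for (size_t i = 0; i < NCHAIN; i++) {
        pool[i].used = 0;
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Insert a tag; the newest entry always lives in the bucket head. */
static void table_insert(uint64_t tag)
{
    struct entry *head = &bucket[tag % NBUCKETS];

    if (!head->used) {
        head->tag = tag;
        head->used = 1;
        return;
    }
    if (chain_len(head) >= MAX_CHAIN)
        release_chain(head);            /* chain is full: drop it */
    if (!free_list)
        recycle_all();                  /* pool exhausted: recycle everything */

    struct entry *n = free_list;        /* move the old head onto the chain */
    free_list = n->next;
    n->tag = head->tag;
    n->used = 1;
    n->next = head->next;
    head->next = n;
    head->tag = tag;
}
```

Because mangle_rid() only swaps two bytes, applying it twice gives back the
original value. Changing NCHAIN or MAX_CHAIN is enough to experiment with the
first two questions in this model.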
Comments welcome
Signed-off-by: Anthony Xu <anthony.xu@xxxxxxxxx>
Thanks,
-Anthony
enable_hash_vtlb_0407.diff
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel