
To: "Isaku Yamahata" <yamahata@xxxxxxxxxxxxx>
Subject: RE: [Xen-ia64-devel][PATCH][RFC] Task: support huge page RE: [Xen-ia64-devel] Xen/IA64 Healthiness Report -Cset#11460
From: "Xu, Anthony" <anthony.xu@xxxxxxxxx>
Date: Fri, 29 Sep 2006 10:28:36 +0800
Cc: Magnus Damm <magnus@xxxxxxxxxxxxx>, xen-ia64-devel@xxxxxxxxxxxxxxxxxxx, Tristan Gingold <Tristan.Gingold@xxxxxxxx>
Hi Isaku and all,

Thanks for your comments.

I am glad that everyone seems to agree that huge page support is needed in
Xen/IA64. The problem now is how to implement it.

See my comments below.

Thanks,
Anthony


>From: Isaku Yamahata [mailto:yamahata@xxxxxxxxxxxxx]
>Sent: September 28, 2006 16:08
>To: Xu, Anthony
>Cc: Magnus Damm; Tristan Gingold; Alex Williamson;
>xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> In my mind, we need to do the following things (there may be more) if we want
>> to support huge pages.
>> 1. Add an option "order" in the configure file vtiexample.vti. If order=0,
>> Xen/IA64 allocates 16K contiguous memory for the domain; if order=1, it
>> allocates 32K, and so on. Thus the user can choose the page size for the domain.
>
>A fallback path should be implemented in case large page allocation fails.
>Or do you propose introducing a new page allocator with a very large chunk size?
>With the order option, page fragmentation should be taken care of.

The page allocator is still based on 16K pages; otherwise small memory
allocations such as xmalloc would be impacted, and many places would need to
be modified.

So I think the first step is:
the page allocator still uses 16K pages, but we allocate huge pages for the domain.

As for allocation failure, the first step is: if allocation fails, domain
creation fails.
The next step is defragmentation.
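
A minimal sketch of that first step (the function and the p2m hook are
illustrative, not a real patch; alloc_domheap_pages() is the existing Xen
allocator): the domain is populated in chunks of 16K << order, where "order"
would come from the configuration file, and domain creation simply fails when
a contiguous chunk cannot be obtained.

#include <xen/mm.h>      /* alloc_domheap_pages(), struct page_info */
#include <xen/errno.h>

/* Sketch only: populate a domain with (16K << order)-sized chunks from
 * the existing 16K-based buddy allocator. */
static int populate_hpages(struct domain *d, unsigned long nr_chunks,
                           unsigned int order)
{
    unsigned long i;

    for ( i = 0; i < nr_chunks; i++ )
    {
        struct page_info *pg = alloc_domheap_pages(d, order, 0);

        if ( pg == NULL )
            return -ENOMEM;  /* first step: domain creation fails here;
                              * next step: try defragmentation instead */

        /* ... enter the 16K << order chunk into the p2m table ... */
    }

    return 0;
}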
  



>>
>> 4. Per_LP_VHPT may need to be modified to support huge page.
>
>Do you mean hash collision?

I don't mean hash collision.
Per_LP_VHPT is the long-format VHPT, so it can support huge pages in principle,
but much of the VHPT code assumes the page size is 16K, so itir.ps is always
16K in some code sequences.
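
For example, the insertion path would have to take the page size as a
parameter instead of baking in 16K. A sketch using the long-format entry
layout (the struct mirrors the architected format; the helper itself is
illustrative):

/* Long-format VHPT entry: pte, itir, tag.  itir.ps (bits 7:2) holds the
 * log2 page size, which today is effectively hard-coded to 14 (16K). */
struct vhpt_lf_entry {
    unsigned long page_flags;   /* pte */
    unsigned long itir;
    unsigned long ifa;
    unsigned long ti_tag;
};

/* Illustrative: let the caller pass logps instead of assuming PAGE_SHIFT. */
static void vhpt_insert_ps(struct vhpt_lf_entry *e, unsigned long pte,
                           unsigned long vaddr, unsigned long logps)
{
    e->page_flags = pte;
    e->itir = logps << 2;       /* was: PAGE_SHIFT << 2, i.e. always 16K */
    e->ifa  = vaddr;
    /* ti_tag comes from the ttag computation on vaddr in the real code */
}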


>
>
>> 5. VBD/VNIF may need to be modified to use copy mechanism instead of
>> flipping page.
>>
>> 6. Balloon driver may need to be modified to increase or decrease domain
>> memory by page size not 16K.
>>
>> Magnus, would you like to take this task?
>>
>> Comments are always welcome.
>
>Here are some of my random thoughts.
>
>* Presumably there are two goals:
>  - Support one large page size (e.g. 16MB) to map the kernel.
>  - Support hugetlbfs, whose page size might be different from 16MB.
>
>  I.e. support three page sizes: the normal page size 16KB, the kernel mapping
>  page size 16MB, and the hugetlbfs page size 256MB.
>  I think hugetlbfs support can be addressed in a specialized way.

The kernel uses a 16M identity mapping with rr7.ps=16M, so if Xen allocates
16M contiguous chunks for the domain, Xen can set the machine rr7.ps to 16M
instead of 16K; then all VHPT entries for region 7 in the Per_LP_VHPT use the
16M page size.

I'm using rhel4-u2 as the guest; by default, rhel4-u2 sets rr4.ps=256M.

For the latest kernels that support hugetlbfs, the biggest page size is 4G.

My goal is to support 256M; if we can do that, then supporting huger TLB sizes
like 1G or 4G is trivial. :-)
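
In region-register terms the change is just a different ps field in the
machine rr7; a sketch, assuming the architected rr layout (ve in bit 0, ps in
bits 7:2, rid in bits 31:8):

#define RR_PS_SHIFT  2
#define RR_PS_MASK   (0x3fUL << RR_PS_SHIFT)

/* Return rr with its preferred-page-size field replaced by logps. */
static inline unsigned long rr_with_ps(unsigned long rr, unsigned long logps)
{
    return (rr & ~RR_PS_MASK) | (logps << RR_PS_SHIFT);
}

/* e.g. machine rr7 = rr_with_ps(rr7, 24) gives 16M instead of 14 (16K);
 * a guest like rhel4-u2 would want rr4 = rr_with_ps(rr4, 28) for 256M. */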



>
>hugetlbfs
>* Some specialized path can be implemented to support hugetlbfs.
>  - For domU
>    paravirtualize hugetlbfs for domU.
>    Hook alloc_fresh_huge_page() in Linux. Then xen/ia64 is aware of
>    large pages.
>    Probably a new flag in the p2m entry, or some other data structure, might
>    be introduced.
>    For XenLinux, the region number RGN_HPAGE can be checked before
>    entering the hugetlbfs specialized path.

That's good, but first Xen needs to allocate contiguous chunks for domU.
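
For the p2m flag mentioned above, I imagine something like this (the bit
position and helper name are assumptions; ia64 PTEs have software-available
bits above bit 52 that could hold it):

/* Assumed software bit marking a p2m entry as part of a huge page. */
#define _PAGE_HPAGE  (1UL << 53)

static inline int pte_is_hpage(unsigned long pteval)
{
    return (pteval & _PAGE_HPAGE) != 0;
}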

>  - For domVTI
>    Can the use of hugetlbfs be detected somehow?

On the domVTI side, Xen doesn't know about hugetlbfs, but Xen can capture
guest accesses to the region registers and see when a new preferred page size
(rr.ps) is set. (The preferred page size means the page size used most in
that region.) If Xen can set the same preferred page size in the machine
rr.ps, that's great: most TLB misses can be handled by the assembly code,
meaning the translation can be found in the long-format VHPT; otherwise Xen
needs to look up the VTLB in C code.
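
Roughly, the mov-to-rr intercept could do something like the following
(region_backed_contiguously() and machine_rr_update() are hypothetical
helpers; the ps extraction follows the architected rr layout):

#define RR_PS(rr)  (((rr) >> 2) & 0x3f)

/* Hypothetical helpers, prototypes only. */
int  region_backed_contiguously(struct domain *d, unsigned long rnum,
                                unsigned long logps);
void machine_rr_update(struct vcpu *v, unsigned long rnum,
                       unsigned long logps);

/* Called when a guest mov-to-rr is intercepted. */
static void mirror_guest_rr_ps(struct vcpu *v, unsigned long rnum,
                               unsigned long guest_rr)
{
    unsigned long logps = RR_PS(guest_rr);

    if ( logps > PAGE_SHIFT &&
         region_backed_contiguously(v->domain, rnum, logps) )
        machine_rr_update(v, rnum, logps);      /* fast path: asm + VHPT  */
    else
        machine_rr_update(v, rnum, PAGE_SHIFT); /* misses fall back to the
                                                 * C-code VTLB lookup     */
}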

>    Probably some Linux-specific heuristic can be used.
>    e.g. check the region, RGN_HPAGE.
>
>kernel mapping with large page size.
>* page fragmentation should be addressed.
>  Both 16KB and 16MB pages should be able to co-exist in the same domain.
>  - Allocating a large contiguous region might fail,
>    so a fallback path should be implemented.
>  - A domain should be able to have pages of both sizes (16KB and 16MB)
>    for a smooth code merge.
>  Probably a new bit in the p2m entry, something like _PAGE_HPAGE,
>  would be introduced to distinguish large pages from normal pages.
>
>* paravirtualized drivers (VBD/VNIF)
>  This is a real issue.
>  For a first prototype it is reasonable not to support page flipping,
>  resorting to grant-table memory copy instead.
>
>  There are two kinds of page flipping: page mapping and page transfer.
>  I guess page mapping should be supported somehow, assuming only dom0
>  (or a driver domain) maps.
>  We should measure page flipping and memory copy before giving it a try.
>  I have no figures about it.
>  I'm not sure which has better performance.
>  (I'm biased. I know of a VNIF analysis on Xen/x86.
>   It said memory copy was cheaper on x86 than page flipping...)
>  If dom0 does only DMA, an I/O request can be completed without copy and TLB
>  flush for VBD, with the TLB tracking patch.
>  Page transfer is difficult. I'm not sure it's worthwhile to support
>  page transfer, because I'm struggling to optimize it.
>
Totally agree.
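
For reference, a copy-based VBD/VNIF path would look roughly like the sketch
below, using the grant-table copy operation from the backend side (whether
GNTTABOP_copy is already usable on ia64 at this point is an assumption):

#include <linux/errno.h>
#include <xen/interface/grant_table.h>  /* struct gnttab_copy, GNTCOPY_* */

/* Sketch: copy len bytes out of a page the frontend granted us,
 * instead of mapping or flipping it. */
static int copy_from_frontend(domid_t fe_dom, grant_ref_t gref,
                              uint16_t offset, unsigned long local_gmfn,
                              uint16_t len)
{
    struct gnttab_copy op = {
        .source.u.ref  = gref,
        .source.domid  = fe_dom,
        .source.offset = offset,
        .dest.u.gmfn   = local_gmfn,
        .dest.domid    = DOMID_SELF,
        .dest.offset   = 0,
        .len           = len,
        .flags         = GNTCOPY_source_gref,
    };

    if ( HYPERVISOR_grant_table_op(GNTTABOP_copy, &op, 1) )
        return -EFAULT;

    return (op.status == GNTST_okay) ? 0 : -EIO;
}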

>Another approach is
>* increase the xen page size.
>  Probably simply increasing the page size wouldn't work well.
>  In that case, increase only the domheap page size,
>  or introduce a new zone like MEMZONE_HPAGE,
>  or introduce a specialized page allocator for it.
>
We need to consider how to allocate small amounts of memory from a buddy
system with a big page size.
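
One possible shape for that, splitting a big page into 16K sub-pages on
demand for the small-allocation paths (all names hypothetical, not real Xen
code):

#define SUBPAGE_SHIFT       14                     /* 16K */
#define HPAGE_SHIFT         24                     /* 16M */
#define SUBPAGES_PER_HPAGE  (1UL << (HPAGE_SHIFT - SUBPAGE_SHIFT))
#define BITS_PER_LONG       64                     /* ia64 */

struct hpage_split {
    void *base;                                    /* start of big page */
    unsigned long used[SUBPAGES_PER_HPAGE / BITS_PER_LONG];
};

/* Hand out one 16K sub-page from a big page, first-fit. */
static void *subpage_alloc(struct hpage_split *hp)
{
    unsigned long i;

    for ( i = 0; i < SUBPAGES_PER_HPAGE; i++ )
    {
        unsigned long bit = 1UL << (i % BITS_PER_LONG);

        if ( !(hp->used[i / BITS_PER_LONG] & bit) )
        {
            hp->used[i / BITS_PER_LONG] |= bit;
            return (char *)hp->base + (i << SUBPAGE_SHIFT);
        }
    }

    return NULL;                                   /* big page exhausted */
}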


Thanks,
Anthony
>--
>yamahata

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel