Hi Magnus,
>From: Magnus Damm [mailto:magnus@xxxxxxxxxxxxx]
>Sent: September 28, 2006 21:12
>To: Isaku Yamahata
>Cc: Xu, Anthony; Tristan Gingold; Alex Williamson;
>xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>Subject: Re: [Xen-ia64-devel][PATCH][RFC] Task: support huge page
>RE:[Xen-ia64-devel] Xen/IA64 Healthiness Report -Cset#11460
>
>
>Large contiguous memory chunks - ouch.
>
>I know that Mel Gorman has been working on the Linux side of hugetlbfs
>with both antifragmentation and defragmentation. I think dedicated pools
>for large pages is a simple and good solution - trying to allocate large
>contiguous chunks of memory from a global memory pool without any
>support for defragmentation/migration is begging for trouble IMO.
>
Implementing defragmentation is necessary, but it is not easy. As far as I
know, several defragmentation algorithms have been discussed for hugetlbfs
in the Linux kernel, and so far no single algorithm has won out.
Maybe we can postpone this: frequent domain creation/destruction may not be
a common case on Xen/IA64, so defragmentation may be less important there
than it seems.
>>
>> I agree with you that supporting tlb insert with large page size and
>> hugetlbfs would be a big gain.
>
>Yes the theory (and your numbers) all say that larger page sizes are good
>for performance. I think page table code is fun too, so I must admit
>that I'm a bit tempted.
That's great!
>
>I'm not sure of the current status of Xen and N-order allocations, but I
>have a feeling that it is possible to keep the complexity low and still
>get some performance improvement by limiting the scope to some kind of
>dedicated huge page pool and a PV interface to hugetlbfs.
A dedicated huge page pool may be a good idea for the Linux kernel.
As for Xen/IA64, Xen doesn't know which parts of the guest's address space
need to be contiguous; from the guest's point of view, all of its physical
address space is contiguous.
The simplest method is to allocate contiguous memory chunks big enough to
back the guest's largest mapping. For instance, in RHEL4-U2 the biggest TLB
insert is 256MB, so Xen would allocate 256MB contiguous memory chunks for
that guest.
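To make that concrete, here is a minimal user-space sketch in C of deriving
the allocation order from the guest's largest TLB mapping. chunk_order() and
XEN_PAGE_SHIFT are names I made up for the example, not Xen's real interfaces:

/* Hypothetical sketch: derive the buddy-allocator order needed so that
 * one contiguous chunk covers the guest's largest TLB mapping.
 * XEN_PAGE_SHIFT and chunk_order() are invented for illustration. */
#include <stdio.h>

#define XEN_PAGE_SHIFT 14 /* 16KB base pages on Xen/IA64 */

/* Smallest order such that (1 << order) base pages >= chunk_size bytes. */
static unsigned int chunk_order(unsigned long chunk_size)
{
    unsigned int order = 0;

    while ((1UL << (XEN_PAGE_SHIFT + order)) < chunk_size)
        order++;
    return order;
}

int main(void)
{
    unsigned long chunk = 256UL << 20; /* RHEL4-U2 example: 256MB insert */

    printf("256MB chunk needs order %u (%lu x 16KB pages)\n",
           chunk_order(chunk), 1UL << chunk_order(chunk));
    return 0;
}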
>
>Or, just make sure that each dom0 is loaded into contiguous memory and
>do copy instead of flipping.
>
>> > In my mind, we need to do the things below (there may be more) if we
>> > want to support huge pages.
>> > 1. Add an "order" option to the configure file vtiexample.vti: if
>> > order=0, Xen/IA64 allocates 16K contiguous memory for the domain; if
>> > order=1, it allocates 32K; and so on. Thus the user can choose the
>> > page size for the domain.
>>
>> A fallback path should be implemented in case large page
>> allocation fails.
>> Or do you propose introducing a new page allocator with very large chunks?
>> With the order option, page fragmentation should be taken care of.
>
>I think the best solution would be to divide memory into different pools
>at boot time to avoid fragmentation problems. If large page allocation
>fails when creating a domain - then just return error telling the user
>that he has misconfigured his system.
That is exactly what I'm thinking.
Again, we can use the buddy system to allocate huge pages first, and then
fall back to the dedicated pool method; see the sketch below. In fact,
these two parts can be developed at the same time.
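A minimal sketch of that combined strategy in C. alloc_from_buddy() and
alloc_from_hpage_pool() are hypothetical stand-ins (the stubs below just
simulate the fragmentation case); only the control flow matters:

/* Sketch: try the general buddy allocator first; if the high-order
 * allocation fails (fragmentation), fall back to a dedicated pool
 * reserved at boot. All names here are invented for illustration. */
#include <stdio.h>
#include <stdlib.h>

/* Stand-in: pretend the global buddy allocator fails above order 4,
 * which is the fragmentation case we worry about. */
static void *alloc_from_buddy(unsigned int order)
{
    return (order > 4) ? NULL : malloc((size_t)1 << (14 + order));
}

/* Stand-in: dedicated huge-page pool reserved at boot time. */
static void *alloc_from_hpage_pool(unsigned int order)
{
    return malloc((size_t)1 << (14 + order));
}

/* Returns NULL only when both fail, so domain creation can report a
 * misconfigured system to the user instead of failing later. */
static void *alloc_domain_chunk(unsigned int order)
{
    void *chunk = alloc_from_buddy(order);

    if (chunk == NULL)
        chunk = alloc_from_hpage_pool(order);
    return chunk;
}

int main(void)
{
    printf("order 10 chunk: %p\n", alloc_domain_chunk(10));
    return 0;
}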
>> >
>> > 6. The balloon driver may need to be modified to increase or decrease
>> > domain memory by the domain's page size, not 16K.
>> >
>> > Magnus, would you like to take this task?
>
>I'm currently busy with VT-extension work for kexec on x86 and x86_64,
>and on top of that I feel that it may be too complex for me. Especially
>since I have very little ia64 experience.
>
>I'm interested in doing the x86 side of it though if someone else drives
>the ia64 bit.
>
That's OK.
As Ian said at the Xen summit, Xen/IA32 may also need to support huge pages;
IA32 has huge pages too, 4MB and 2MB.
>> kernel mapping with large page size.
>> * page fragmentation should be addressed.
>> Both 16KB and 16MB pages should be able to co-exist in the same domain.
>> - Allocating a large contiguous region might fail,
>> so a fallback path should be implemented.
>> - A domain should be able to have pages of both sizes (16KB and 16MB)
>> for a smooth code merge.
>> Probably a new bit in the p2m entry, something like _PAGE_HPAGE,
>> would be introduced to distinguish large pages from normal pages.
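Something like the sketch below is what I would expect. The bit position
and helper names are invented for this example, not Xen's actual p2m layout:

/* Illustrative only: one way a _PAGE_HPAGE bit in the p2m entry could
 * distinguish a 16MB huge page from a normal 16KB page. The bit
 * position and helpers are invented for this sketch. */
#include <stdint.h>
#include <stdio.h>

#define _PAGE_PRESENT (UINT64_C(1) << 0)
#define _PAGE_HPAGE   (UINT64_C(1) << 62) /* hypothetical huge-page bit */
#define PFN_SHIFT     14                  /* 16KB base pages */

typedef uint64_t p2m_entry_t;

static p2m_entry_t p2m_make_entry(uint64_t mfn, int huge)
{
    return (mfn << PFN_SHIFT) | _PAGE_PRESENT | (huge ? _PAGE_HPAGE : 0);
}

/* Size-aware lookup: a huge entry covers 16MB, a normal one 16KB. */
static uint64_t p2m_entry_bytes(p2m_entry_t e)
{
    return (e & _PAGE_HPAGE) ? (UINT64_C(16) << 20) : (UINT64_C(16) << 10);
}

int main(void)
{
    printf("normal: %llu bytes, huge: %llu bytes\n",
           (unsigned long long)p2m_entry_bytes(p2m_make_entry(0x100, 0)),
           (unsigned long long)p2m_entry_bytes(p2m_make_entry(0x200, 1)));
    return 0;
}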
>
>On PV it may be possible to set up two address spaces for the kernel -
>one using huge pages and another one with smaller pages. Then the area
>with huge pages is used for read-only operations, and other activities such
>as page flipping can be performed in the small page area.
I have the following concern.
Suppose a PV domain has 4GB, with 2GB for huge pages and 2GB for smaller
pages. That means the PV kernel can only use 2GB for huge pages, and
VBD/VNIF can only use 2GB, even though there are 4GB in total.
>
>>
>> * paravirtualized driver(VBD/VNIF)
>> This is a real issue.
>> For a first prototype it is reasonable not to support page flipping,
>> resorting to grant-table memory copy instead.
>>
>> There are two kinds of page flipping, page mapping and page transfer.
>> I guess page mapping should be supported somehow assuming only dom0
>> (or driver domain) maps.
>> We should measure page flipping and memory copy before giving it a try;
>> I have no figures on it, and I'm not sure which has better performance.
>> (I'm biased: I know of a VNIF analysis on Xen/x86 which said memory copy
>> was cheaper than page flipping on x86...)
>> If dom0 does only DMA, an I/O request can be completed without copy or
>> TLB flush for VBD with the TLB tracking patch.
>> Page transfer is difficult. I'm not sure it's worthwhile to support
>> page transfer because I'm struggling to optimize it.
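For what it's worth, here is a rough user-space proxy for that measurement
(Linux-only). mremap() stands in for the page-table update plus TLB cost of
flipping, so treat the numbers as suggestive, not as Xen's real costs:

/* Rough user-space proxy: compare copying one page with moving a
 * mapping via mremap(), which pays page-table update plus TLB costs
 * similar in spirit to page flipping. Numbers are only suggestive. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/mman.h>

#define PAGE_SZ 4096
#define ITERS   100000

static double ms_since(const struct timespec *t0)
{
    struct timespec t1;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0->tv_sec) * 1e3 + (t1.tv_nsec - t0->tv_nsec) / 1e6;
}

int main(void)
{
    static char src[PAGE_SZ], dst[PAGE_SZ];
    struct timespec t0;
    char *slots, *page;
    int i;

    /* Copy path: memcpy one page per "I/O request". */
    memset(src, 0xaa, sizeof(src));
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < ITERS; i++)
        memcpy(dst, src, PAGE_SZ);
    printf("copy:  %8.3f ms for %d pages (dst[0]=%d)\n",
           ms_since(&t0), ITERS, dst[0]);

    /* Flip path: ping-pong one page between two fixed addresses. */
    slots = mmap(NULL, 2 * PAGE_SZ, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    page = mmap(slots, PAGE_SZ, PROT_READ | PROT_WRITE,
                MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (slots == MAP_FAILED || page == MAP_FAILED)
        return 1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < ITERS; i++) {
        char *to = (page == slots) ? slots + PAGE_SZ : slots;

        page = mremap(page, PAGE_SZ, PAGE_SZ,
                      MREMAP_MAYMOVE | MREMAP_FIXED, to);
        page[0]++; /* touch so the new mapping/TLB entry is used */
    }
    printf("remap: %8.3f ms for %d pages\n", ms_since(&t0), ITERS);
    return 0;
}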
>>
>> Another approach is
>> * increase xen page size.
>> Probably simply increasing the page size wouldn't work well.
>> In that case, increase only the domheap page size,
>> or introduce a new zone like MEMZONE_HPAGE,
>> or introduce a specialized page allocator for it.
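As a sketch of the zone idea: carve a region out at boot and hand out
fixed-size 16MB pages from it with a simple bitmap. MEMZONE_HPAGE is used
only as a label here, and the malloc() backing is a stand-in; a real Xen
zone would hook into the existing allocator:

/* Sketch of a dedicated huge-page zone. All names and the malloc()
 * backing are invented for illustration. */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HPAGE_SIZE  (16UL << 20) /* 16MB huge pages */
#define HPAGE_COUNT 16           /* e.g. 256MB reserved at boot */

struct hpage_zone {
    char   *base;               /* start of the reserved region */
    uint8_t used[HPAGE_COUNT];  /* 1 = handed out to a domain */
};

static struct hpage_zone memzone_hpage;

static int hpage_zone_init(void)
{
    /* In Xen this would be a boot-time reservation, not malloc(). */
    memzone_hpage.base = malloc(HPAGE_SIZE * HPAGE_COUNT);
    memset(memzone_hpage.used, 0, sizeof(memzone_hpage.used));
    return memzone_hpage.base ? 0 : -1;
}

static void *alloc_hpage(void)
{
    unsigned int i;

    for (i = 0; i < HPAGE_COUNT; i++) {
        if (!memzone_hpage.used[i]) {
            memzone_hpage.used[i] = 1;
            return memzone_hpage.base + i * HPAGE_SIZE;
        }
    }
    return NULL; /* zone exhausted: fall back or report an error */
}

static void free_hpage(void *p)
{
    size_t i = ((char *)p - memzone_hpage.base) / HPAGE_SIZE;

    memzone_hpage.used[i] = 0;
}

int main(void)
{
    if (hpage_zone_init())
        return 1;
    free_hpage(alloc_hpage());
    return 0;
}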
>
>Thanks,
>
>/ magnus
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel