
Re: [Xen-devel] [RFC][Patches] Xen 1GB Page Table Support



Yes, I might.

 -- Keir


On 18/03/2009 17:45, "Huang2, Wei" <Wei.Huang2@xxxxxxx> wrote:

> Keir,
> 
> Would you consider the middle approach (tools + normal p2m code) for
> 3.4? I understand that 1GB PoD is too big. But the middle one is much
> simpler.
> 
> Thanks,
> 
> -Wei
> 
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Sent: Wednesday, March 18, 2009 12:33 PM
> To: George Dunlap; Huang2, Wei
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Tim Deegan
> Subject: Re: [Xen-devel] [RFC][Patches] Xen 1GB Page Table Support
> 
> I'm not sure about putting this in for 3.4 unless there's a significant
> performance win.
> 
>  -- Keir
> 
> On 18/03/2009 17:20, "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx>
> wrote:
> 
>> Thanks for doing this work, Wei -- especially all the extra effort for
>> the PoD integration.
>> 
>> One question: How well would you say you've tested the PoD
>> functionality?  Or to put it the other way, how much do I need to
>> prioritize testing this before the 3.4 release?
>> 
>> It wouldn't be a bad idea to do as you suggested, and break things
>> into 2 meg pages for the PoD case.  In order to take the best
>> advantage of this in a PoD scenario, you'd need to have a balloon
>> driver that could allocate 1G of contiguous *guest* p2m space, which
>> seems a bit optimistic at this point...
>> 
>>  -George
>> 
>> 2009/3/18 Huang2, Wei <Wei.Huang2@xxxxxxx>:
>>> Current Xen supports 2MB super pages for NPT/EPT. The attached patches
>>> extend this feature to support 1GB pages. The PoD (populate-on-demand)
>>> support introduced by George Dunlap made P2M modification harder. I tried
>>> to preserve the existing PoD design by introducing a 1GB PoD cache list.
>>> 
>>> 
>>> 
>>> Note that 1GB PoD can be dropped if we don't care about 1GB when PoD is
>>> enabled. In this case, we can just split the 1GB PDPE into 512x2MB PDE
>>> entries and grab pages from the PoD super list. That can pretty much make
>>> 1gb_p2m_pod.patch go away.
>>> 
>>> 
>>> 
>>> Any comments/suggestions on the design will be appreciated.
>>> 
>>> 
>>> 
>>> Thanks,
>>> 
>>> 
>>> 
>>> -Wei
>>> 
>>> 
>>> 
>>> 
>>> 
>>> The following is a description of the patches:
>>> 
>>> === 1gb_tools.patch ===
>>> 
>>> Extends the existing setup_guest() function. Basically, it tries to
>>> allocate 1GB pages whenever available. If that request fails, it falls
>>> back to 2MB. If both fail, then 4KB pages will be used.
>>> 
>>> 
>>> 
>>> === 1gb_p2m.patch ===
>>> 
>>> * p2m_next_level()
>>> 
>>> Check the PSE bit of the L3 page table entry. If a 1GB page is found
>>> (PSE=1), we split it into 512 2MB pages.
>>> 
>>> 
>>> 
>>> * p2m_set_entry()
>>> 
>>> Set the PSE bit in the L3 P2M entry if the page order is 18 (1GB =
>>> 2^18 x 4KB pages).
>>> 
>>> 
>>> 
>>> * p2m_gfn_to_mfn()
>>> 
>>> Add support for the 1GB case when doing gfn to mfn translation. When the
>>> L3 entry is marked as POPULATE_ON_DEMAND, we call
>>> p2m_pod_demand_populate(). Otherwise, we do the regular address
>>> translation (gfn ==> mfn).
>>> 
>>> 
>>> 
>>> * p2m_gfn_to_mfn_current()
>>> 
>>> This is similar to p2m_gfn_to_mfn(). When the L3 entry is marked as
>>> POPULATE_ON_DEMAND, it demands a populate using p2m_pod_demand_populate().
>>> Otherwise, it does a normal translation, taking 1GB pages into
>>> consideration.
>>> 
>>> 
>>> 
>>> * set_p2m_entry()
>>> 
>>> Request a 1GB page mapping (page order 18).
>>> 
>>> 
>>> 
>>> * audit_p2m()
>>> 
>>> Support 1GB pages while auditing the p2m table.
>>> 
>>> 
>>> 
>>> * p2m_change_type_global()
>>> 
>>> Deal with 1GB pages when changing the global page type.
>>> 
>>> 
>>> 
>>> === 1gb_p2m_pod.patch ===
>>> 
>>> * xen/include/asm-x86/p2m.h
>>> 
>>> Minor change to deal with PoD. It separates the super page cache list
>>> into 2MB and 1GB lists. Similarly, we record the last gpfn of sweeping
>>> for both 2MB and 1GB.
>>> 
>>> 
>>> 
>>> * p2m_pod_cache_add()
>>> 
>>> Check the page order and add 1GB super pages to the PoD 1GB cache list.
>>> 
>>> 
>>> 
>>> * p2m_pod_cache_get()
>>> 
>>> Grab a page from the cache list. It tries to break a 1GB page into 512
>>> 2MB pages if the 2MB PoD list is empty. Similarly, 4KB pages can be
>>> requested from super pages. The breaking order is 2MB first, then 1GB.
>>> 
>>> 
>>> 
>>> * p2m_pod_cache_target()
>>> 
>>> This function is used to set the PoD cache size. To increase the PoD
>>> target, we try to allocate 1GB pages from the Xen domheap. If this fails,
>>> we try 2MB. If both fail, we try 4KB, which is guaranteed to work.
>>> 
>>> 
>>> 
>>> To decrease the target, we use a similar approach. We first try to free
>>> 1GB pages from the 1GB PoD cache list. If that fails, we try the 2MB PoD
>>> cache list. If both fail, we try the 4KB list.
>>> 
>>> 
>>> 
>>> * p2m_pod_zero_check_superpage_1gb()
>>> 
>>> This adds a new function to zero-check a 1GB page. It is similar to
>>> p2m_pod_zero_check_superpage_2mb().
>>> 
>>> 
>>> 
>>> * p2m_pod_zero_check_superpage_1gb()
>>> 
>>> We add a new function to sweep 1GB pages from guest memory. This works
>>> the same way as p2m_pod_zero_check_superpage_2mb().
>>> 
>>> 
>>> 
>>> * p2m_pod_demand_populate()
>>> 
>>> The trick of this function is to do remap-and-retry if
>>> p2m_pod_cache_get() fails. When p2m_pod_cache_get() fails, this function
>>> splits the p2m table entry into smaller ones (e.g. 1GB ==> 2MB or
>>> 2MB ==> 4KB). That guarantees that populate demands always work.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-devel
>>> 
>>> 
> 
> 
> 
> 


