
Re: [Xen-devel] [PATCH 00/11] Alternate p2m: support multiple copies of host p2m



On 01/13/2015 12:45 PM, Andrew Cooper wrote:
> On 13/01/15 20:02, Ed White wrote:
>> On 01/13/2015 11:01 AM, Andrew Cooper wrote:
>>> On 09/01/15 21:26, Ed White wrote:
>>>> This set of patches adds support to hvm domains for EPTP switching by
>>>> creating multiple copies of the host p2m (currently limited to 10 copies).
>>>>
>>>> The primary use of this capability is expected to be in scenarios where
>>>> access to memory needs to be monitored and/or restricted below the level at
>>>> which the guest OS page tables operate. Two examples that were discussed at
>>>> the 2014 Xen developer summit are:
>>>>
>>>>     VM introspection:
>>>>         http://www.slideshare.net/xen_com_mgr/zero-footprint-guest-memory-introspection-from-xen
>>>>
>>>>     Secure inter-VM communication:
>>>>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>>>>
>>>> Each p2m copy is populated lazily on EPT violations, and only contains
>>>> entries for ram p2m types. Permissions for pages in alternate p2m's can be
>>>> changed in a similar way to the existing memory access interface, and
>>>> gfn->mfn mappings can be changed.
>>>>
>>>> All this is done through extra HVMOP types.
>>>>
>>>> The cross-domain HVMOP code has been compile-tested only. Also, the
>>>> cross-domain code is hypervisor-only, the toolstack has not been modified.
>>>>
>>>> The intra-domain code has been tested. Violation notifications can only be
>>>> received for pages that have been modified (access permissions and/or
>>>> gfn->mfn mapping) intra-domain, and only on VCPU's that have enabled
>>>> notification.
>>>>
>>>> VMFUNC and #VE will both be emulated on hardware without native support.
>>>>
>>>> This code is not compatible with nested hvm functionality and will refuse
>>>> to work with nested hvm active. It is also not compatible with migration.
>>>> It should be considered experimental.
>>> Having reviewed most of the series, I believe I now have a feeling for
>>> what you are trying to achieve, but I would like to discuss some of the
>>> design implications.
>>>
>>> The following is my understanding of the situation.  Please correct me
>>> if I have made a mistake.
>>>
>>>
>> Thanks for investing the time to do this. Maybe these first couple of days
>> would have gone more smoothly if something like this had been in the cover
>> letter.
> 
> No problem.  (I tend to find that things like this save time in the long
> run)
> 
>>
>> With the exception of a couple of minor points, you are spot on.
> 
> Cool!
> 
>>
>>> Currently, a domain has a single host p2m.  This contains the guest
>>> physical address mappings, and a combination of p2m types which are used
>>> by existing components to allow certain actions to happen.  All vcpus
>>> run with the same host p2m.
>>>
>>> A domain may have a number of nested p2ms (currently an arbitrary limit
>>> of 10).  These are used for nested-virt and are translated by the host
>>> p2m.  Vcpus in guest mode run under a nested p2m.
>>>
>>> This new altp2m infrastructure adds the ability to use a different set
>>> of tables in the place of the host p2m.  This, in practice, allows for
>>> different translations, different p2m types, different access permissions. 
>>>
>>> One usecase of alternate p2ms is to provide introspection information to
>>> out-of-guest entities (via the mem_event interface) or to in-guest
>>> entities (via #VE).
>>>
>>>
>>> Now for some observations and assumptions.
>>>
>>> It occurs to me that the altp2m mechanism is generic.  From the look of
>>> the series, it is mostly implemented in a generic way, which is great. 
>>> The only Intel specific bits appear to be the ept handling itself,
>>> 'vmfunc' instruction support and #VE injection to in-guest entities. 
>>>
>> That was my intention. I don't know enough about the state of AMD
>> virtualization to know if it can support these patches by emulating
>> vmfunc and #VE, but that was my target.
> 
> As far as I am aware, AMD SVM has no similar concept to vmfunc, nor
> #VE.  However, the same kinds of introspection are certainly possible by
> playing with the read/write bits on the NPT tables and causing a vmexit.
> 
>>
>>> I can't think of any reasonable case where the alternate p2m would want
>>> mappings different to the host p2m.  That is to say, an altp2m will map
>>> the same set of mfns to make a guest physical address space, but may
>>> differ in page permissions and possibly p2m types.
>>>
>> The set of mfn's is the same, but I do allow gfn->mfn mappings to be
>> modified under certain circumstances. One use of this is to point the
>> same VA to different physical pages (with different access permissions)
>> in different p2m's to hide memory changes.
> 
> What is the practical use of being able to play paging tricks like this
> behind a VM's back?
> 

I'm restricted in how much detail I can go into on a public mailing list,
but imagine that you want a data read to see one thing and an instruction
fetch to see something else.

If you need more than that we'll have to go off-list, and even then I'll
have to check what I can say.
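
To make the general idea concrete, here is a minimal toy sketch in C. It is
purely illustrative and is not taken from the patch series: every name in it
(struct view, access_gfn, setup_split_view, the PERM_* bits) is hypothetical,
"machine memory" is a byte array, and a permission failure simply switches to
the other view and retries, standing in for whatever the real violation
handling would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is Xen code. */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

#define NR_GFNS  4
#define NR_VIEWS 2

struct entry {
    unsigned mfn;    /* index into machine_mem[] */
    unsigned perms;  /* PERM_* bits allowed in this view */
};

struct view {
    struct entry map[NR_GFNS];  /* gfn -> (mfn, perms) */
};

/* Toy machine memory: one byte per frame keeps the example short. */
static uint8_t machine_mem[8];
static struct view views[NR_VIEWS];
static unsigned active_view;

/*
 * Build two views of gfn 1:
 *   view 0: read/write, no execute, backed by mfn 2
 *   view 1: execute only, backed by mfn 3
 */
static void setup_split_view(void)
{
    machine_mem[2] = 0xAA;  /* what a data read should see */
    machine_mem[3] = 0xCC;  /* what an instruction fetch should see */
    views[0].map[1] = (struct entry){ .mfn = 2, .perms = PERM_R | PERM_W };
    views[1].map[1] = (struct entry){ .mfn = 3, .perms = PERM_X };
    active_view = 0;
}

/*
 * Access a gfn with the requested permission. If the active view does not
 * allow it, switch to the next view and retry. Returns 0 and stores the
 * byte in *out on success, or -1 if no view permits the access.
 */
static int access_gfn(unsigned gfn, unsigned want, uint8_t *out)
{
    assert(out != NULL);
    if (gfn >= NR_GFNS)
        return -1;

    for (int attempt = 0; attempt < NR_VIEWS; attempt++) {
        const struct entry *e = &views[active_view].map[gfn];

        if ((e->perms & want) == want) {
            *out = machine_mem[e->mfn];
            return 0;
        }
        /* Simulated violation: move to the other view and try again. */
        active_view = (active_view + 1) % NR_VIEWS;
    }
    return -1;
}
```

After setup_split_view(), access_gfn(1, PERM_R, &v) reads from mfn 2 while
access_gfn(1, PERM_X, &v) reads from mfn 3, even though both name the same
gfn.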

Ed

>>
>>> Given the above restriction, I believe a lot of the existing features
>>> can continue to work and coexist.  For generating mem_events, the
>>> permissions can be altered in the altp2m.  For injecting #VE, the altp2m
>>> type can change to the new p2m_ram_rw, so long as the host p2m type is
>>> compatible.  For both, a vmexit can occur.  Xen can do the appropriate
>>> action and also inject a #VE on its way back into the guest.
>>>
>>> One thing I have noticed while looking at the #VE stuff is that EPT also
>>> supports A/D tracking, which might be quite a nice optimisation and
>>> remove the need for p2m_ram_logdirty, but I think this should be treated
>>> as an orthogonal item.
>>>
>> This is far from my area of expertise, but I believe there is code in Xen
>> to use EPT D bits in migration.
> 
> Not that I can spot, although I seem to remember some talk about it. All
> logdirty code still appears to rely on the logdirty bitmap being
> filled, which is done from vmexits for p2m_ram_logdirty regions.
> 
> ~Andrew
> 
>>
>> Ed
>>
>>> When shared ept/iommu is not in use, altp2m can safely be used by vcpus,
>>> as this will not interfere with the IOMMU permissions.
>>>
>>> Furthermore, I can't conceptually think of an issue against the idea of
>>> nestedp2m alternatives, following the same rule that the mapped mfns
>>> match up.  That should allow all existing nestedvirt infrastructure
>>> to continue to work.
>>>
>>> Does the above look sensible, or have I overlooked something?
>>>
>>> ~Andrew
>>>
> 
> 
