Re: [Xen-devel] [PATCH 00/11] Alternate p2m: support multiple copies of host p2m

On 13/01/15 20:02, Ed White wrote:
> On 01/13/2015 11:01 AM, Andrew Cooper wrote:
>> On 09/01/15 21:26, Ed White wrote:
>>> This set of patches adds support to hvm domains for EPTP switching by
>>> creating multiple copies of the host p2m (currently limited to 10 copies).
>>> The primary use of this capability is expected to be in scenarios where
>>> access to memory needs to be monitored and/or restricted below the level
>>> at which the guest OS page tables operate. Two examples that were
>>> discussed at the 2014 Xen developer summit are:
>>>     VM introspection:
>>>         http://www.slideshare.net/xen_com_mgr/zero-footprint-guest-memory-introspection-from-xen
>>>     Secure inter-VM communication:
>>>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>>> Each p2m copy is populated lazily on EPT violations, and only contains
>>> entries for ram p2m types. Permissions for pages in alternate p2m's can be
>>> changed in a similar way to the existing memory access interface, and
>>> gfn->mfn mappings can be changed.
>>> All this is done through extra HVMOP types.
>>> The cross-domain HVMOP code has been compile-tested only. Also, the
>>> cross-domain code is hypervisor-only, the toolstack has not been modified.
>>> The intra-domain code has been tested. Violation notifications can only be
>>> received for pages that have been modified (access permissions and/or
>>> gfn->mfn mapping) intra-domain, and only on VCPU's that have enabled
>>> notification.
>>> VMFUNC and #VE will both be emulated on hardware without native support.
>>> This code is not compatible with nested hvm functionality and will refuse
>>> to work with nested hvm active. It is also not compatible with migration.
>>> It should be considered experimental.
>> Having reviewed most of the series, I believe I now have a feeling for
>> what you are trying to achieve, but I would like to discuss some of the
>> design implications.
>> The following is my understanding of the situation.  Please correct me
>> if I have made a mistake.
> Thanks for investing the time to do this. Maybe this first couple of days
> would have gone more smoothly if something like this was in the cover letter.

No problem.  (I tend to find that things like this save time in the long
run.)

> With the exception of a couple of minor points, you are spot on.


>> Currently, a domain has a single host p2m.  This contains the guest
>> physical address mappings, and a combination of p2m types which are used
>> by existing components to allow certain actions to happen.  All vcpus
>> run with the same host p2m.
>> A domain may have a number of nested p2ms (currently an arbitrary limit
>> of 10).  These are used for nested-virt and are translated by the host
>> p2m.  Vcpus in guest mode run under a nested p2m.
>> This new altp2m infrastructure adds the ability to use a different set
>> of tables in the place of the host p2m.  This, in practice, allows for
>> different translations, different p2m types, different access permissions. 
>> One usecase of alternate p2ms is to provide introspection information to
>> out-of-guest entities (via the mem_event interface) or to in-guest
>> entities (via #VE).
>> Now for some observations and assumptions.
>> It occurs to me that the altp2m mechanism is generic.  From the look of
>> the series, it is mostly implemented in a generic way, which is great. 
>> The only Intel specific bits appear to be the ept handling itself,
>> 'vmfunc' instruction support and #VE injection to in-guest entities. 
> That was my intention. I don't know enough about the state of AMD
> virtualization to know if it can support these patches by emulating
> vmfunc and #VE, but that was my target.

As far as I am aware, AMD SVM has no similar concept to vmfunc, nor
#VE.  However, the same kinds of introspection are certainly possible by
playing with the read/write bits on the NPT tables and causing a vmexit.
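As a rough illustration of the permission-based approach, the toy model
below treats a second-level table (EPT or NPT) as a map from gfn to access
bits and raises an exception where real hardware would take a vmexit. It is
a sketch only: the names Access, Violation and check_access are invented for
this example and do not correspond to anything in Xen.

```python
from enum import Flag


class Access(Flag):
    """Access bits of a second-level (EPT/NPT style) entry. Hypothetical names."""
    NONE = 0
    R = 1
    W = 2
    X = 4


class Violation(Exception):
    """Stands in for the vmexit taken on a second-level permission fault."""

    def __init__(self, gfn, wanted):
        super().__init__("access denied at gfn %#x" % gfn)
        self.gfn = gfn
        self.wanted = wanted


def check_access(perms, gfn, wanted):
    """Return True if `wanted` is allowed at `gfn`, otherwise raise Violation.

    `perms` maps gfn -> Access; a gfn with no entry faults on any access.
    """
    allowed = perms.get(gfn, Access.NONE)
    if (wanted & allowed) != wanted:
        raise Violation(gfn, wanted)
    return True


# gfn 0x2000 has had its write bit cleared so that writes can be observed.
perms = {
    0x1000: Access.R | Access.W | Access.X,
    0x2000: Access.R | Access.X,
}
```

A monitor would catch the Violation for a write to gfn 0x2000, record or
forward the event, and then decide whether to let the access proceed.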

>> I can't think of any reasonable case where the alternate p2m would want
>> mappings different to the host p2m.  That is to say, an altp2m will map
>> the same set of mfns to make a guest physical address space, but may
>> differ in page permissions and possibly p2m types.
> The set of mfn's is the same, but I do allow gfn->mfn mappings to be
> modified under certain circumstances. One use of this is to point the
> same VA to different physical pages (with different access permissions)
> in different p2m's to hide memory changes.

What is the practical use of being able to play paging tricks like this
behind a VM's back?
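For concreteness, the remapping Ed describes can be modelled roughly as
below: each alternate view starts empty, is filled from the host p2m on
first touch (standing in for the lazy population on EPT violations), and
may override individual gfn->mfn entries. This is a toy model under stated
assumptions, not the code in the series. HostP2M, AltP2M, remap and
translate are names invented for the example, and the check that a remap
target is already mapped by the host is one reading of "the set of mfn's is
the same".

```python
class HostP2M:
    """Toy host p2m: a plain gfn -> mfn map. Hypothetical, for illustration."""

    def __init__(self, mappings):
        self.mappings = dict(mappings)


class AltP2M:
    """One alternate view, populated lazily from the host p2m."""

    def __init__(self, host):
        self.host = host
        self.entries = {}  # gfn -> mfn, only for gfns touched or overridden

    def remap(self, gfn, mfn):
        """Point `gfn` at a different mfn in this view only."""
        # Assumption: a view may only use mfns the host p2m already maps.
        if mfn not in self.host.mappings.values():
            raise ValueError("mfn %#x is not mapped by the host p2m" % mfn)
        self.entries[gfn] = mfn

    def translate(self, gfn):
        """Translate `gfn`, copying the host entry on first use."""
        if gfn not in self.entries:
            # Models the violation taken on an entry that is not yet present.
            self.entries[gfn] = self.host.mappings[gfn]
        return self.entries[gfn]


host = HostP2M({0x10: 0xA0, 0x11: 0xA1})
view0 = AltP2M(host)
view1 = AltP2M(host)
view1.remap(0x10, 0xA1)  # the same gfn resolves to a different page in view1
```

A vcpu running on view0 sees gfn 0x10 backed by mfn 0xA0, while one running
on view1 sees mfn 0xA1 at the same guest physical address.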

>> Given the above restriction, I believe a lot of the existing features
>> can continue to work and coexist.  For generating mem_events, the
>> permissions can be altered in the altp2m.  For injecting #VE, the altp2m
>> type can change to the new p2m_ram_rw, so long as the host p2m type is
>> compatible.  For both, a vmexit can occur.  Xen can do the appropriate
>> action and also inject a #VE on its way back into the guest.
>> One thing I have noticed while looking at the #VE stuff is that EPT also
>> supports A/D tracking, which might be quite a nice optimisation and
>> forgo the need for p2m_ram_logdirty, but I think this should be treated
>> as an orthogonal item.
> This is far from my area of expertise, but I believe there is code in Xen
> to use EPT D bits in migration.

Not that I can spot, although I seem to remember some talk about it. All
logdirty code still appears to rely on the logdirty bitmap being
filled, which is done from vmexits for p2m_ram_logdirty regions.
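The fault-driven scheme can be sketched as the toy model below: a write to
a p2m_ram_logdirty page takes a fault, the fault handler sets the matching
bit in the bitmap, and the page goes back to p2m_ram_rw so that later
writes run at full speed. The class and method names are invented for the
example, and the detail of flipping the type back on the first write is an
assumption about the usual approach rather than something stated in this
thread.

```python
RAM_RW = "p2m_ram_rw"
RAM_LOGDIRTY = "p2m_ram_logdirty"


class LogDirty:
    """Toy model of fault-driven dirty tracking. Hypothetical names."""

    def __init__(self, nr_gfns):
        self.types = [RAM_RW] * nr_gfns
        self.bitmap = bytearray((nr_gfns + 7) // 8)  # one bit per gfn

    def enable(self):
        """Start a tracking round: every page becomes write-faulting."""
        self.types = [RAM_LOGDIRTY] * len(self.types)

    def write(self, gfn):
        """Model a guest write; return True if it took a fault."""
        if self.types[gfn] == RAM_LOGDIRTY:
            # Fault path: record the page as dirty, then stop faulting on it.
            self.bitmap[gfn // 8] |= 1 << (gfn % 8)
            self.types[gfn] = RAM_RW
            return True
        return False

    def dirty_gfns(self):
        """Return the gfns whose bit is set in the bitmap, in order."""
        return [g for g in range(len(self.types))
                if (self.bitmap[g // 8] >> (g % 8)) & 1]
```

With hardware D bits the fault path would in principle go away and the
bitmap would be harvested from the tables instead, which is the
optimisation being discussed.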


> Ed
>> When shared ept/iommu is not in use, altp2m can safely be used by vcpus,
>> as this will not interfere with the IOMMU permissions.
>> Furthermore, I can't conceptually think of an issue against the idea of
>> nestedp2m alternatives, following the same rule that the mapped mfns
>> match up.  That should allow all existing nestedvirt infrastructure to
>> continue to work.
>> Does the above look sensible, or have I overlooked something?
>> ~Andrew
