
Re: [Xen-devel] [PATCH 15/17] vmx: nest: virtual ept for nested



At 11:24 +0100 on 21 May (1274441043), Qing He wrote:
> A vcpu may migrate among multiple physical cpus.  When a guest invept
> happens, the shadow ept should be invalidated on every cpu.
> 
> An alternative would be to invalidate shadow ept at the time of
> vcpu migration, plus invalidate shadow ept at the time EPT_POINTER
> is changed (otherwise, we lose track)

Since your shadows are per-vcpu, doing the invept at migration time
sounds like a better idea than sending IPIs.  It's not _that_
inefficient (we'd expect most of the TLB to be full of other vcpus' EPT
entries when we migrate to a new cpu), but I'm sure some kind of
dirty-mask or tlbflush timestamp trick could make it better if it became
a bottleneck.
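
As a strawman, the migration-time flush could be as simple as the
sketch below (nest->last_cpu and vept_invalidate() are illustrative
names, not code from the patch):

    /* Sketch only: drop this vcpu's shadow EPT tables when it is
     * rescheduled on a different physical cpu, so a guest invept
     * never needs to IPI other cpus.  'last_cpu' is a hypothetical
     * field recording where the vcpu last ran. */
    static void vept_check_migration(struct vcpu *v)
    {
        struct vmx_nest *nest = &v->arch.hvm_vmx.nest; /* assumed layout */
        unsigned int cpu = smp_processor_id();

        if ( nest->last_cpu != cpu )
        {
            vept_invalidate(nest->vept);    /* assumed flush routine */
            nest->last_cpu = cpu;
        }
    }

The tlbflush-timestamp variant would replace the unconditional flush
with a comparison of a per-vept generation count against the cpu's
last flush time.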

> The two methods above both seem to have efficiency problems.
> 
> > > +    case EXIT_REASON_EPT_VIOLATION:
> > > +    {
> > > +        unsigned long exit_qualification = __vmread(EXIT_QUALIFICATION);
> > > +        paddr_t gpa = __vmread(GUEST_PHYSICAL_ADDRESS);
> > > +#ifdef __i386__
> > > +        gpa |= (paddr_t)__vmread(GUEST_PHYSICAL_ADDRESS_HIGH) << 32;
> > > +#endif
> > > +        if ( vmx_nest_vept(v) )
> > > +        {
> > > +            if ( !vept_ept_violation(nest->vept, nest->geptp,
> > > +                     exit_qualification, gpa) )
> > > +                bypass_l0 = 1;
> > > +            else
> > > +                nest->vmexit_pending = 1;
> > 
> > Since bypass_l0 is set from vmexit_pending() here it looks like it's
> > always going to be set.  Does that mean we never handle a real EPT
> > violation at L0?  I would expect there to be three possible outcomes
> > here: give the violation to L1, give it to L0, or fix it in the vept and
> > discard it.
> 
> I didn't intend to implement a complete solution in a single run;
> the first step is based on the assumption that 1) domU's p2m is fixed,
> which means no copy-on-write, memory over-commitment or ballooning down
> (PoD should work though); and 2) no VT-d (no mmio_direct).  The reason
> is that it works for a wide range of situations without messing with
> the p2m and emulation code.

Ah, OK.  I'd prefer if you enforced those assumptions; otherwise I think
we're at risk of breaking the basic memory isolation rules.  In
particular, if anything _does_ change the p2m we need to do something to
make sure that the guest doesn't retain old mappings of pages that now
belong to someone else. 

I think hooking the current EPT code to flush all the vepts on every p2m
change would be safe, if very slow.  Then you can optimize it from
there. :)
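
Concretely, I'm imagining something as blunt as this, called from the
normal EPT update path (vept_invalidate_all() is an assumed helper,
not code from the patch):

    /* Sketch: run whenever the L0 p2m changes.  Safe but slow:
     * every nested vcpu's shadow EPT tables are thrown away, so no
     * stale L1->L0 translation can outlive a p2m update. */
    static void p2m_flush_all_vepts(struct domain *d)
    {
        struct vcpu *v;

        for_each_vcpu ( d, v )
            if ( vmx_nest_vept(v) )       /* vcpu has a vept? */
                vept_invalidate_all(v);   /* assumed flush routine */
    }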

> The virtual ept bears some resemblance to shadow memory, and there is
> a dedicated path for shadow.  I thought the p2m system was written with
> 1-level virtualization in mind, and admittedly I'm a little afraid of
> making major changes to the memory system, so the L0 p2m is effectively
> skipped altogether to avoid messing things up.
> 
> Anyway, considering the following:
>   1. (in interception) if virtual EPT does not have the entry, give to L1
>   2. (in interception) if the shadow and virtual ept are not in sync,
>      try to build the shadow entry
>   3. (in common code) if failed (i.e. gfn_to_mfn failed), pass to L0 p2m
> 
> Does 3 automatically work?  (Just putting hvm_emulation aside for now.)

Looking at your patch, I don't think 3 can happen at all.  But if you
change the logic to pass the EPT fault to the l0 handler it should work.
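
In other words, something shaped like the sketch below, where the
walker's return value distinguishes all three outcomes.  (The VEPT_*
codes are invented for illustration; your vept_ept_violation()
currently returns a boolean.)

    if ( vmx_nest_vept(v) )
    {
        switch ( vept_ept_violation(nest->vept, nest->geptp,
                                    exit_qualification, gpa) )
        {
        case VEPT_FIXED:        /* shadow entry built; discard exit */
            bypass_l0 = 1;
            break;
        case VEPT_INJECT_L1:    /* L1's EPT lacks the mapping */
            nest->vmexit_pending = 1;
            break;
        case VEPT_PASS_L0:      /* real fault: fall through to the */
        default:                /* normal L0 EPT-violation handler */
            break;
        }
    }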

> > > + * This virtual EPT implementation is independent of the p2m facility
> > > + * and has some different characteristics. It works in a similar
> > > + * way to the shadow page table (guest table and host table
> > > + * composition), but is per-vcpu, and of vTLB style
> > > + *   - per vCPU so no lock is required
> > 
> > What happens when dom0 changes domU's p2m table?  Don't you need to
> > shoot down existing vEPT tables from a foreign CPU?
> 
> As stated above, this is not handled.
> However, all the vEPT tables need to be invalidated, whether active (as
> EPT_POINTER) or inactive, so I hope there are no other caveats.

OK.

> > > + * One of the limitations so far, is that it doesn't work with
> > > + * L0 emulation code, so L1 p2m_mmio_direct on top of L0 p2m_mmio_dm
> > > + * is not supported as for now.
> > 
> > Is this something you intend to fix before we check it in?
> 
> I intended to delay this until the time of virtual VT-d.  Not only
> because the L0 p2m is skipped entirely, but also because hvm_emulate
> needs non-trivial changes, at least:
>   1. rework gva_to_gfn to work with 2-level hap
>   2. hvm_emulate needs explicit knowledge of 2-level hap, because it
>      has to differentiate between a guest page fault and a guest hap
>      fault

Urgh.  I can see why you want to leave this out for now. :) 
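
For the record, the double walk being described is roughly the sketch
below.  (paging_gva_to_gfn() is the existing 1-level entry point; the
NESTED_* codes and vept_translate() are invented for illustration.)

    /* Sketch: what a 2-level-aware translation for hvm_emulate would
     * have to do.  A miss at each stage has a different consequence,
     * which is exactly the differentiation point 2 is about. */
    static int nested_gva_to_mfn(struct vcpu *v, unsigned long gva,
                                 uint32_t *pfec, unsigned long *mfn)
    {
        unsigned long l2_gfn, l1_gfn;
        p2m_type_t p2mt;

        l2_gfn = paging_gva_to_gfn(v, gva, pfec);
        if ( l2_gfn == INVALID_GFN )
            return NESTED_PAGE_FAULT;    /* inject #PF into the guest */

        if ( !vept_translate(v, l2_gfn, &l1_gfn) )  /* assumed walker */
            return NESTED_EPT_VIOLATION; /* reflect an EPT exit to L1 */

        *mfn = mfn_x(gfn_to_mfn(v->domain, l1_gfn, &p2mt)); /* L0 p2m */
        return NESTED_OK;
    }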

> > > +    for ( i = 0; i < VEPT_MAX_SLOTS; i++ )
> > > +    {
> > > +        slot = xmalloc(struct vept_slot);
> > > +        if ( slot == NULL )
> > > +            break;
> > > +
> > > +        memset(slot, 0, sizeof(*slot));
> > > +
> > > +        INIT_LIST_HEAD(&slot->list);
> > > +        INIT_PAGE_LIST_HEAD(&slot->page_list);
> > > +
> > > +        list_add(&slot->list, &vept->free_slots);
> > > +    }
> > > +
> > > +    for ( i = 0; i < VEPT_ALLOCATION_SIZE; i++ )
> > 
> > Why a fixed 2MB allocation?  What if your nested domains are very large?
> > 
> > > +    {
> > > +        pg = alloc_domheap_page(NULL,
> > > +                                MEMF_node(domain_to_node(v->domain)));
> > 
> > Shouldn't this be allocated from the paging pool like other EPT memory?
> 
> It just creates a pool for every vcpu, like the hap paging pool, to
> avoid locking (vept is not per-domain).  Besides, I think the
> reservation adjustment of the EPT paging pool is not dynamic either,
> but goes through domctls (I'm not that sure of this, I just can't
> find the dynamic part)?

I think it's best to take this from the general EPT memory, rather than
adding this extra overhead to every VCPU, even if the amount isn't
dynamic.
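
That is, something like the sketch below, where hap_alloc_p2m_page()
stands in for whatever helper fronts the domain's paging pool (the
name is illustrative):

    /* Sketch: draw vept pages from the general paging pool rather
     * than a fixed per-vcpu 2MB reservation.  Serialization against
     * other pool users is the helper's problem, so the vept itself
     * can stay lock-free. */
    static struct page_info *vept_alloc_page(struct vcpu *v)
    {
        return hap_alloc_p2m_page(v->domain);  /* assumed pool helper */
    }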

> But yes, definitely, dynamic adjustment is needed.
> 
> > 
> > > +        if ( pg == NULL )
> > > +            break;
> > 
> > Return an error?
> 
> It doesn't fill the pool, but the pool still has some water that can
> be worked with, so the failure is silently ignored.

But it might have no water at all if the first allocation fails!
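
The cheap fix is to refuse to continue with a completely dry pool,
something like the sketch below (it assumes the freshly allocated
pages end up on vept->page_list, and vept_teardown() is an invented
cleanup routine):

    /* Sketch: after the allocation loop, fail setup outright if not
     * even one page could be allocated. */
    if ( page_list_empty(&vept->page_list) )
    {
        vept_teardown(vept);    /* free the slots allocated above */
        return -ENOMEM;
    }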

> Also, what do you think of the separation of the vept logic in general?
> It does part of the memory management work but circumvents the normal
> paging system.  I think this makes vept simpler (than creating a
> `shadow hap'), and avoids introducing nesting-related bugs in the
> generic paging code, but I'm also worried about how it will adapt to
> future changes.

I'm happy enough for the vept code to be separate (although possibly
some of the EPT-walking code could be made common).  In fact it's quite
helpful in some ways to limit the confusion that having so many address
spaces introduces. :)

The important thing is that the paging interface must still do the right
thing in every situation, without its callers needing to know about
vept.  That will need hooks in the normal EPT code to call into the vept
code.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
