Re: [Xen-devel] [PATCH 15/17] vmx: nest: virtual ept for nested

On Thu, 2010-05-20 at 20:21 +0800, Tim Deegan wrote:
> At 10:41 +0100 on 22 Apr (1271932887), Qing He wrote:
> > +    hvm_copy_from_guest_virt(&eptp, decode.mem, sizeof(eptp), 0);
> > +    type = reg_read(regs, decode.reg2);
> 
> Needs error handling like the other new instructions.

I'll fix the error handling in the next version.

> 
> > +    /* TODO: physical invept on other cpus */
> 
> ?
> 

A vCPU may migrate among multiple physical CPUs. When a guest INVEPT
happens, the shadow EPT should be invalidated on every CPU.

An alternative would be to invalidate the shadow EPT at the time of
vCPU migration, plus invalidate it at the time EPT_POINTER is changed
(otherwise, we lose track).

Both methods seem to have efficiency problems.
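Just to make the first method concrete, a minimal sketch (all names
here are made up for illustration, not the patch's actual API; real
code would need atomics or an IPI rather than a plain flag):

    #include <stdbool.h>
    #include <stddef.h>

    struct vept;                        /* per-vcpu shadow EPT state */

    /* Stub: drop every shadow EPT entry owned by this vCPU. */
    static void vept_invalidate_all(struct vept *vept) { (void)vept; }

    struct nested_vcpu {
        struct vept *vept;
        bool flush_pending;             /* set remotely, consumed on VM entry */
    };

    /* Hypothetical handler for a guest INVEPT: flush the local shadow
     * EPT and mark every sibling vCPU so it flushes its own copy before
     * it next enters the guest, instead of IPI'ing every physical CPU. */
    static void nested_invept(struct nested_vcpu *self,
                              struct nested_vcpu **siblings, size_t nr)
    {
        vept_invalidate_all(self->vept);
        for (size_t i = 0; i < nr; i++)
            if (siblings[i] != self)
                siblings[i]->flush_pending = true;
    }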

> > +    case EXIT_REASON_EPT_VIOLATION:
> > +    {
> > +        unsigned long exit_qualification = __vmread(EXIT_QUALIFICATION);
> > +        paddr_t gpa = __vmread(GUEST_PHYSICAL_ADDRESS);
> > +#ifdef __i386__
> > +        gpa |= (paddr_t)__vmread(GUEST_PHYSICAL_ADDRESS_HIGH) << 32;
> > +#endif
> > +        if ( vmx_nest_vept(v) )
> > +        {
> > +            if ( !vept_ept_violation(nest->vept, nest->geptp,
> > +                     exit_qualification, gpa) )
> > +                bypass_l0 = 1;
> > +            else
> > +                nest->vmexit_pending = 1;
> 
> Since bypass_l0 is set from vmexit_pending() here it looks like it's
> always going to be set.  Does that mean we never handle a real EPT
> violation at L0?  I would expect there to be three possible outcomes
> here: give the violation to L1, give it to L0, or fix it in the vept and
> discard it.

I didn't intend to implement a complete solution in a single run;
the first step is based on the assumptions that 1) domU's p2m is fixed,
which means no copy-on-write, memory over-commitment or ballooning down
(PoD should work though); and 2) no VT-d (no mmio_direct). The reason is
that this works for a wide range of situations without messing with the
p2m and emulation.

The virtual EPT bears some resemblance to shadow paging, and there is
a dedicated path for shadow. I thought the p2m system was written with
one level of virtualization in mind, and admittedly I'm a little afraid
of making major changes to the memory system, so the L0 p2m is
effectively skipped altogether to avoid messing things up.

Anyway, considering the following:
  1. (in interception) if the virtual EPT does not have the entry, give it
     to L1
  2. (in interception) if the shadow and virtual EPT are not in sync, try
     to build the shadow entry
  3. (in common code) if that failed (i.e. gfn_to_mfn failed), pass it to
     the L0 p2m

Does 3 automatically work? (Just putting hvm_emulation aside for now.)
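In other words, something along these lines (a sketch only; the helpers
and return codes are invented for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t gpaddr_t;

    /* The three outcomes you listed, as illustrative result codes. */
    enum vept_outcome {
        VEPT_INJECT_L1,   /* 1. virtual EPT has no entry: it is L1's fault */
        VEPT_FIXED,       /* 2. shadow rebuilt from virtual EPT: discard   */
        VEPT_PASS_L0,     /* 3. gfn_to_mfn failed: hand to the L0 p2m      */
    };

    /* Stub: walk L1's (virtual) EPT; false means no mapping for l2_gpa. */
    static bool vept_walk(gpaddr_t l2_gpa, gpaddr_t *l1_gpa)
    { *l1_gpa = l2_gpa; return true; }

    /* Stub: build the shadow entry; false models a gfn_to_mfn failure. */
    static bool shadow_ept_build(gpaddr_t l1_gpa)
    { (void)l1_gpa; return true; }

    static enum vept_outcome handle_nested_ept_violation(gpaddr_t l2_gpa)
    {
        gpaddr_t l1_gpa;

        if (!vept_walk(l2_gpa, &l1_gpa))
            return VEPT_INJECT_L1;    /* step 1: reflect to L1 */
        if (shadow_ept_build(l1_gpa))
            return VEPT_FIXED;        /* step 2: fixed, just re-execute */
        return VEPT_PASS_L0;          /* step 3: genuine L0 p2m fault */
    }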

> > + * This virtual EPT implementation is independent to p2m facility
> > + * and has some different characteristics. It works in a similar
> > + * way as shadow page table (guest table and host table composition),
> > + * but is per-vcpu, and of vTLB style
> > + *   - per vCPU so no lock is required
> 
> What happens when dom0 changes domU's p2m table?  Don't you need to
> shoot down existing vEPT tables from a foreign CPU?

As stated above, this is not handled. However, all the vEPT tables
need to be invalidated anyway, whether active (as EPT_POINTER) or
inactive, so I hope there are no other caveats.

> 
> > + * One of the limitations so far, is that it doesn't work with
> > + * L0 emulation code, so L1 p2m_mmio_direct on top of L0 p2m_mmio_dm
> > + * is not supported as for now.
> 
> Is this something you intend to fix before we check it in?

I intended to delay this until the time of virtual VT-d. Not only
because the L0 p2m is skipped entirely, but also because hvm_emulate
needs non-trivial changes, at least:
  1. rework gva_to_gfn to work with 2-level HAP
  2. hvm_emulate needs explicit knowledge of 2-level HAP, because it has
     to differentiate between a guest page fault and a guest HAP fault.
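To illustrate point 1 (names and the walk itself are simplified; this
is not the actual hvm code): each gfn touched during the L2 page-table
walk must itself be translated through L1's EPT first, and a miss there
is a HAP fault to reflect to L1, not a guest #PF:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t gva_t, gfn_t, pte_t;

    /* The two fault kinds hvm_emulate would have to tell apart. */
    enum walk_result { WALK_OK, WALK_GUEST_PF, WALK_GUEST_HAP_FAULT };

    /* Stub: translate an L2 gfn through L1's EPT; false = no mapping. */
    static bool l1_ept_translate(gfn_t l2_gfn, gfn_t *l0_gfn)
    { *l0_gfn = l2_gfn; return true; }

    /* Stub: read one PTE from the page at the given (translated) gfn. */
    static pte_t read_pte(gfn_t gfn, unsigned int idx)
    { (void)gfn; (void)idx; return 1; }

    /* Sketch of a 2-level-aware gva_to_gfn for a 4-level walk. */
    static enum walk_result gva_to_gfn_nested(gva_t gva, gfn_t cr3_gfn,
                                              gfn_t *out)
    {
        gfn_t table = cr3_gfn, translated;

        for (int level = 4; level >= 1; level--) {
            if (!l1_ept_translate(table, &translated))
                return WALK_GUEST_HAP_FAULT;   /* EPT violation for L1 */
            pte_t pte = read_pte(translated,
                                 (gva >> (12 + 9 * (level - 1))) & 0x1ff);
            if (!(pte & 1))
                return WALK_GUEST_PF;          /* ordinary guest #PF */
            table = pte >> 12;
        }
        *out = table;
        return WALK_OK;
    }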

> > +    for ( i = 0; i < VEPT_MAX_SLOTS; i++ )
> > +    {
> > +        slot = xmalloc(struct vept_slot);
> > +        if ( slot == NULL )
> > +            break;
> > +
> > +        memset(slot, 0, sizeof(*slot));
> > +
> > +        INIT_LIST_HEAD(&slot->list);
> > +        INIT_PAGE_LIST_HEAD(&slot->page_list);
> > +
> > +        list_add(&slot->list, &vept->free_slots);
> > +    }
> > +
> > +    for ( i = 0; i < VEPT_ALLOCATION_SIZE; i++ )
> 
> Why a fixed 2MB allocation?  What if your nested domains are very large?
> 
> > +    {
> > +        pg = alloc_domheap_page(NULL, MEMF_node(domain_to_node(v->domain)));
> 
> Shouldn't this be allocated from the paging pool like other EPT memory?

It just creates a pool for every vCPU, like the HAP paging pool, to
avoid locking (vEPT is per-vcpu, not per-domain). Besides, I think the
reservation adjustment of the EPT paging pool is not dynamic either,
but done through domctls (I'm not that sure on this; I just can't find
the dynamic part)?

But yes, definitely, dynamic adjustment is needed.
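For what it's worth, a rough sketch of what the per-vCPU pool with a
top-up would look like (purely illustrative; calloc stands in for
alloc_domheap_page, and the numbers are arbitrary):

    #include <stddef.h>
    #include <stdlib.h>

    #define VEPT_POOL_MAX 512         /* arbitrary cap for the sketch */

    /* Per-vCPU page pool: only the owning vCPU allocates from it, so
     * the fast path needs no lock, unlike a shared per-domain pool. */
    struct vept_pool {
        void *free_pages[VEPT_POOL_MAX];
        size_t nr_free;
        size_t target;                /* desired water level, <= VEPT_POOL_MAX */
    };

    /* Hypothetical slow-path top-up, run when nr_free drops low; a
     * partial fill is tolerated rather than treated as a hard error. */
    static void vept_pool_refill(struct vept_pool *pool)
    {
        while (pool->nr_free < pool->target) {
            void *pg = calloc(1, 4096);  /* stand-in for alloc_domheap_page */
            if (pg == NULL)
                break;
            pool->free_pages[pool->nr_free++] = pg;
        }
    }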

> 
> > +        if ( pg == NULL )
> > +            break;
> 
> Return an error?

It doesn't fill the pool, but there is still some water to work with,
so the failure is silently ignored.

> > +    shadow_ept_next_level(vept, slot, &table, &gfn_remainder, 3,
> > +                          &old_entry, gw.l4e);
> 
> What if shadow_ept_next_level() returns 0 ?
> 
> > +    shadow_ept_next_level(vept, slot, &table, &gfn_remainder, 2,
> > +                          &old_entry, gw.l3e);
> 
> Ditto
> 
> > +    shadow_ept_next_level(vept, slot, &table, &gfn_remainder, 1,
> > +                          &old_entry, (gw.sp == 2) ? gw.l3e : gw.l2e);
> 
> Ditto

Of course, again, error handling.


Also, what do you think of the separation of the vEPT logic in general?
It does part of the memory management work but circumvents the normal
paging system. I think this makes vEPT simpler (than creating a
`shadow hap'), and avoids introducing nesting-related bugs in the
generic paging code, but I'm also worried about how it will adapt to
future changes.

Thanks,
Qing

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
