At 15:01 +0100 on 09 Sep (1284044463), Christoph Egger wrote:
> > This comment is confusing to me. The nested guest gpa isn't a virtual
> > address, and certainly not the same as an L1-guest VA. Can you reword
> > it, maybe just to say what the call to nestedhvm_hap_nested_page_fault()
> > is going to do?
>
> New wording is:
>
> /* The vcpu is in guest mdoe and the l1 guest
> * uses hap. That means 'gpa' is in l2 guest
> * physical adderss space.
> * Fix the nested p2m or inject nested page fault
> * into l1 guest if not fixable. The algorithm is
> * the same as for shadow paging.
> */
>
> Is this fine with you ?
Sure, that's better.
> > > - gfn = paging_gva_to_gfn(curr, addr, &pfec);
> > > + gfn = paging_p2m_ga_to_gfn(curr, p2m, mode, cr3, addr,
> > > &pfec);
> >
> > Do other callers of paging_gva_to_gfn need to handle nested-npt mode as
> > well?
>
> If the other callers can be reached when the vcpu is in guest mode, then yes.
It looks like most of them can.
> > If so, would it be better to update paging_gva_to_gfn() itself?
>
> This is definitely easier but the call to p2m_get_p2m() might become be very
> expensive and should be cached by passing through per function argument.
I'd rather it was cached somewhere in the per-vcpu state if
p2m_get_p2m() is expensive, and not pass any more arguments. Callers of
paging_gva_to_gfn() shouldn't need to know about the internals of the
p2m code - they should just be able to pass a VA and get a (L1) gfn.
> > > + p2m = vcpu_nestedhvm(v).nh_p2m;
> > > + if (p2m == NULL)
> > > + /* p2m has either been invalidated or not yet assigned. */
> > > + return;
> > > +
> > > + cpu_set(v->processor, p2m->p2m_dirty_cpumask);
> >
> > Is this arch-specific code? (and likewise the cpu_clear that follows)
>
> No. I don't see how.
It's only used by NPT-on-NPT. IIRC the EPT-on-EPT code uses a soft-TLB
model so it won't need this cpumask. But it's not a big deal.
> > This function looks like it duplicates a lot of logic from an old
> > version of the normal p2m next-level function.
>
> Yes, it is redundant. The point is that the *_write_p2m_entry() functions
> are not nested virtualization aware. I am having troubles with understanding
> the monitor p2m tables.
>
> > Would it be easy to merge them?
>
> Maybe, maybe not. I can answer the question when I understand the use of the
> monitor table and how they work.
OK. I'll wait for the next rev then.
> > > +int
> > > +nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t L2_gpa)
> > > +{
> > > + int rv;
> > > + paddr_t L1_gpa, L0_gpa;
> > > + struct domain *d = v->domain;
> > > + struct p2m_domain *p2m, *nested_p2m;
> > > +
> > > + p2m = p2m_get_hostp2m(d); /* L0 p2m */
> > > + nested_p2m = p2m_get_nestedp2m(v, vcpu_nestedhvm(v).nh_vm_hostcr3);
> > > +
> > > + /* walk the L1 P2M table, note we have to pass p2m
> > > + * and not nested_p2m here or we fail the walk forever,
> > > + * otherwise. */
> >
> > Can you explain that more fully? It looks like you're walking the L0
> > table to translate an L2 gfn into an L1 gfn, which surely can't be right.
>
> The l1 guest hostcr3 is in l1 guest physical address space. To walk the
> l1 guest's page table I need to translate it into host physical address space
> by walking the l0 p2m table.
>
> > > + rv = nestedhap_walk_L1_p2m(v, p2m, L2_gpa, &L1_gpa);
OK, I understand now, thanks.
> > > +static int
> > > +p2m_flush_locked(struct p2m_domain *p2m)
> > > +{
> > > + struct page_info * (*alloc)(struct p2m_domain *);
> > > + void (*free)(struct p2m_domain *, struct page_info *);
> > > +
> > > + alloc = p2m->alloc_page;
> > > + free = p2m->free_page;
> > > +
> > > + if (p2m->cr3 == 0)
> > > + /* Microoptimisation: p2m is already empty.
> > > + * => about 0.3% speedup of overall system performance.
> > > + */
> > > + return 0;
> >
> > What happens if a malicious guest uses 0 as its actual CR3 value?
>
> When l1 guest uses hap and l2 guest uses shadow paging, this code path
> never runs.
> When l1 guest uses hap and l2 guest uses hap, this is can only be the
> guest's hostcr3.
>
> So your question is what happens if a malicious guest uses 0 to point to
> its nested page table ?
Yes.
> I think, this guest crashes sooner or later anyway.
It might, but in the meantime we've missed a flush. If we fail to
properly flush a nested P2M when we should, the L2 guest gets to use
stale entries to write to memory it shouldn't see.
My point is that you can't use cr3==0 as an indicator that this slot
isn't in use because the guest can cause it to be true. Maybe -1 would
be a better choice, since that's not a valid cr3 value.
> > > + /* All p2m's are or were in use. We know the least recent used one.
> > > + * Destroy and re-initialize it.
> > > + */
> > > + for (i = 0; i < MAX_NESTEDP2M; i++) {
> > > + p2m = p2m_getlru_nestedp2m(d, NULL);
> > > + rv = p2m_flush_locked(p2m);
> >
> > Is this enough? If this p2m might be in host_vmcb->h_cr3 somewhere on
> > another vcpu, then you need to make sure that vcpu gets reset not to use
> > it any more.
>
> There are three possibilities:
> An other vcpu is in VMRUN emulation before a nestedp2m is assigned.
> In this case, it will get a new nestedp2m as it won't find its 'old'
> nestedp2m anymore.
>
> An other vcpu is in VMRUN emulation after a nestedp2m is assigned.
> It will VMEXIT with a nested page fault.
Why?
> An other vcpu already running l2 guest.
> It will VMEXIT with a nested page fault immediately.
Hmm. It will exit for the TLB shootdown IPI, but I think you need to
clear vcpu_nestedhvm(v).nh_p2m on the other vcpu to make sure it doesn't
re-enter with the p2m you've just recycled.
Cheers,
Tim.
--
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|