[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables



On Wed, 2015-12-16 at 15:36 +0000, George Dunlap wrote:
> [Adding Tim, the previous mm maintainer]
> 
> On 11/12/15 09:16, Wu, Feng wrote:
> > > > +{
> > > > +    void *xsave_addr;
> > > > +    unsigned int pkru = 0;
> > > > +    bool_t pkru_ad, pkru_wd;
> > > > +
> > > > +    bool_t uf = !!(pfec & PFEC_user_mode);
> > > > +    bool_t wf = !!(pfec & PFEC_write_access);
> > > > +    bool_t ff = !!(pfec & PFEC_insn_fetch);
> > > > +    bool_t rsvdf = !!(pfec & PFEC_reserved_bit);
> > > > +    bool_t pkuf  = !!(pfec & PFEC_prot_key);
> > > 
> > > So I'm just wondering out loud here -- is there actually any
> > > situation
> > > in which we would want guest_walk_tables to act differently than
> > > the
> > > real hardware?
> > > 
> > > That is, is there actually any situation where, pku is enabled,
> > > the vcpu
> > > is in long mode, PFEC_write_access and/or PFEC_page_present is
> > > set, and
> > > the pkey is non-zero, that we want guest_walk_tables() to only
> > > check the
> > > write-protect bit for the pte, and not also check the pkru?
> > > 
> > > Because if not, it seems like it would be much more robust to
> > > simply
> > > *always* check for pkru_ad if PFEC_page_present is set, and for
> > > pkru_wd
> > > if PFEC_write_access is set.
> > 
> > guest_walk_tables() is also used by shadow code, though we don't
> > plan to support pkeys for shadow now, however, in that case, the
> > 'pfec'
> > is generated by hardware, and the pkuf bit may be 0 or 1 depending
> > on the real page fault. If we unconditionally check pkeys in
> > guest_walk_tables(), it is not a good ideas for someone who may
> > want to implement the pkeys for shadow in future, since we only
> > need to check pkeys when the pkuf is set in pfec.
> 
> So you'll have to forgive me for being a bit slow here -- I stepped
> away
> from the mm code for a while, and a lot has changed since I agreed to
> become mm maintainer earlier this year.
> 
> So here is what I see in the tree; please correct me if I missed
> something important.
> 
> The function guest_walk_tables():
>  1. Accepts a pfec argument which is meant to describe the *access
> type*
> that is happening
>  2. Returns a value with the "wrong" flags (i.e., flags which needed
> to
> be set that were missing, or flags that needed to be clear that were
> set).
>  3. It checks reserved bits in the pagetable regardless of whether
> PFEC_reserved_bit is set in the caller, and returns
> _PAGE_INVALID_BITS
> if so.
>  4. If PFEC_insn_fetch is set, it will only check for nx and smep if
> those features are enabled for the guest.
> 
> The main callers are the various gva_to_gfn(), which accept a
> *pointer*
> to a pfec argument.  This pointer is actually bidirectional: its
> value
> passed to guest_walk_tables() to determine what access types are
> checked; what is returned is meant to be a pfec value which can be
> passed to a guest as part of a fault (and is by several callers). 
>  But
> it may also return the Xen-internal flags, PFEC_page_{paged,shared}.
> And, importantly, it modifies the pfec value based on the bits that
> are
> returned from guest_walk_tables(): It will clear PFEC_present if
> guest_walk_tables() returns _PAGE_PRESENT, and it will set
> PFEC_reserved_bit if guest_walk_tables() returns _PAGE_INVALID_BIT.
> 
> The next logical level up are the
> hvm_{copy,fetch}_{to,from}_guest_virt(), which also take a pfec
> argument: but really the only purpose of the pfec argument is to
> allow
> the caller to add the PFEC_user_mode flag; the other access-type
> flags
> are set automatically by the functions themselves (e.g., "to" sets
> PFEC_write_access, "fetch" sets PFEC_insn_fetch, &c).  Several
> callers
> set PFEC_present as well, but callers set any other bits.
> 
> (hvm_fetch_from_guest_virt() seems to only set PFEC_insn_fetch if nx
> or
> smep are enabled in the guest.  This seems inconsistent to me with
> the
> treatment of PFEC_reserved_bit: it seems like
> hvm_fetch_from_guest_virt() should always pass in PFEC_insn_fetch,
> particularly as guest_walk_tables() will already gate the checks
> based
> on whether nx or smep is enabled in the guest.  Tim, you know of any
> reason for this?)
> 
> So there seems to me to be no reason to pass PFEC_prot_key into
> guest_walk_tables() or gva_to_gfn().  The pfec value passed *into*
> those
> should simply indicate the type of memory access being done: present,
> write, instruction fetch, user.
> 
> With your current series, guest_walk_tables() already checks for
> pkeys
> being enabled in the guest before checking for them in the
> pagetables.
> For shadow mode, these will be false, and so no checks will be done. 
>  If
> anyone ever implements pkeys for shadow mode, then these will be
> enabled, and the checks will be done, without any intervention on the
> part of the caller.

I have understood it, but, the problem with shadow mode is that pfec
may come from regs->error_codeïhardwareï, just like:
rc = sh_walk_guest_tables(v, va, &gw, regs->error_code);
so, when regs->error_code does not have PFEC_prot_key,
guest_walk_tables may still check PKEY when codes is writen according
to what you said, and it maybe return a different result.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.