
Re: [Xen-devel] [PATCH RFC 1/4] x86/mm: Shadow and p2m changes for PV mem_access



>>> On 29.04.14 at 06:45, <aravindp@xxxxxxxxx> wrote:
> +void p2m_access_to_flags(u32 *flags, u32 gflags, p2m_access_t access)
> +{
> +
> +    /* Restrict with access permissions */
> +    switch (access)
> +    {
> +        case p2m_access_r:
> +            *flags &= ~(_PAGE_RW);
> +            *flags |= (_PAGE_NX_BIT|_PAGE_PRESENT);
> +            break;
> +        case p2m_access_rx:
> +        case p2m_access_rx2rw:
> +            *flags &= ~(_PAGE_NX_BIT|_PAGE_RW);
> +            *flags |= _PAGE_PRESENT;
> +            break;
> +        case p2m_access_rw:
> +            *flags |= (_PAGE_NX_BIT|_PAGE_RW|_PAGE_PRESENT);
> +            break;
> +        case p2m_access_rwx:
> +        default:
> +            *flags &= ~(_PAGE_NX_BIT);
> +            *flags |= (_PAGE_RW|_PAGE_PRESENT);
> +            break;
> +    }

Tim will have the final say here, but I'd suggest dropping all these
needless parentheses (I see only one case where they're really
needed).
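E.g. the first case would then read

    case p2m_access_r:
        *flags &= ~_PAGE_RW;
        *flags |= _PAGE_NX_BIT | _PAGE_PRESENT;
        break;

(the one place where they're really needed being, I think, the
~(_PAGE_NX_BIT|_PAGE_RW) above, since ~ binds more tightly than |).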

> +
> +    // Allow more restrictive guest flags to be propagated instead of access
> +    // permissions

Coding style - C++-style // comments aren't used in Xen; this wants
to be a block comment (there are more of these, which I'm not going
to comment on individually).
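I.e.:

    /*
     * Allow more restrictive guest flags to be propagated instead of
     * access permissions.
     */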

> +    if ( !(gflags & _PAGE_RW) )
> +        *flags &= ~(_PAGE_RW);
> +
> +    if ( gflags & _PAGE_NX_BIT )
> +        *flags |= _PAGE_NX_BIT;

And here you even end up inconsistent wrt parentheses: ~(_PAGE_RW)
in the first statement, but a plain _PAGE_NX_BIT in the second.

> +static int
> +p2m_mem_access_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
> +                         unsigned int page_order, p2m_type_t p2mt,
> +                         p2m_access_t p2ma)
> +{
> +    struct domain *d = p2m->domain;
> +    mfn_t *access_lookup_table = p2m->access_lookup_table;
> +    uint table_idx;
> +    uint page_idx;
> +    uint8_t *access_table_page;
> +
> +    ASSERT(shadow_mode_mem_access(d) && access_lookup_table != NULL);

Better to split this into two ASSERT()s, so that if one triggers it
would be clear which of the two conditions wasn't met.
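I.e.:

    ASSERT(shadow_mode_mem_access(d));
    ASSERT(access_lookup_table != NULL);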

> +
> +    /* For PV domains we only support rw, rx, rx2rw, rwx access permissions */
> +    if ( unlikely(p2ma != p2m_access_r &&
> +                  p2ma != p2m_access_rw &&
> +                  p2ma != p2m_access_rx &&
> +                  p2ma != p2m_access_rwx &&
> +                  p2ma != p2m_access_rx2rw) )
> +        return -EINVAL;

Perhaps better done as a switch() statement.
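E.g. (untested):

    switch ( p2ma )
    {
    case p2m_access_r:
    case p2m_access_rw:
    case p2m_access_rx:
    case p2m_access_rwx:
    case p2m_access_rx2rw:
        break;
    default:
        return -EINVAL;
    }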

> +
> +    if ( page_get_owner(mfn_to_page(mfn)) != d )
> +        return -ENOENT;
> +
> +    gfn = get_gpfn_from_mfn(mfn_x(mfn));

You get "gfn" passed in and blindly overwrite it here? _If_ you need to
do this lookup, shouldn't you instead check it matches the passed in one?
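E.g. (with the error code subject to discussion):

    if ( gfn != get_gpfn_from_mfn(mfn_x(mfn)) )
        return -EINVAL;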

> +
> +    /*
> +     * Values with the MSB set denote MFNs that aren't really part of the
> +     * domain's pseudo-physical memory map (e.g., the shared info frame).
> +     * Nothing to do here.
> +     */
> +    if ( unlikely(!VALID_M2P(gfn)) )
> +        return 0;
> +
> +    if ( gfn > (d->tot_pages - 1) )
> +        return -EINVAL;

Hardly - a PV domain can easily make its address space sparse, and
iirc pv-ops Linux even does so by default to avoid PFN/MFN collisions
on MMIO space. (And just as a side note, this would be better
expressed as "gfn >= d->tot_pages".)

> +    paging_lock(d);
> +
> +    table_idx = MEM_ACCESS_TABLE_IDX(gfn);
> +    page_idx = MEM_ACCESS_PAGE_IDX(gfn);
> +    access_table_page = map_domain_page(mfn_x(access_lookup_table[table_idx]));
> +    access_table_page[page_idx] = p2ma;
> +    unmap_domain_page(access_table_page);
> +
> +    if ( sh_remove_all_mappings(d->vcpu[0], mfn) )

Is there anything guaranteeing that d->vcpu and d->vcpu[0] are
non-NULL here?
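If there isn't, something like

    if ( d->vcpu == NULL || d->vcpu[0] == NULL )
        return 0;

somewhere ahead of this would at least make the dependency explicit
(whether success or an error should be returned is a separate
question).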

> +static mfn_t
> +p2m_mem_access_get_entry(struct p2m_domain *p2m, unsigned long gfn,
> +                         p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
> +                         unsigned int *page_order)
> +{
> +    struct domain *d = p2m->domain;
> +    mfn_t *access_lookup_table = p2m->access_lookup_table;
> +    uint table_idx;
> +    uint page_idx;
> +    uint8_t *access_table_page;
> +    mfn_t mfn = _mfn(gfn); // For PV guests mfn == gfn
> +
> +    ASSERT(shadow_mode_mem_access(d) && access_lookup_table != NULL);
> +
> +    /* Not necessarily true, but for non-translated guests, we claim
> +     * it's the most generic kind of memory */

I think you copied this comment verbatim from elsewhere, but I don't
think it's correct as is.

> +    *t = p2m_ram_rw;
> +
> +    if ( page_get_owner(mfn_to_page(mfn)) != d )
> +        return _mfn(INVALID_MFN);
> +
> +    gfn = get_gpfn_from_mfn(mfn_x(mfn));

Same comment as earlier.

> +    if ( gfn > (d->tot_pages - 1) )

Ditto.

> +    table_idx = MEM_ACCESS_TABLE_IDX(gfn);
> +    page_idx = MEM_ACCESS_PAGE_IDX(gfn);
> +
> +    access_table_page = map_domain_page(mfn_x(access_lookup_table[table_idx]));
> +
> +    /* This is a hint to take the default permissions */
> +    if ( access_table_page[page_idx] == p2m_access_n )
> +        access_table_page[page_idx] = p2m->default_access;

We're in "get" here - why does that modify any global state?

> +void p2m_mem_access_teardown(struct p2m_domain *p2m)
> +{
> +    struct domain *d = p2m->domain;
> +    mfn_t *access_lookup_table = p2m->access_lookup_table;
> +    uint32_t nr_access_table_pages;
> +    uint32_t ctr;
> +
> +    /* Reset the set_entry and get_entry function pointers */
> +    p2m_pt_init(p2m);
> +
> +    if ( !access_lookup_table  )
> +        return;
> +
> +    nr_access_table_pages = get_domain_nr_access_table_pages(d);
> +
> +    for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )

No new effectively unbounded loops please.
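Such a loop needs to be preemptible - a rough sketch (untested, and
both this function and its callers would need to learn to deal with
the resulting continuation):

    for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )
    {
        /* ... free this page of the lookup table ... */
        if ( ctr && !(ctr & 0xff) && hypercall_preempt_check() )
            return -EAGAIN; /* let the caller arrange a continuation */
    }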

> +int p2m_mem_access_init(struct p2m_domain *p2m)
> +{
> +    struct domain *d = p2m->domain;
> +    mfn_t *access_lookup_table;
> +    uint32_t nr_access_table_pages;
> +    uint32_t ctr;
> +
> +    nr_access_table_pages = get_domain_nr_access_table_pages(d);
> +    access_lookup_table = xzalloc_array(mfn_t, nr_access_table_pages);

This surely is going to be an order > 0 allocation, and you even risk
it being an order > MAX_ORDER one. The former is disallowed at
runtime by convention/agreement, the latter is an outright bug.
(And again just as a side note - you don't appear to need the zero
initialization here.)
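
To get an idea of the magnitude (assuming P2M_ACCESS_SIZE is one
byte): a 64GiB guest has 16M pages and hence needs 4096 access table
pages, making this a 32k, i.e. order-3, allocation for the mfn_t
array alone.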

> +    if ( !access_lookup_table )
> +        return -ENOMEM;
> +
> +    p2m->access_lookup_table = access_lookup_table;
> +
> +    for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )

Same comment regarding the effective unboundedness of the loop.

> @@ -1414,6 +1419,8 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr,
>      if ( pfn == ~0ul )
>      {
>          p2m->default_access = a;
> +        if ( is_pv_domain(d) )
> +            return p2m_mem_access_set_default(p2m);

Is it really correct to set p2m->default_access _before_ calling that
function? Perhaps it wouldn't be correct doing it the other way
around either - I suppose you need to hold the necessary lock across
both operations.
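I.e. something like (untested, and assuming
p2m_mem_access_set_default() tolerates being called with the lock
held):

    p2m_lock(p2m);
    p2m->default_access = a;
    rc = p2m_mem_access_set_default(p2m);
    p2m_unlock(p2m);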

> @@ -585,9 +585,17 @@ int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
>      {
>  
>      case XEN_DOMCTL_SHADOW_OP_ENABLE:
> +        /*
> +         * Shadow mem_access mode should only be enabled when mem_access is
> +         * enabled in XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE.
> +         */
> +        if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_MEM_ACCESS )
> +            return -EINVAL;
> +
>          if ( !(sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY) )
>              break;
>          /* Else fall through... */
> +
>      case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:

Stray blank line - the fall-through here makes it preferable to keep
the two case blocks un-separated.

> @@ -2443,7 +2443,8 @@ int sh_remove_all_mappings(struct vcpu *v, mfn_t gmfn)
>          if ( !(shadow_mode_external(v->domain)
>                 && (page->count_info & PGC_count_mask) <= 3
>                 && ((page->u.inuse.type_info & PGT_count_mask)
> -                   == !!is_xen_heap_page(page))) )
> +                       == !!is_xen_heap_page(page)))
> +                    && !(shadow_mode_mem_access(v->domain)) )

You're breaking indentation, there are pointless parentheses again,
but most importantly - why?

> @@ -625,6 +627,20 @@ _sh_propagate(struct vcpu *v,
>              }
>      }
>  
> +    /* Propagate access permissions */
> +    if ( unlikely((level == 1) &&
> +                  mem_event_check_ring(&d->mem_event->access) &&
> +                  !sh_mfn_is_a_page_table(target_mfn)) )

Just a general remark here - unlikely() used like this is pointless;
it ought to be applied to the individual constituents of the && or ||.
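I.e. here rather something like

    if ( level == 1 &&
         unlikely(mem_event_check_ring(&d->mem_event->access)) &&
         !sh_mfn_is_a_page_table(target_mfn) )

(assuming the ring check is the part expected to usually be false).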

> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +        p2m_access_t a;
> +        p2m_type_t t;
> +        mfn_t mfn;
> +        mfn = p2m->get_entry(p2m, mfn_x(target_mfn), &t, &a, 0, NULL);

Missing blank line between declarations and statements. Also this
statement could instead be an initializer.
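I.e.:

    p2m_access_t a;
    p2m_type_t t;
    mfn_t mfn = p2m->get_entry(p2m, mfn_x(target_mfn), &t, &a, 0, NULL);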

> @@ -2822,6 +2838,8 @@ static int sh_page_fault(struct vcpu *v,
>      int r;
>      fetch_type_t ft = 0;
>      p2m_type_t p2mt;
> +    p2m_access_t p2ma;

This variable ...

> @@ -3009,7 +3027,80 @@ static int sh_page_fault(struct vcpu *v,
>  
>      /* What mfn is the guest trying to access? */
>      gfn = guest_l1e_get_gfn(gw.l1e);
> -    gmfn = get_gfn(d, gfn, &p2mt);
> +    if ( likely(!mem_event_check_ring(&d->mem_event->access)) )
> +        gmfn = get_gfn(d, gfn, &p2mt);
> +    /*
> +     * A mem_access listener is present, so we will first check if a violation
> +     * has occurred.
> +     */
> +    else
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);

... seems to be used only in this scope, and since you're already
adding scope-restricted variables, it should be declared here.

> +
> +        gmfn = get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma, 0, NULL);
> +        if ( mfn_valid(gmfn) && !sh_mfn_is_a_page_table(gmfn)
> +             && (regs->error_code & PFEC_page_present)
> +             && !(regs->error_code & PFEC_reserved_bit) )
> +        {
> +            int violation = 0;
> +            bool_t access_w = !!(regs->error_code & PFEC_write_access);
> +            bool_t access_x = !!(regs->error_code & PFEC_insn_fetch);
> +            bool_t access_r = access_x ? 0 : !(access_w);

Stray parentheses again. I'm not going to keep repeating this -
please just go over your code before submitting.

> +            /*
> +             * Do not police writes to guest memory emanating from the Xen
> +             * kernel. Trying to do so will cause the same pagefault to occur
> +             * over and over again with an event being sent to the access
> +             * listener for each fault. If the access listener's vcpu is not
> +             * scheduled during this time, the violation is never resolved and
> +             * will eventually end with the host crashing.
> +             */
> +            if ( (violation && access_w) &&
> +                 (regs->eip >= XEN_VIRT_START && regs->eip <= XEN_VIRT_END) )
> +            {
> +                violation = 0;
> +                rc = p2m->set_entry(p2m, gfn_x(gfn), gmfn, PAGE_ORDER_4K,
> +                                    p2m_ram_rw, p2m_access_rw);
> +            }

This looks more like a hack than a proper solution - shouldn't the
listener be allowed to know of hypervisor-side accesses?

> +/* Number of access table pages for a PV domain */
> +#define get_domain_nr_access_table_pages(d) \
> +        DIV_ROUND_UP(P2M_ACCESS_SIZE * (d->tot_pages - 1), PAGE_SIZE)

And once again. I wonder whether this code was tested on a suitably
big pv-ops PV guest.

Jan
