
Re: [Xen-devel] [PATCH RFC 1/4] x86/mm: Shadow and p2m changes for PV mem_access



>> +void p2m_access_to_flags(u32 *flags, u32 gflags, p2m_access_t access)
>> +{
>> +
>> +    /* Restrict with access permissions */
>> +    switch (access)
>> +    {
>> +        case p2m_access_r:
>> +            *flags &= ~(_PAGE_RW);
>> +            *flags |= (_PAGE_NX_BIT|_PAGE_PRESENT);
>> +            break;
>> +        case p2m_access_rx:
>> +        case p2m_access_rx2rw:
>> +            *flags &= ~(_PAGE_NX_BIT|_PAGE_RW);
>> +            *flags |= _PAGE_PRESENT;
>> +            break;
>> +        case p2m_access_rw:
>> +            *flags |= (_PAGE_NX_BIT|_PAGE_RW|_PAGE_PRESENT);
>> +            break;
>> +        case p2m_access_rwx:
>> +        default:
>> +            *flags &= ~(_PAGE_NX_BIT);
>> +            *flags |= (_PAGE_RW|_PAGE_PRESENT);
>> +            break;
>> +    }
>
>Tim will have the final say here, but I'd suggest dropping all these
>needless parentheses (I see only one case where they're really
>needed).

OK, I will drop all the needless ones. Could you please point out the case 
where they are needed? I couldn't find anything in CODING_STYLE that dictates 
what should be done.
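
Is it the "~(_PAGE_NX_BIT|_PAGE_RW)" line, where the parentheses around the 
OR are required by operator precedence? If so, the trimmed version would look 
something like this (sketch):

    case p2m_access_r:
        *flags &= ~_PAGE_RW;
        *flags |= _PAGE_NX_BIT | _PAGE_PRESENT;
        break;
    case p2m_access_rx:
    case p2m_access_rx2rw:
        /* Parentheses kept here: ~ binds tighter than | */
        *flags &= ~(_PAGE_NX_BIT | _PAGE_RW);
        *flags |= _PAGE_PRESENT;
        break;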

>> +
>> +    // Allow more restrictive guest flags to be propagated instead of access
>> +    // permissions
>
>Coding style (there are more of these, which I'm not going to
>individually comment on).

Sorry, I let the C++ style comments slip in. I will convert all of them to 
C style comments.
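
For example, the comment quoted above would become:

    /*
     * Allow more restrictive guest flags to be propagated instead of access
     * permissions.
     */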

>> +static int
>> +p2m_mem_access_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>> +                         unsigned int page_order, p2m_type_t p2mt,
>> +                         p2m_access_t p2ma)
>> +{
>> +    struct domain *d = p2m->domain;
>> +    mfn_t *access_lookup_table = p2m->access_lookup_table;
>> +    uint table_idx;
>> +    uint page_idx;
>> +    uint8_t *access_table_page;
>> +
>> +    ASSERT(shadow_mode_mem_access(d) && access_lookup_table != NULL);
>
>Better split into two ASSERT()s, so that if it triggers one would know
>which of the two conditions wasn't met?

You are right. I should have done that in the first place. 
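
I will split it:

    ASSERT(shadow_mode_mem_access(d));
    ASSERT(access_lookup_table != NULL);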

>> +
>> +    /* For PV domains we only support rw, rx, rx2rw, rwx access permissions */
>> +    if ( unlikely(p2ma != p2m_access_r &&
>> +                  p2ma != p2m_access_rw &&
>> +                  p2ma != p2m_access_rx &&
>> +                  p2ma != p2m_access_rwx &&
>> +                  p2ma != p2m_access_rx2rw) )
>> +        return -EINVAL;
>
>Perhaps better done as a switch() statement.

Sure. That will definitely be cleaner.
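
Something along these lines (sketch; the list matches the current checks, 
including p2m_access_r):

    switch ( p2ma )
    {
    case p2m_access_r:
    case p2m_access_rw:
    case p2m_access_rx:
    case p2m_access_rwx:
    case p2m_access_rx2rw:
        break;
    default:
        return -EINVAL;
    }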

>> +
>> +    if ( page_get_owner(mfn_to_page(mfn)) != d )
>> +        return -ENOENT;
>> +
>> +    gfn = get_gpfn_from_mfn(mfn_x(mfn));
>
>You get "gfn" passed in and blindly overwrite it here? _If_ you need to
>do this lookup, shouldn't you instead check it matches the passed in one?

The "gfn" and "mfn" passed in to the function should be the same as it is a PV 
guest. That is why I am overwriting "gfn" to get the "gpfn" from the 
machine_to_phys_mapping.
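
If you would rather see a sanity check of that assumption here, I could add 
something like this (sketch):

    /* For PV guests the incoming "gfn" is really an MFN */
    ASSERT(gfn == mfn_x(mfn));
    gfn = get_gpfn_from_mfn(mfn_x(mfn));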

>> +
>> +    /*
>> +     * Values with the MSB set denote MFNs that aren't really part of the
>> +     * domain's pseudo-physical memory map (e.g., the shared info frame).
>> +     * Nothing to do here.
>> +     */
>> +    if ( unlikely(!VALID_M2P(gfn)) )
>> +        return 0;
>> +
>> +    if ( gfn > (d->tot_pages - 1) )
>> +        return -EINVAL;
>
>Hardly - a PV domain can easily make its address space sparse, and
>iirc pv-ops Linux even does so by default to avoid PFN/MFN collisions
>on MMIO space. (And as a side note, this would better be "gfn >=
>d->tot_pages".)

I was using this as a guard against going out of bounds in the access lookup 
table, which is created based on the domain's tot_pages. Please let me know 
what I should be using instead of tot_pages when doing this.
 
>> +    paging_lock(d);
>> +
>> +    table_idx = MEM_ACCESS_TABLE_IDX(gfn);
>> +    page_idx = MEM_ACCESS_PAGE_IDX(gfn);
>> +    access_table_page = map_domain_page(mfn_x(access_lookup_table[table_idx]));
>> +    access_table_page[page_idx] = p2ma;
>> +    unmap_domain_page(access_table_page);
>> +
>> +    if ( sh_remove_all_mappings(d->vcpu[0], mfn) )
>
>Is there anything guaranteeing d->vcpu and d->vcpu[0] being non-
>NULL?

I am not sure. I see shadow_track_dirty_vram() calling this without a check. 
Maybe I should check and call it only if d->vcpu and d->vcpu[0] are not NULL.
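
Something like this (sketch; whatever follows the call today stays unchanged):

    if ( d->vcpu != NULL && d->vcpu[0] != NULL &&
         sh_remove_all_mappings(d->vcpu[0], mfn) )
    {
        /* ... existing handling ... */
    }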

>> +static mfn_t
>> +p2m_mem_access_get_entry(struct p2m_domain *p2m, unsigned long gfn,
>> +                         p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
>> +                         unsigned int *page_order)
>> +{
>> +    struct domain *d = p2m->domain;
>> +    mfn_t *access_lookup_table = p2m->access_lookup_table;
>> +    uint table_idx;
>> +    uint page_idx;
>> +    uint8_t *access_table_page;
>> +    mfn_t mfn = _mfn(gfn); // For PV guests mfn == gfn
>> +
>> +    ASSERT(shadow_mode_mem_access(d) && access_lookup_table != NULL);
>> +
>> +    /* Not necessarily true, but for non-translated guests, we claim
>> +     * it's the most generic kind of memory */
>
>I think you copied this comment verbatim from elsewhere, but I don't
>think it's correct as is.

Yes, indeed. I copied this from __get_gfn_type_access(). I will get rid of the 
comment.

>> +    *t = p2m_ram_rw;
>> +
>> +    if ( page_get_owner(mfn_to_page(mfn)) != d )
>> +        return _mfn(INVALID_MFN);
>> +
>> +    gfn = get_gpfn_from_mfn(mfn_x(mfn));
>
>Same comment as earlier.
>
>> +    if ( gfn > (d->tot_pages - 1) )
>
>Dito.

I have the same reasoning here as I did above.

>> +    table_idx = MEM_ACCESS_TABLE_IDX(gfn);
>> +    page_idx = MEM_ACCESS_PAGE_IDX(gfn);
>> +
>> +    access_table_page = map_domain_page(mfn_x(access_lookup_table[table_idx]));
>> +
>> +    /* This is a hint to take the default permissions */
>> +    if ( access_table_page[page_idx] == p2m_access_n )
>> +        access_table_page[page_idx] = p2m->default_access;
>
>We're in "get" here - why does that modify any global state?

I do not initialize the access lookup table to the default value, but the table 
is zeroed as part of initialization when each page is allocated using 
shadow_alloc_p2m_page(). I then use the "p2m_access_n" / 0 value as a hint that 
the default value should be returned, since p2m_access_n is not valid for PV 
domains. If you prefer that I initialize the table to the default value, I will 
gladly do so.
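
That would look roughly like this inside the allocation loop of 
p2m_mem_access_init() (sketch):

    /* Pre-fill the freshly allocated page with the default access type */
    uint8_t *access_table_page =
        map_domain_page(mfn_x(access_lookup_table[ctr]));

    memset(access_table_page, p2m->default_access, PAGE_SIZE);
    unmap_domain_page(access_table_page);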

>> +void p2m_mem_access_teardown(struct p2m_domain *p2m)
>> +{
>> +    struct domain *d = p2m->domain;
>> +    mfn_t *access_lookup_table = p2m->access_lookup_table;
>> +    uint32_t nr_access_table_pages;
>> +    uint32_t ctr;
>> +
>> +    /* Reset the set_entry and get_entry function pointers */
>> +    p2m_pt_init(p2m);
>> +
>> +    if ( !access_lookup_table  )
>> +        return;
>> +
>> +    nr_access_table_pages = get_domain_nr_access_table_pages(d);
>> +
>> +    for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )
>
>No new effectively unbounded loops please.

OK, I will add a check for preemption and hypercall continuation logic here.
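
Roughly like this (sketch; this assumes p2m_mem_access_teardown() gains a 
return value so the caller can arrange the continuation):

    for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )
    {
        /* Bail out periodically so the hypercall can be continued */
        if ( ctr && !(ctr & 0xff) && hypercall_preempt_check() )
            return -EAGAIN;
        /* ... free access_lookup_table[ctr] as before ... */
    }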

>> +int p2m_mem_access_init(struct p2m_domain *p2m)
>> +{
>> +    struct domain *d = p2m->domain;
>> +    mfn_t *access_lookup_table;
>> +    uint32_t nr_access_table_pages;
>> +    uint32_t ctr;
>> +
>> +    nr_access_table_pages = get_domain_nr_access_table_pages(d);
>> +    access_lookup_table = xzalloc_array(mfn_t, nr_access_table_pages);
>
>This surely is going to be an order > 0 allocation, and you even risk
>it being an order > MAX_ORDER one. The former is disallowed at
>runtime by convention/agreement, the latter is an outright bug.

Sorry, I was not aware of the convention. What should I be doing here instead?

>(And again just as a side note - you don't appear to need the zero
>initialization here.)
>
>> +    if ( !access_lookup_table )
>> +        return -ENOMEM;
>> +
>> +    p2m->access_lookup_table = access_lookup_table;
>> +
>> +    for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )
>
>Same comment regarding the effective unboundedness of the loop.

OK, I will add a check for preemption and hypercall continuation logic here 
too, along the lines of the sketch above.

>> @@ -1414,6 +1419,8 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr,
>>      if ( pfn == ~0ul )
>>      {
>>          p2m->default_access = a;
>> +        if ( is_pv_domain(d) )
>> +            return p2m_mem_access_set_default(p2m);
>
>Is it really correct to set p2m->default_access _before_ calling that
>function? Perhaps it wouldn't be correct doing it the other way
>around either - I suppose you need to hold the necessary lock across
>both operations.

I do not see the problem here. p2m_mem_access_set_default() just blows the 
shadows away after taking the paging_lock. There is no actual setting of 
default permission for all pages in the PV case.
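
That said, if you would rather see both operations done under one lock, the 
assignment could move into p2m_mem_access_set_default() itself, e.g. (sketch):

    paging_lock(d);
    p2m->default_access = a;
    /* ... existing code that blows the shadows away ... */
    paging_unlock(d);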

>> @@ -585,9 +585,17 @@ int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
>>      {
>>
>>      case XEN_DOMCTL_SHADOW_OP_ENABLE:
>> +        /*
>> +         * Shadow mem_access mode should only be enabled when mem_access is
>> +         * enabled in XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE.
>> +         */
>> +        if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_MEM_ACCESS )
>> +            return -EINVAL;
>> +
>>          if ( !(sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY) )
>>              break;
>>          /* Else fall through... */
>> +
>>      case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
>
>Stray blank line - the fall through here makes it preferable to keep the
>two case blocks un-separated.

OK, I will keep them un-separated.

>> @@ -2443,7 +2443,8 @@ int sh_remove_all_mappings(struct vcpu *v, mfn_t gmfn)
>>          if ( !(shadow_mode_external(v->domain)
>>                 && (page->count_info & PGC_count_mask) <= 3
>>                 && ((page->u.inuse.type_info & PGT_count_mask)
>> -                   == !!is_xen_heap_page(page))) )
>> +                       == !!is_xen_heap_page(page)))
>> +                    && !(shadow_mode_mem_access(v->domain)) )
>
>You're breaking indentation, there are pointless parentheses again,
>but most importantly - why?

Sorry, I meant to ask a question about this in my patch message and mark it as 
a workaround. I am seeing the message on all sh_remove_all_mappings() calls and 
was not able to figure out why it was happening, so I added this as a 
workaround. I was hoping you or Tim could shed more light on it.

>> @@ -625,6 +627,20 @@ _sh_propagate(struct vcpu *v,
>>              }
>>      }
>>
>> +    /* Propagate access permissions */
>> +    if ( unlikely((level == 1) &&
>> +                  mem_event_check_ring(&d->mem_event->access) &&
>> +                  !sh_mfn_is_a_page_table(target_mfn)) )
>
>Just a general remark here - unlikely() used like this is pointless, it
>ought to be used on the individual constituents of && or ||.

OK, I will keep unlikely() just on the mem_event_check_ring() constituent.
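
i.e.:

    if ( level == 1 &&
         unlikely(mem_event_check_ring(&d->mem_event->access)) &&
         !sh_mfn_is_a_page_table(target_mfn) )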

>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +        p2m_access_t a;
>> +        p2m_type_t t;
>> +        mfn_t mfn;
>> +        mfn = p2m->get_entry(p2m, mfn_x(target_mfn), &t, &a, 0, NULL);
>
>Missing blank line between declarations and statements. Also this
>statement could instead be an initializer.

I will make it an initializer.
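
i.e.:

        p2m_access_t a;
        p2m_type_t t;
        mfn_t mfn = p2m->get_entry(p2m, mfn_x(target_mfn), &t, &a, 0, NULL);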

>> @@ -2822,6 +2838,8 @@ static int sh_page_fault(struct vcpu *v,
>>      int r;
>>      fetch_type_t ft = 0;
>>      p2m_type_t p2mt;
>> +    p2m_access_t p2ma;
>
>This variable ...
>
>> @@ -3009,7 +3027,80 @@ static int sh_page_fault(struct vcpu *v,
>>
>>      /* What mfn is the guest trying to access? */
>>      gfn = guest_l1e_get_gfn(gw.l1e);
>> -    gmfn = get_gfn(d, gfn, &p2mt);
>> +    if ( likely(!mem_event_check_ring(&d->mem_event->access)) )
>> +        gmfn = get_gfn(d, gfn, &p2mt);
>> +    /*
>> +     * A mem_access listener is present, so we will first check if a violation
>> +     * has occurred.
>> +     */
>> +    else
>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
>
>... seems to be used only in this scope, and with you already adding
>scope restricted variables it should go here.

OK, I will do that.
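
i.e. (sketch):

    else
    {
        struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
        p2m_access_t p2ma;

        gmfn = get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma, 0, NULL);
        /* ... */
    }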

>> +
>> +        gmfn = get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma, 0, NULL);
>> +        if ( mfn_valid(gmfn) && !sh_mfn_is_a_page_table(gmfn)
>> +             && (regs->error_code & PFEC_page_present)
>> +             && !(regs->error_code & PFEC_reserved_bit) )
>> +        {
>> +            int violation = 0;
>> +            bool_t access_w = !!(regs->error_code & PFEC_write_access);
>> +            bool_t access_x = !!(regs->error_code & PFEC_insn_fetch);
>> +            bool_t access_r = access_x ? 0 : !(access_w);
>
>Stray parentheses again. I'm not going to repeat this again - please
>just check your code before submitting.

I am really sorry. I will fix this in the next patch series.

>> +            /*
>> +             * Do not police writes to guest memory emanating from the Xen
>> +             * kernel. Trying to do so will cause the same pagefault to occur
>> +             * over and over again with an event being sent to the access
>> +             * listener for each fault. If the access listener's vcpu is not
>> +             * scheduled during this time, the violation is never resolved and
>> +             * will eventually end with the host crashing.
>> +             */
>> +            if ( (violation && access_w) &&
>> +                 (regs->eip >= XEN_VIRT_START && regs->eip <= XEN_VIRT_END) )
>> +            {
>> +                violation = 0;
>> +                rc = p2m->set_entry(p2m, gfn_x(gfn), gmfn, PAGE_ORDER_4K,
>> +                                    p2m_ram_rw, p2m_access_rw);
>> +            }
>
>This looks more like a hack than a proper solution - shouldn't the
>listener be allowed to know of hypervisor side accesses?

Ideally the listener should know of the hypervisor-side accesses, but I ran 
into cases where the listener would not get to run and the pagefault would be 
retried over and over, eventually causing the host to crash. I guess more 
detail about the problem is needed; I will send out a separate email regarding 
this, CCing you and Tim.

>> +/* Number of access table pages for a PV domain */
>> +#define get_domain_nr_access_table_pages(d) \
>> +        DIV_ROUND_UP(P2M_ACCESS_SIZE * (d->tot_pages - 1), PAGE_SIZE)
>
>And once again. I wonder whether this code was tested on a suitably
>big pv-ops PV guest.

The largest PV domain I tried this with was 28GB.

Thanks,
Aravindh

