Re: [Xen-devel] [PATCH 3 of 5] Rework locking in the PoD layer
> On Wed, 2012-02-01 at 19:56 +0000, Andres Lagar-Cavilla wrote:
>> @@ -114,15 +114,20 @@ p2m_pod_cache_add(struct p2m_domain *p2m
>> unmap_domain_page(b);
>> }
>>
>> + /* First, take all pages off the domain list */
>> lock_page_alloc(p2m);
>> -
>> - /* First, take all pages off the domain list */
>> for(i=0; i < 1 << order ; i++)
>> {
>> p = page + i;
>> page_list_del(p, &d->page_list);
>> }
>>
>> + /* Ensure that the PoD cache has never been emptied.
>> + * This may cause "zombie domains" since the page will never be freed. */
>> + BUG_ON( d->arch.relmem != RELMEM_not_started );
>> +
>> + unlock_page_alloc(p2m);
>> +
>
> This assert should stay where it is. The idea is to verify
> after-the-fact that the page_list_add_tail()s *have not* raced with
> emptying the PoD cache. Having the assert before cannot guarantee that
> they *will not* race with emptying the PoD cache. Alternately, if we're
> positive that condition can never happen again, we should just remove
> the BUG_ON().
As it stands, the code in cache_add mutually excludes p2m_pod_empty_cache()
until pod_lock is released by the caller. So the page_list_add_tail()s cannot
race with emptying the PoD cache.
On further look, the BUG_ON can go altogether: relmem only moves out of
RELMEM_not_started after p2m_pod_empty_cache() has finished, so transitively
we're safe.
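In other words, the ordering argument is roughly the following (a schematic
sketch only, with call sites simplified; not literal code from the patch):

    /* cache-add path: runs entirely under the pod lock */
    pod_lock(p2m);
    p2m_pod_cache_add(p2m, page, order);  /* the page_list_add_tail()s happen here */
    pod_unlock(p2m);

    /* teardown path: p2m_pod_empty_cache() serializes on that same pod
     * lock, and d->arch.relmem only leaves RELMEM_not_started after it
     * has returned -- so the BUG_ON above can never fire. */
    p2m_pod_empty_cache(d);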
Emptying the pod cache does the spin barrier. Question: should it not take
the lock after the spin barrier, to be super safe?
Thanks!
Andres
>
> If I recall correctly, I put this in here because something ended up
> calling cache_add after empty_cache(), potentially with the p2m lock
> already held; that's why I put the BUG_ON() there to begin with.
>
>> @@ -922,6 +929,12 @@ p2m_pod_emergency_sweep(struct p2m_domai
>> limit = (start > POD_SWEEP_LIMIT) ? (start - POD_SWEEP_LIMIT) : 0;
>>
>> /* FIXME: Figure out how to avoid superpages */
>> + /* NOTE: Promote to globally locking the p2m. This will get complicated
>> + * in a fine-grained scenario. Even if we're to lock each gfn
>> + * individually we must be careful about recursion limits and
>> + * POD_SWEEP_STRIDE. This is why we don't enforce deadlock constraints
>> + * between p2m and pod locks */
>> + p2m_lock(p2m);
>> for ( i=p2m->pod.reclaim_single; i > 0 ; i-- )
>> {
>> p2m_access_t a;
>> @@ -940,7 +953,7 @@ p2m_pod_emergency_sweep(struct p2m_domai
>> /* Stop if we're past our limit and we have found *something*.
>> *
>> * NB that this is a zero-sum game; we're increasing our cache size
>> - * by re-increasing our 'debt'. Since we hold the p2m lock,
>> + * by re-increasing our 'debt'. Since we hold the pod lock,
>> * (entry_count - count) must remain the same. */
>> if ( p2m->pod.count > 0 && i < limit )
>> break;
>> @@ -949,6 +962,7 @@ p2m_pod_emergency_sweep(struct p2m_domai
>> if ( j )
>> p2m_pod_zero_check(p2m, gfns, j);
>>
>> + p2m_unlock(p2m);
>> p2m->pod.reclaim_single = i ? i - 1 : i;
>>
>> }
>> @@ -965,8 +979,9 @@ p2m_pod_demand_populate(struct p2m_domai
>> int i;
>>
>> ASSERT(gfn_locked_by_me(p2m, gfn));
>> + pod_lock(p2m);
>>
>> - /* This check is done with the p2m lock held. This will make sure that
>> + /* This check is done with the pod lock held. This will make sure that
>> * even if d->is_dying changes under our feet, p2m_pod_empty_cache()
>> * won't start until we're done. */
>> if ( unlikely(d->is_dying) )
>> @@ -977,6 +992,7 @@ p2m_pod_demand_populate(struct p2m_domai
>> * 1GB region to 2MB chunks for a retry. */
>> if ( order == PAGE_ORDER_1G )
>> {
>> + pod_unlock(p2m);
>> gfn_aligned = (gfn >> order) << order;
>> /* Note that we are supposed to call set_p2m_entry() 512 times to
>> * split 1GB into 512 2MB pages here. But We only do once here because
>> @@ -1000,11 +1016,15 @@ p2m_pod_demand_populate(struct p2m_domai
>>
>> /* If we're low, start a sweep */
>> if ( order == PAGE_ORDER_2M && page_list_empty(&p2m->pod.super) )
>> + /* Note that sweeps scan other ranges in the p2m. In a scenario
>> + * in which p2m locks are order-enforced wrt the pod lock and p2m
>> + * locks are fine-grained, this will result in a deadlock panic */
>> p2m_pod_emergency_sweep_super(p2m);
>>
>> if ( page_list_empty(&p2m->pod.single) &&
>> ( ( order == PAGE_ORDER_4K )
>> || (order == PAGE_ORDER_2M && page_list_empty(&p2m->pod.super) ) ) )
>> + /* Same comment regarding deadlock applies */
>> p2m_pod_emergency_sweep(p2m);
>> }
>>
>
You know what, these comments are *old*. I really need to cut down on the
FUD factor :) As the code stands now, there is no risk of deadlock.
With a (hypothetical as of today) fine-grained p2m locking approach, I want
to leave a "be careful" reminder, and document your trylock suggestion, which
I believe to be accurate.
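Roughly, assuming a hypothetical gfn_trylock() helper (no such helper exists
in this series; the loop body is abbreviated), the sweep would become
something like:

    for ( i = p2m->pod.reclaim_single; i > 0; i-- )
    {
        /* Hypothetical non-blocking variant of gfn_lock(): skip this
         * gfn instead of blocking if someone else already holds it. */
        if ( !gfn_trylock(p2m, i, 0) )
            continue;

        /* ... query and zero-check the candidate gfn as today ... */

        gfn_unlock(p2m, i, 0);
    }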
Thanks for the detailed look!
Andres
> Regarding locking on emergency sweeps: I think it should suffice if we
> could do the equivalent of a spin_trylock() on each gpfn, and just move
> on (not checking that gfn) if it fails. What do you think?
>
> Other than that, I don't see anything wrong with this locking at first
> glance; but it's complicated enough that I don't think I've quite
> grokked it yet. I'll look at it again tomorrow and see if things are
> more clear. :-)
>
> -George
>
>
>> @@ -1012,8 +1032,6 @@ p2m_pod_demand_populate(struct p2m_domai
>> if ( q == p2m_guest && gfn > p2m->pod.max_guest )
>> p2m->pod.max_guest = gfn;
>>
>> - lock_page_alloc(p2m);
>> -
>> if ( p2m->pod.count == 0 )
>> goto out_of_memory;
>>
>> @@ -1026,8 +1044,6 @@ p2m_pod_demand_populate(struct p2m_domai
>>
>> BUG_ON((mfn_x(mfn) & ((1 << order)-1)) != 0);
>>
>> - unlock_page_alloc(p2m);
>> -
>> gfn_aligned = (gfn >> order) << order;
>>
>> set_p2m_entry(p2m, gfn_aligned, mfn, order, p2m_ram_rw,
>> p2m->default_access);
>> @@ -1038,7 +1054,7 @@ p2m_pod_demand_populate(struct p2m_domai
>> paging_mark_dirty(d, mfn_x(mfn) + i);
>> }
>>
>> - p2m->pod.entry_count -= (1 << order); /* Lock: p2m */
>> + p2m->pod.entry_count -= (1 << order);
>> BUG_ON(p2m->pod.entry_count < 0);
>>
>> if ( tb_init_done )
>> @@ -1056,20 +1072,24 @@ p2m_pod_demand_populate(struct p2m_domai
>> __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
>> }
>>
>> + pod_unlock(p2m);
>> return 0;
>> out_of_memory:
>> - unlock_page_alloc(p2m);
>> + pod_unlock(p2m);
>>
>> printk("%s: Out of populate-on-demand memory! tot_pages %" PRIu32 " pod_entries %" PRIi32 "\n",
>> __func__, d->tot_pages, p2m->pod.entry_count);
>> domain_crash(d);
>> out_fail:
>> + pod_unlock(p2m);
>> return -1;
>> remap_and_retry:
>> BUG_ON(order != PAGE_ORDER_2M);
>> - unlock_page_alloc(p2m);
>> + pod_unlock(p2m);
>>
>> /* Remap this 2-meg region in singleton chunks */
>> + /* NOTE: In a p2m fine-grained lock scenario this might
>> + * need promoting the gfn lock from gfn->2M superpage */
>> gfn_aligned = (gfn>>order)<<order;
>> for(i=0; i<(1<<order); i++)
>> set_p2m_entry(p2m, gfn_aligned+i, _mfn(0), PAGE_ORDER_4K,
>> @@ -1137,9 +1157,11 @@ guest_physmap_mark_populate_on_demand(st
>> rc = -EINVAL;
>> else
>> {
>> - p2m->pod.entry_count += 1 << order; /* Lock: p2m */
>> + pod_lock(p2m);
>> + p2m->pod.entry_count += 1 << order;
>> p2m->pod.entry_count -= pod_count;
>> BUG_ON(p2m->pod.entry_count < 0);
>> + pod_unlock(p2m);
>> }
>>
>> gfn_unlock(p2m, gfn, order);
>> diff -r fb9c06df05f2 -r 56ceab0118cb xen/arch/x86/mm/p2m-pt.c
>> --- a/xen/arch/x86/mm/p2m-pt.c
>> +++ b/xen/arch/x86/mm/p2m-pt.c
>> @@ -954,6 +954,7 @@ long p2m_pt_audit_p2m(struct p2m_domain
>> struct domain *d = p2m->domain;
>>
>> ASSERT(p2m_locked_by_me(p2m));
>> + ASSERT(pod_locked_by_me(p2m));
>>
>> test_linear = ( (d == current->domain)
>> && !pagetable_is_null(current->arch.monitor_table) );
>> diff -r fb9c06df05f2 -r 56ceab0118cb xen/arch/x86/mm/p2m.c
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -72,6 +72,7 @@ boolean_param("hap_2mb", opt_hap_2mb);
>> static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
>> {
>> mm_lock_init(&p2m->lock);
>> + mm_lock_init(&p2m->pod.lock);
>> INIT_LIST_HEAD(&p2m->np2m_list);
>> INIT_PAGE_LIST_HEAD(&p2m->pages);
>> INIT_PAGE_LIST_HEAD(&p2m->pod.super);
>> @@ -587,8 +588,10 @@ guest_physmap_add_entry(struct domain *d
>> rc = -EINVAL;
>> else
>> {
>> - p2m->pod.entry_count -= pod_count; /* Lock: p2m */
>> + pod_lock(p2m);
>> + p2m->pod.entry_count -= pod_count;
>> BUG_ON(p2m->pod.entry_count < 0);
>> + pod_unlock(p2m);
>> }
>> }
>>
>> @@ -1372,6 +1375,7 @@ p2m_flush_table(struct p2m_domain *p2m)
>> /* "Host" p2m tables can have shared entries &c that need a bit
>> more
>> * care when discarding them */
>> ASSERT(p2m_is_nestedp2m(p2m));
>> + /* Nested p2m's do not do pod, hence the asserts (and no pod lock) */
>> ASSERT(page_list_empty(&p2m->pod.super));
>> ASSERT(page_list_empty(&p2m->pod.single));
>>
>> @@ -1529,6 +1533,7 @@ void audit_p2m(struct domain *d,
>> P2M_PRINTK("p2m audit starts\n");
>>
>> p2m_lock(p2m);
>> + pod_lock(p2m);
>>
>> if (p2m->audit_p2m)
>> pmbad = p2m->audit_p2m(p2m);
>> @@ -1589,6 +1594,7 @@ void audit_p2m(struct domain *d,
>> }
>> spin_unlock(&d->page_alloc_lock);
>>
>> + pod_unlock(p2m);
>> p2m_unlock(p2m);
>>
>> P2M_PRINTK("p2m audit complete\n");
>> diff -r fb9c06df05f2 -r 56ceab0118cb xen/include/asm-x86/p2m.h
>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -261,25 +261,12 @@ struct p2m_domain {
>> unsigned long max_mapped_pfn;
>>
>> /* Populate-on-demand variables
>> - * NB on locking. {super,single,count} are
>> - * covered by d->page_alloc_lock, since they're almost always used in
>> - * conjunction with that functionality. {entry_count} is covered by
>> - * the domain p2m lock, since it's almost always used in conjunction
>> - * with changing the p2m tables.
>> - *
>> - * At this point, both locks are held in two places. In both,
>> - * the order is [p2m,page_alloc]:
>> - * + p2m_pod_decrease_reservation() calls p2m_pod_cache_add(),
>> - * which grabs page_alloc
>> - * + p2m_pod_demand_populate() grabs both; the p2m lock to avoid
>> - * double-demand-populating of pages, the page_alloc lock to
>> - * protect moving stuff from the PoD cache to the domain page list.
>> - *
>> - * We enforce this lock ordering through a construct in mm-locks.h.
>> - * This demands, however, that we store the previous lock-ordering
>> - * level in effect before grabbing the page_alloc lock. The unlock
>> - * level is stored in the arch section of the domain struct.
>> - */
>> + * All variables are protected with the pod lock. We cannot rely on
>> + * the p2m lock if it's turned into a fine-grained lock.
>> + * We only use the domain page_alloc lock for additions and
>> + * deletions to the domain's page list. Because we use it nested
>> + * within the PoD lock, we enforce its ordering (by remembering
>> + * the unlock level in the arch_domain sub struct). */
>> struct {
>> struct page_list_head super, /* List of superpages */
>> single; /* Non-super lists */
>> @@ -288,6 +275,8 @@ struct p2m_domain {
>> unsigned reclaim_super; /* Last gpfn of a scan */
>> unsigned reclaim_single; /* Last gpfn of a scan */
>> unsigned max_guest; /* gpfn of max guest demand-populate */
>> + mm_lock_t lock; /* Locking of private pod structs, *
>> + * not relying on the p2m lock. */
>> } pod;
>> };
>>
>
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel