I have a proposal, and I'd like to hear what the list thinks.
- 1. change p2m lock to a read/write lock
- 2. Make lookups (gfn_to_mfn_* family) take a read lock. All current
callers of p2m_lock will become write lockers.
- 3. Change the gfn_to_mfn_* family to get_page on the mfn obtained,
while holding the read lock.
- 4. Have all lookup callers put_page on the obtained mfn, once done.
Rationale: the rwlock will prevent races between lookups and asynchronous
p2m modifications by paging, sharing, or feature X. The lookup routine
will be protected from races and able to atomically get_page on the
obtained mfn. The lookup caller will be able to work on this mfn
knowing it won't disappear underneath it (as in the case Zhen has
brought forward).
I'm somewhat wary of requiring all callers to put_page, but I don't
think it's a big deal; it's a perfectly reasonable requirement.
I'm more wary that turning p2m locking into read/write locking will
result in code deadlocking itself by taking a read lock first and a
write lock later. Possibly the current rwlock implementation could be
improved to keep a cpumask of read-lockers and provide an atomic
"promote from read to write" operation (something along the lines of:
wait until you're the only reader in the cpumask, then cmpxchg(lock,
-1, WRITE_BIAS)).
Hope that made sense. Thoughts?
Andres
> Date: Mon, 10 Oct 2011 00:40:32 +0800
> From: zhen shi <bickys1986@xxxxxxxxx>
> Subject: Re: [Xen-devel] Re: mapping problems in xenpaging
> To: Tim Deegan <tim@xxxxxxx>, Olaf Hering <olaf@xxxxxxxxx>, Adin
> Scannell <adin@xxxxxxxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Message-ID:
> <CACavRyA+Djzr3AVwgaZQu1-doPiMkAZ-NpdVR1nXjiiW_74PqQ@xxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="iso-8859-1"
>
> 2011/10/6 Tim Deegan <tim@xxxxxxx>
>
>> At 16:56 +0200 on 03 Oct (1317660976), Olaf Hering wrote:
>> > On Fri, Sep 30, Adin Scannell wrote:
>> >
>> > > >> When we analyzed and tested xenpaging, we found some problems
>> > > >> between mapping and xenpaging.
>> > > >> 1) When mapping first and then doing xenpaging, the code paths
>> > > >> have resolved the problems. That's OK.
>> > > >> 2) The problems exist if we do the address mapping first and then
>> > > >> go to xenpaging. Our confusions are as follows:
>> > > >> a) If the domU's memory is directly mapped into dom0, such as by a
>> > > >> hypercall from a PV driver, then a related page table is built in
>> > > >> dom0, which does not change the p2m type.
>> > > >> If xenpaging then pages out domU memory pages whose gfns have
>> > > >> already been mapped into dom0, this causes problems when dom0
>> > > >> accesses these pages, because the pages are paged out and dom0
>> > > >> cannot tell the p2mt before it accesses them.
>> > > >
>> > > > I'm not entirely sure what you are doing. xenpaging runs in dom0 and
>> > > > is able to map paged-out pages. It uses that to trigger a page-in;
>> > > > see tools/xenpaging/pagein.c in xen-unstable.hg.
>> > >
>> > > Here's my take...
>> > >
>> > > Xenpaging doesn't allow selection of pages that have been mapped by
>> > > other domains (as in p2m.c):
>> > >
>> > > int p2m_mem_paging_nominate(struct domain *d, unsigned long gfn)
>> > > ....
>> > >     /* Check page count and type */
>> > >     page = mfn_to_page(mfn);
>> > >     if ( (page->count_info & (PGC_count_mask | PGC_allocated)) !=
>> > >          (1 | PGC_allocated) )
>> > >         goto out;
>>
> I wonder whether page->count_info is incremented when pages are mapped
> by other domains. I have analyzed the xc_map_foreign_pages() function
> and have not figured out how xc_map_foreign_pages() increments
> page->count_info; page->count_info is decremented in munmap().
>
>
>> > > *However*, I think that the problem Zhen is describing still exists:
>> > > 1) xenpaging nominates a page, it is successful.
>> > > 2) dom0 maps the same page (a process other than xenpaging, which will
>> > > also map it).
>> > > 3) xenpaging evicts the page, successfully.
>> > >
>> > > I've experienced a few nasty crashes, and I think this could account
>> > > for a couple (but certainly not all)... I think that the solution may
>> > > be to repeat the refcount check in paging_evict, and roll back the
>> > > nomination gracefully if the race is detected. Thoughts?
>>
>
>
>> > Are there really code paths that touch an mfn without going through the
>> > p2m functions? If so, I will copy the check and update xenpaging.
>>
>> No, but there are race conditions where CPU A could do the p2m lookup,
>> then CPU B nominates the page and changes its p2m entry, then CPU A
>> completes the mapping. In the extreme case, detecting this in the
>> eviction code is also subject to the same race; some sort of atomic
>> lookup-and-get-reference operation seems like a better fix.
>>
>
> Tim, Olaf and Adin, do you have any good ideas about the above
> situation?
>
> ------------------------------
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>
>
> End of Xen-devel Digest, Vol 80, Issue 104
> ******************************************
>