
Re: [Xen-devel] [PATCH v2 1/2] x86/mm: fix a potential race condition in map_pages_to_xen().





On 11/13/2017 5:31 PM, Jan Beulich wrote:
On 10.11.17 at 15:05, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
On 11/10/2017 5:49 PM, Jan Beulich wrote:
I'm not certain this is important enough a fix to consider for 4.10,
and you seem to think it's good enough if this gets applied only
after the tree would be branched, as you didn't Cc Julien. Please
indicate if you actually simply weren't aware, or if there indeed
is an important aspect to this that I'm overlooking.
Well, at first I had not expected this to be accepted for 4.10. But since
we have met this issue in practice, when running a graphics application
which consumes memory intensively in dom0, I think it also makes sense to
fix it in a Xen release as early as possible. Do you think this is a
reasonable request? :-)
You'd need to provide further details for us to understand the
scenario. It obviously depends on whether you have other
patches to Xen which actually trigger this. If the problem can
be triggered from outside of a vanilla upstream Xen, then yes,
I think I would favor the fixes being included.

Thanks, Jan. Let me try to give an explanation of the scenario. :-)

We saw an ASSERT failure, ASSERT((page->count_info & PGC_count_mask) != 0),
in is_iomem_page() <- put_page_from_l1e() <- alloc_l1_table(), when running
a graphics application (a memory eater, but closed source) in dom0. This
failure only happens when dom0 is configured with 2 vCPUs.
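
For reference, the failing check sits at the top of is_iomem_page().
Roughly (paraphrasing the Xen code of that era; details may differ by
version):

    bool is_iomem_page(mfn_t mfn)
    {
        struct page_info *page;

        if ( !mfn_valid(mfn) )
            return true;

        /* Caller must hold a reference, or know this is an iomem page. */
        page = mfn_to_page(mfn);
        ASSERT((page->count_info & PGC_count_mask) != 0);   /* fires here */

        return (page_get_owner(page) == dom_io);
    }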

Our debugging showed that the page's count_info had already (and
unexpectedly) been cleared in free_xenheap_pages(), reached via a call
trace like this:

free_xenheap_pages()
    ^
    |
free_xen_pagetable()
    ^
    |
map_pages_to_xen()
    ^
    |
update_xen_mappings()
    ^
    |
get_page_from_l1e()
    ^
    |
mod_l1_entry()
    ^
    |
do_mmu_update()
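
(The top of the chain is mechanical: outside of early boot,
free_xen_pagetable() is just a thin wrapper, roughly:

    void free_xen_pagetable(void *v)
    {
        if ( system_state != SYS_STATE_early_boot )
            free_xenheap_page(v);      /* i.e. free_xenheap_pages(v, 0) */
    }

so whatever page table map_pages_to_xen() hands it goes straight back to
the xenheap, where its count_info gets cleared.)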

We then realized that it happened while dom0 was updating its page tables:
when the cache attributes of a referenced page frame are about to change,
the corresponding mappings in the Xen VA space are updated by
map_pages_to_xen() as well.
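
That path is update_xen_mappings(); trimmed down, it looks roughly like
this (a simplified sketch, omitting the handling of the Xen image alias):

    /* Sketch of update_xen_mappings() in xen/arch/x86/mm.c: re-map Xen's
     * own 1:1 mapping of the frame with the new cache attributes. */
    static int update_xen_mappings(unsigned long mfn, unsigned int cacheattr)
    {
        return map_pages_to_xen((unsigned long)mfn_to_virt(mfn), mfn, 1,
                                PAGE_HYPERVISOR |
                                cacheattr_to_pte_flags(cacheattr));
    }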

However, since map_pages_to_xen() has the aforementioned race, when
MMU_NORMAL_PT_UPDATE is triggered concurrently on different CPUs, it may
mistakenly free a superpage referenced by pl2e. That is why our ASSERT
failure only happens when dom0 has more than one vCPU configured.
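
The hazard boils down to the L2 entry being read more than once with no
lock held across the reads. A minimal sketch of the pattern (not the
literal upstream code):

    /* Double-read hazard in the superpage re-consolidation path: */
    if ( !(l2e_get_flags(*pl2e) & _PAGE_PSE) )   /* 1st read: sees L1 table */
    {
        /*
         * Window: a concurrent map_pages_to_xen() on another CPU can
         * install a 2M superpage into *pl2e here.
         */
        free_xen_pagetable(l2e_to_l1e(*pl2e));   /* 2nd read: may now free */
    }                                            /* the superpage itself   */

Freeing the live superpage clears its count_info in free_xenheap_pages(),
which is exactly the state the ASSERT later trips over.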

As to the code base, we were running XenGT code, which carries only a few
non-upstreamed patches against Xen - I believe most of them are libxl
related, and none of them touches the MMU code. So I believe this issue
could be triggered by a PV guest against a vanilla upstream Xen.
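
For illustration, a PV kernel reaches this path with a plain mmu_update
hypercall whose new l1e carries a non-WB cache attribute. A hypothetical
snippet using the Linux-style hypercall wrapper (machine_addr and new_mfn
are placeholders):

    /* Hypothetical PV-guest snippet: rewrite an L1 entry with PCD set,
     * so Xen must adjust the cache attributes of its own mapping too. */
    struct mmu_update req;

    req.ptr = machine_addr | MMU_NORMAL_PT_UPDATE;    /* m.a. of the l1e  */
    req.val = (new_mfn << PAGE_SHIFT) |
              _PAGE_PRESENT | _PAGE_RW | _PAGE_PCD;   /* non-WB attribute */
    HYPERVISOR_mmu_update(&req, 1, NULL, DOMID_SELF);

Two dom0 vCPUs issuing such updates concurrently for frames under the same
L2 entry is what opens the window described above.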

Is the above description convincing enough? :-)

Yu

Jan



