
Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info

On 26/02/2020 12:03, Durrant, Paul wrote:
-----Original Message-----
From: Julien Grall <julien@xxxxxxx>
Sent: 26 February 2020 11:33
To: Durrant, Paul <pdurrant@xxxxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>; Volodymyr Babchuk
<Volodymyr_Babchuk@xxxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>;
George Dunlap <george.dunlap@xxxxxxxxxx>; Ian Jackson
<ian.jackson@xxxxxxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>; Konrad
Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>; Wei Liu <wl@xxxxxxx>
Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info

Hi Paul,

On 25/02/2020 09:53, Paul Durrant wrote:
There's no particular reason shared_info need use a xenheap page.

AFAICT, a side-effect of this change is that the shared_info is now going to
be part of the migration stream. One issue with this is that on the restore
side it will be accounted in the number of pages allocated to the domain.

I am fairly certain dirty logging on pages with PGC_extra set would not
work properly either.

True, those two do need fixing before expanding the use of PGC_extra. I guess 
this may already be a problem with the current VMX use case.

As the pages will be recreated on the restore side, I would suggest
skipping them in XEN_DOMCTL_getpageframeinfo3.

At some point in future I guess it may be useful to migrate PGC_extra pages, 
but they would need to be marked as such in the stream. Skipping sounds like 
the right approach for now.

Its only purpose is to be mapped by the guest so use a PGC_extra domheap
page instead. This does entail freeing shared_info during
domain_relinquish_resources() rather than domain_destroy() so care is
needed to avoid de-referencing a NULL shared_info pointer, hence some
extra checks of 'is_dying' are needed.
ASSERTions are added before apparently vulnerable dereferences in the
event channel code. These should not be hit because domain_kill() calls
evtchn_destroy() before calling domain_relinquish_resources() but are
added to catch any future re-ordering.

IMHO, the ASSERTions in the event channel pending/mask/... helpers will
not protect against re-ordering. You may never hit them even if there is
a re-ordering. It would be better to add an ASSERT()/BUG_ON() in
evtchn_destroy() and possibly a comment in domain_kill() to mention

I'm not sure about that. Why would they not be hit?

A developer may never touch the event channels during domain destruction in a debug build, so the re-ordering would go unnoticed.

This would become quite a problem if a production build ends up touching event channels during domain destruction.

For Arm, the call to free_shared_info() in arch_domain_destroy() is left
in place since it is called in the error path for arch_domain_create().

NOTE: A modification in p2m_alloc_table() is required to avoid a false
        positive when checking for domain memory.
        A fix is also needed in dom0_construct_pv() to avoid
        adding PGC_extra pages to the physmap.

I am not entirely sure how this is related to this patch. Was it a
latent bug? If so, I think it would make sense to split it from this

It's related because the check relies on the fact that no PGC_extra pages are 
created before it is called. This is indeed true until this patch is applied.

I would prefer if they are fixed in separate patches, as this patch already does quite a lot. But I will leave that up to Andrew and Jan.

Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Cc: Julien Grall <julien@xxxxxxx>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Cc: George Dunlap <george.dunlap@xxxxxxxxxx>
Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Cc: Wei Liu <wl@xxxxxxx>
   xen/arch/arm/domain.c        |  2 ++
   xen/arch/x86/domain.c        |  3 ++-
   xen/arch/x86/mm/p2m.c        |  3 +--
   xen/arch/x86/pv/dom0_build.c |  4 ++++
   xen/common/domain.c          | 25 +++++++++++++++++++------
   xen/common/event_2l.c        |  4 ++++
   xen/common/event_fifo.c      |  1 +
   xen/common/time.c            |  3 +++
   8 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 2cbcdaac08..3904519256 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -1006,6 +1006,8 @@ int domain_relinquish_resources(struct domain *d)

+    free_shared_info(d);
       return 0;

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index eb7b0fc51c..3ad532eccf 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -691,7 +691,6 @@ void arch_domain_destroy(struct domain *d)

-    free_shared_info(d);

@@ -2246,6 +2245,8 @@ int domain_relinquish_resources(struct domain *d)
       if ( is_hvm_domain(d) )

+    free_shared_info(d);
       return 0;

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c5f428d67c..787d97d85e 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -695,8 +695,7 @@ int p2m_alloc_table(struct p2m_domain *p2m)


-    if ( p2m_is_hostp2m(p2m)
-         && !page_list_empty(&d->page_list) )
+    if ( p2m_is_hostp2m(p2m) && domain_tot_pages(d) )
           P2M_ERROR("dom %d already has memory allocated\n", d-
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index dc16ef2e79..f8f1bbe2f4 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -792,6 +792,10 @@ int __init dom0_construct_pv(struct domain *d,
           mfn = mfn_x(page_to_mfn(page));
+        if ( page->count_info & PGC_extra )
+            continue;

I would add a comment explaining why we skip page with PGC_extra set.

           if ( get_gpfn_from_mfn(mfn) >= count )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index ba7a905258..1d42fbcc0f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -128,9 +128,9 @@ static void vcpu_info_reset(struct vcpu *v)
       struct domain *d = v->domain;

-    v->vcpu_info = ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
+    v->vcpu_info = (!d->is_dying && v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
                       ? (vcpu_info_t *)&shared_info(d, vcpu_info[v-
-                    : &dummy_vcpu_info);
+                    : &dummy_vcpu_info;

Without holding domain_lock(), I don't think there is a guarantee that
is_dying (and therefore the shared_info) is not going to change behind
your back. So v->vcpu_info may point to garbagge.

Looking at the callers, XEN_DOMCTL_soft_reset will not hold the lock.

True, it does need locking of some kind.

As an aside, it would be good to explain in a comment why we are using
dummy_vcpu_info when the domain is dying.

Simply because it is guaranteed to exist, but I'll add a comment to that effect.

Well, the question is what will happen if you end up accessing it. Could you inadvertently write from domain A vCPU A and then read from domain B vCPU B?

Such a problem is technically already present. But I would be fairly concerned if that's ever possible during domain destroy. So what are we trying to protect against here? Is it just code hardening?



Julien Grall
