
Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info



Hi Paul,

On 25/02/2020 09:53, Paul Durrant wrote:
There's no particular reason shared_info need use a xenheap page.

AFAICT, a side-effect of this change is that shared_info is now going to be part of the migration stream. One issue with this is that, on the restore side, it will be accounted in the number of pages allocated to the domain.

I am fairly certain dirty logging on a page with PGC_extra set would not work properly either.

As the pages will be recreated on the restore side, I would suggest skipping them in XEN_DOMCTL_getpageframeinfo3.
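
Something along these lines in the XEN_DOMCTL_getpageframeinfo3 handler might do (untested sketch; the exact variable names depend on the surrounding code):

    /* PGC_extra pages (e.g. shared_info) are re-created on the restore
     * side, so report them as invalid and let the toolstack skip them. */
    if ( page->count_info & PGC_extra )
        type = XEN_DOMCTL_PFINFO_XTAB;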

Its
only purpose is to be mapped by the guest so use a PGC_extra domheap page
instead. This does entail freeing shared_info during
domain_relinquish_resources() rather than domain_destroy() so care is
needed to avoid de-referencing a NULL shared_info pointer hence some
extra checks of 'is_dying' are needed.
ASSERTions are added before apparently vulnerable dereferences in the
event channel code. These should not be hit because domain_kill() calls
evtchn_destroy() before calling domain_relinquish_resources() but are
added to catch any future re-ordering.

IMHO, the ASSERTions in the event channel pending/mask/... helpers will not protect against re-ordering: you may never hit them even if there is a re-ordering. It would be better to add an ASSERT()/BUG_ON() in evtchn_destroy() and possibly a comment in domain_kill() to mention the ordering.
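
Roughly what I have in mind (untested sketch, exact wording/placement up to you):

    /* In evtchn_destroy(): the 2l/fifo helpers still dereference
     * shared_info while the channels are torn down. */
    ASSERT(d->shared_info.virt);

and in domain_kill(), next to the evtchn_destroy() call, a comment along the lines of:

    /* Must precede domain_relinquish_resources(), which frees shared_info. */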

For Arm, the call to free_shared_info() in arch_domain_destroy() is left
in place since it is called in the error path for arch_domain_create().

NOTE: A modification in p2m_alloc_table() is required to avoid a false
       positive when checking for domain memory.
       A fix is also needed in dom0_construct_pv() to avoid automatically
       adding PGC_extra pages to the physmap.

I am not entirely sure how this is related to this patch. Was it a latent bug? If so, I think it would make sense to split it from this patch.


Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
---
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Cc: Julien Grall <julien@xxxxxxx>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Cc: George Dunlap <george.dunlap@xxxxxxxxxx>
Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Cc: Wei Liu <wl@xxxxxxx>
---
  xen/arch/arm/domain.c        |  2 ++
  xen/arch/x86/domain.c        |  3 ++-
  xen/arch/x86/mm/p2m.c        |  3 +--
  xen/arch/x86/pv/dom0_build.c |  4 ++++
  xen/common/domain.c          | 25 +++++++++++++++++++------
  xen/common/event_2l.c        |  4 ++++
  xen/common/event_fifo.c      |  1 +
  xen/common/time.c            |  3 +++
  8 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 2cbcdaac08..3904519256 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -1006,6 +1006,8 @@ int domain_relinquish_resources(struct domain *d)
          BUG();
      }
+    free_shared_info(d);
+
      return 0;
  }
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index eb7b0fc51c..3ad532eccf 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -691,7 +691,6 @@ void arch_domain_destroy(struct domain *d)
          pv_domain_destroy(d);
      free_perdomain_mappings(d);
-    free_shared_info(d);
      cleanup_domain_irq_mapping(d);
 
      psr_domain_free(d);
@@ -2246,6 +2245,8 @@ int domain_relinquish_resources(struct domain *d)
      if ( is_hvm_domain(d) )
          hvm_domain_relinquish_resources(d);
+    free_shared_info(d);
+
      return 0;
  }
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c5f428d67c..787d97d85e 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -695,8 +695,7 @@ int p2m_alloc_table(struct p2m_domain *p2m)
      p2m_lock(p2m);
 
-    if ( p2m_is_hostp2m(p2m)
-         && !page_list_empty(&d->page_list) )
+    if ( p2m_is_hostp2m(p2m) && domain_tot_pages(d) )
      {
          P2M_ERROR("dom %d already has memory allocated\n", d->domain_id);
          p2m_unlock(p2m);
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index dc16ef2e79..f8f1bbe2f4 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -792,6 +792,10 @@ int __init dom0_construct_pv(struct domain *d,
      {
          mfn = mfn_x(page_to_mfn(page));
          BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn)));
+
+        if ( page->count_info & PGC_extra )
+            continue;

I would add a comment explaining why we skip pages with PGC_extra set.
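
Something along the lines of (wording up to you):

+        /*
+         * PGC_extra pages (e.g. shared_info) are not part of dom0's
+         * initial allocation, so don't add them to the physmap here.
+         */
+        if ( page->count_info & PGC_extra )
+            continue;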

+
          if ( get_gpfn_from_mfn(mfn) >= count )
          {
              BUG_ON(is_pv_32bit_domain(d));
diff --git a/xen/common/domain.c b/xen/common/domain.c
index ba7a905258..1d42fbcc0f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -128,9 +128,9 @@ static void vcpu_info_reset(struct vcpu *v)
  {
      struct domain *d = v->domain;
-    v->vcpu_info = ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
+    v->vcpu_info = (!d->is_dying && v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
                      ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
-                    : &dummy_vcpu_info);
+                    : &dummy_vcpu_info;

Without holding domain_lock(), I don't think there is any guarantee that is_dying (and therefore shared_info) is not going to change behind your back, so v->vcpu_info may end up pointing to garbage.

Looking at the callers, XEN_DOMCTL_soft_reset will not hold the lock.

As an aside, it would be good to explain in a comment why we are using dummy_vcpu_info when the domain is dying.
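
Something like this, perhaps (wording is only a suggestion):

    /*
     * If the domain is dying, shared_info may already have been freed by
     * domain_relinquish_resources(), so point at dummy_vcpu_info rather
     * than leaving a dangling pointer to the old mapping.
     */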

      v->vcpu_info_mfn = INVALID_MFN;
  }
@@ -1650,24 +1650,37 @@ int continue_hypercall_on_cpu(
  int alloc_shared_info(struct domain *d, unsigned int memflags)
  {
-    if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL )
+    struct page_info *pg;
+
+    pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
+    if ( !pg )
          return -ENOMEM;
-    d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);
+    if ( !get_page_and_type(pg, d, PGT_writable_page) )
+        return -ENODATA;

I think the page will never be freed if this fails.
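
I would expect something like the below to be needed (untested; I haven't checked whether dropping the allocation reference alone is enough to free the page here):

    if ( !get_page_and_type(pg, d, PGT_writable_page) )
    {
        put_page_alloc_ref(pg);
        return -ENODATA;
    }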

+
+    d->shared_info.mfn = page_to_mfn(pg);
+    d->shared_info.virt = __map_domain_page_global(pg);
      clear_page(d->shared_info.virt);
-    share_xen_page_with_guest(mfn_to_page(d->shared_info.mfn), d, SHARE_rw);
 
      return 0;
  }
void free_shared_info(struct domain *d)
  {
+    struct page_info *pg;
+
      if ( !d->shared_info.virt )
          return;
-    free_xenheap_page(d->shared_info.virt);
+    unmap_domain_page_global(d->shared_info.virt);
      d->shared_info.virt = NULL;
+
+    pg = mfn_to_page(d->shared_info.mfn);
+
+    put_page_alloc_ref(pg);
+    put_page_and_type(pg);
  }
/*
diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c
index e1dbb860f4..a72fe0232b 100644
--- a/xen/common/event_2l.c
+++ b/xen/common/event_2l.c
@@ -27,6 +27,7 @@ static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn)
       * others may require explicit memory barriers.
       */
 
+    ASSERT(d->shared_info.virt);
      if ( guest_test_and_set_bit(d, port, &shared_info(d, evtchn_pending)) )
          return;
@@ -54,6 +55,7 @@ static void evtchn_2l_unmask(struct domain *d, struct evtchn *evtchn)
       * These operations must happen in strict order. Based on
       * evtchn_2l_set_pending() above.
       */
+    ASSERT(d->shared_info.virt);
      if ( guest_test_and_clear_bit(d, port, &shared_info(d, evtchn_mask)) &&
           guest_test_bit(d, port, &shared_info(d, evtchn_pending)) &&
           !guest_test_and_set_bit(d, port / BITS_PER_EVTCHN_WORD(d),
@@ -67,6 +69,7 @@ static bool evtchn_2l_is_pending(const struct domain *d, evtchn_port_t port)
  {
      unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
 
+    ASSERT(d->shared_info.virt);
      ASSERT(port < max_ports);
      return (port < max_ports &&
              guest_test_bit(d, port, &shared_info(d, evtchn_pending)));
@@ -76,6 +79,7 @@ static bool evtchn_2l_is_masked(const struct domain *d, evtchn_port_t port)
  {
      unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
 
+    ASSERT(d->shared_info.virt);
      ASSERT(port < max_ports);
      return (port >= max_ports ||
              guest_test_bit(d, port, &shared_info(d, evtchn_mask)));
diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index 230f440f14..e8c6045d72 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -497,6 +497,7 @@ static void setup_ports(struct domain *d)
          evtchn = evtchn_from_port(d, port);
 
+        ASSERT(d->shared_info.virt);
          if ( guest_test_bit(d, port, &shared_info(d, evtchn_pending)) )
              evtchn->pending = 1;
diff --git a/xen/common/time.c b/xen/common/time.c
index 58fa9abc40..938226c7b1 100644
--- a/xen/common/time.c
+++ b/xen/common/time.c
@@ -99,6 +99,9 @@ void update_domain_wallclock_time(struct domain *d)
      uint32_t *wc_version;
      uint64_t sec;
+    if ( d->is_dying )
+        return;

This is another instance where I think the use of d->is_dying is not safe. I looked at how other places in Xen dealt with d->is_dying.

We don't seem to have a common way to deal with it:
  1) It may be checked under domain_lock() -> no issue with that.
  2) It may be checked under d->page_alloc_lock (e.g. assign_pages()). The assign_pages() case is fine because it will act as a full barrier, so if the call happens after relinquish_memory() then we will surely have observed d->is_dying. I haven't checked the others.

Some of the instances use neither of the two locks above. We probably ought to investigate them (I will add a TODO to my list).

Regarding the two cases here, domain_lock() might be the solution. If we are concerned about contention (it is a spinlock), then we could switch domain_lock() from a spinlock to an rwlock.
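
For update_domain_wallclock_time() that could look roughly like this (sketch only; how it interacts with wc_lock would need checking):

    domain_lock(d);

    if ( d->is_dying )
    {
        domain_unlock(d);
        return;
    }

    /* ... existing body ... */

    domain_unlock(d);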

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

