Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> -----Original Message-----
> From: Julien Grall <julien@xxxxxxx>
> Sent: 26 February 2020 11:33
> To: Durrant, Paul <pdurrant@xxxxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>; Volodymyr Babchuk
> <Volodymyr_Babchuk@xxxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>;
> George Dunlap <george.dunlap@xxxxxxxxxx>; Ian Jackson
> <ian.jackson@xxxxxxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>; Konrad
> Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>; Wei Liu <wl@xxxxxxx>
> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for
> shared_info
>
> Hi Paul,
>
> On 25/02/2020 09:53, Paul Durrant wrote:
> > There's no particular reason shared_info needs to use a xenheap page.
>
> AFAICT, a side-effect of this change is that the shared_info page is now
> going to be part of the migration stream. One issue with this is that, on
> the restore side, it will be counted towards the number of pages allocated
> to the domain.
>
> I am fairly certain dirty logging on pages with PGC_extra set would not
> work properly either.
True, those two do need fixing before expanding the use of PGC_extra. I guess
this may already be a problem with the current VMX use case.
>
> As the pages will be recreated on the restore side, I would suggest
> skipping them in XEN_DOMCTL_getpageframeinfo3.
>
At some point in future I guess it may be useful to migrate PGC_extra pages,
but they would need to be marked as such in the stream. Skipping sounds like
the right approach for now.
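Something like the following, perhaps (an untested sketch of the
XEN_DOMCTL_getpageframeinfo3 handling; variable names are illustrative
rather than lifted from the actual code):

    /*
     * PGC_extra pages (e.g. shared_info after this patch) will be
     * re-created on the restore side, so report them as XTAB and let the
     * toolstack skip them.
     */
    if ( page->count_info & PGC_extra )
        type = XEN_DOMCTL_PFINFO_XTAB;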
> > Its only purpose is to be mapped by the guest, so use a PGC_extra
> > domheap page
> > instead. This does entail freeing shared_info during
> > domain_relinquish_resources() rather than domain_destroy(), so care is
> > needed to avoid de-referencing a NULL shared_info pointer; hence some
> > extra checks of 'is_dying' are added.
> > ASSERTions are added before apparently vulnerable dereferences in the
> > event channel code. These should not be hit because domain_kill() calls
> > evtchn_destroy() before calling domain_relinquish_resources() but are
> > added to catch any future re-ordering.
>
> IMHO, the ASSERTions in the event channel pending/mask/... helpers will
> not protect against re-ordering. You may never hit them even if there is
> a re-ordering. It would be better to add an ASSERT()/BUG_ON() in
> evtchn_destroy() and possibly a comment in domain_kill() to mention
> ordering.
I'm not sure about that. Why would they not be hit?
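That said, if we did go that route, I assume you mean something like this
(sketch only):

    void evtchn_destroy(struct domain *d)
    {
        /*
         * domain_kill() must call evtchn_destroy() before
         * domain_relinquish_resources() frees shared_info; catch any
         * future re-ordering here rather than in the per-port helpers.
         */
        BUG_ON(!d->shared_info.virt);
        ...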
>
> > For Arm, the call to free_shared_info() in arch_domain_destroy() is left
> > in place since it is called in the error path for arch_domain_create().
> >
> > NOTE: A modification in p2m_alloc_table() is required to avoid a false
> > positive when checking for domain memory.
> >       A fix is also needed in dom0_construct_pv() to avoid automatically
> >       adding PGC_extra pages to the physmap.
>
> I am not entirely sure how this is related to this patch. Was it a
> latent bug? If so, I think it would make sense to split it from this
> patch.
>
It's related because the check relies on the fact that no PGC_extra pages are
created before it is called. This is indeed true until this patch is applied.
> >
> > Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
> > ---
> > Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > Cc: Julien Grall <julien@xxxxxxx>
> > Cc: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
> > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > Cc: George Dunlap <george.dunlap@xxxxxxxxxx>
> > Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> > Cc: Jan Beulich <jbeulich@xxxxxxxx>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > Cc: Wei Liu <wl@xxxxxxx>
> > ---
> > xen/arch/arm/domain.c | 2 ++
> > xen/arch/x86/domain.c | 3 ++-
> > xen/arch/x86/mm/p2m.c | 3 +--
> > xen/arch/x86/pv/dom0_build.c | 4 ++++
> > xen/common/domain.c | 25 +++++++++++++++++++------
> > xen/common/event_2l.c | 4 ++++
> > xen/common/event_fifo.c | 1 +
> > xen/common/time.c | 3 +++
> > 8 files changed, 36 insertions(+), 9 deletions(-)
> >
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > index 2cbcdaac08..3904519256 100644
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -1006,6 +1006,8 @@ int domain_relinquish_resources(struct domain *d)
> > BUG();
> > }
> >
> > + free_shared_info(d);
> > +
> > return 0;
> > }
> >
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index eb7b0fc51c..3ad532eccf 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -691,7 +691,6 @@ void arch_domain_destroy(struct domain *d)
> > pv_domain_destroy(d);
> > free_perdomain_mappings(d);
> >
> > - free_shared_info(d);
> > cleanup_domain_irq_mapping(d);
> >
> > psr_domain_free(d);
> > @@ -2246,6 +2245,8 @@ int domain_relinquish_resources(struct domain *d)
> > if ( is_hvm_domain(d) )
> > hvm_domain_relinquish_resources(d);
> >
> > + free_shared_info(d);
> > +
> > return 0;
> > }
> >
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index c5f428d67c..787d97d85e 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -695,8 +695,7 @@ int p2m_alloc_table(struct p2m_domain *p2m)
> >
> > p2m_lock(p2m);
> >
> > - if ( p2m_is_hostp2m(p2m)
> > - && !page_list_empty(&d->page_list) )
> > + if ( p2m_is_hostp2m(p2m) && domain_tot_pages(d) )
> > {
> >         P2M_ERROR("dom %d already has memory allocated\n", d->domain_id);
> > p2m_unlock(p2m);
> > diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
> > index dc16ef2e79..f8f1bbe2f4 100644
> > --- a/xen/arch/x86/pv/dom0_build.c
> > +++ b/xen/arch/x86/pv/dom0_build.c
> > @@ -792,6 +792,10 @@ int __init dom0_construct_pv(struct domain *d,
> > {
> > mfn = mfn_x(page_to_mfn(page));
> > BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn)));
> > +
> > + if ( page->count_info & PGC_extra )
> > + continue;
>
> I would add a comment explaining why we skip pages with PGC_extra set.
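
Ok, something like this perhaps:

    /*
     * PGC_extra pages (e.g. shared_info) are not normal guest RAM: they
     * are mapped into the physmap by other means, so don't assign them a
     * GPFN here.
     */
    if ( page->count_info & PGC_extra )
        continue;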
>
> > +
> > if ( get_gpfn_from_mfn(mfn) >= count )
> > {
> > BUG_ON(is_pv_32bit_domain(d));
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index ba7a905258..1d42fbcc0f 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -128,9 +128,9 @@ static void vcpu_info_reset(struct vcpu *v)
> > {
> > struct domain *d = v->domain;
> >
> > - v->vcpu_info = ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
> > + v->vcpu_info = (!d->is_dying && v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
> >                     ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
> > - : &dummy_vcpu_info);
> > + : &dummy_vcpu_info;
>
> Without holding domain_lock(), I don't think there is a guarantee that
> is_dying (and therefore the shared_info) is not going to change behind
> your back. So v->vcpu_info may point to garbage.
>
> Looking at the callers, XEN_DOMCTL_soft_reset will not hold the lock.
>
True, it does need locking in some way.
> As an aside, it would be good to explain in a comment why we are using
> dummy_vcpu_info when the domain is dying.
>
Simply because it is guaranteed to exist, but I'll add a comment to that effect.
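E.g.:

    /*
     * Once the domain starts dying the shared_info page may be freed at
     * any time, so point at dummy_vcpu_info instead: it is static and
     * hence guaranteed to exist.
     */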
> > v->vcpu_info_mfn = INVALID_MFN;
> > }
> >
> > @@ -1650,24 +1650,37 @@ int continue_hypercall_on_cpu(
> >
> > int alloc_shared_info(struct domain *d, unsigned int memflags)
> > {
> > -    if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL )
> > + struct page_info *pg;
> > +
> > + pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
> > + if ( !pg )
> > return -ENOMEM;
> >
> > - d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);
> > + if ( !get_page_and_type(pg, d, PGT_writable_page) )
> > + return -ENODATA;
>
> I think the page will never be freed if this fails.
>
Good spot.
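Presumably something like this (untested) would do:

    if ( !get_page_and_type(pg, d, PGT_writable_page) )
    {
        /*
         * Drop the allocation reference taken by alloc_domheap_page()
         * so the page actually gets freed.
         */
        put_page_alloc_ref(pg);
        return -ENODATA;
    }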
> > +
> > + d->shared_info.mfn = page_to_mfn(pg);
> > + d->shared_info.virt = __map_domain_page_global(pg);
> >
> > clear_page(d->shared_info.virt);
> > -    share_xen_page_with_guest(mfn_to_page(d->shared_info.mfn), d, SHARE_rw);
> >
> > return 0;
> > }
> >
> > void free_shared_info(struct domain *d)
> > {
> > + struct page_info *pg;
> > +
> > if ( !d->shared_info.virt )
> > return;
> >
> > - free_xenheap_page(d->shared_info.virt);
> > + unmap_domain_page_global(d->shared_info.virt);
> > d->shared_info.virt = NULL;
> > +
> > + pg = mfn_to_page(d->shared_info.mfn);
> > +
> > + put_page_alloc_ref(pg);
> > + put_page_and_type(pg);
> > }
> >
> > /*
> > diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c
> > index e1dbb860f4..a72fe0232b 100644
> > --- a/xen/common/event_2l.c
> > +++ b/xen/common/event_2l.c
> > @@ -27,6 +27,7 @@ static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn)
> > * others may require explicit memory barriers.
> > */
> >
> > + ASSERT(d->shared_info.virt);
> >      if ( guest_test_and_set_bit(d, port, &shared_info(d, evtchn_pending)) )
> > return;
> >
> > @@ -54,6 +55,7 @@ static void evtchn_2l_unmask(struct domain *d, struct evtchn *evtchn)
> > * These operations must happen in strict order. Based on
> > * evtchn_2l_set_pending() above.
> > */
> > + ASSERT(d->shared_info.virt);
> >      if ( guest_test_and_clear_bit(d, port, &shared_info(d, evtchn_mask)) &&
> > guest_test_bit(d, port, &shared_info(d, evtchn_pending)) &&
> > !guest_test_and_set_bit(d, port / BITS_PER_EVTCHN_WORD(d),
> > @@ -67,6 +69,7 @@ static bool evtchn_2l_is_pending(const struct domain *d, evtchn_port_t port)
> > {
> >     unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
> >
> > + ASSERT(d->shared_info.virt);
> > ASSERT(port < max_ports);
> > return (port < max_ports &&
> > guest_test_bit(d, port, &shared_info(d, evtchn_pending)));
> > @@ -76,6 +79,7 @@ static bool evtchn_2l_is_masked(const struct domain *d, evtchn_port_t port)
> > {
> >     unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
> >
> > + ASSERT(d->shared_info.virt);
> > ASSERT(port < max_ports);
> > return (port >= max_ports ||
> > guest_test_bit(d, port, &shared_info(d, evtchn_mask)));
> > diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
> > index 230f440f14..e8c6045d72 100644
> > --- a/xen/common/event_fifo.c
> > +++ b/xen/common/event_fifo.c
> > @@ -497,6 +497,7 @@ static void setup_ports(struct domain *d)
> >
> > evtchn = evtchn_from_port(d, port);
> >
> > + ASSERT(d->shared_info.virt);
> >          if ( guest_test_bit(d, port, &shared_info(d, evtchn_pending)) )
> > evtchn->pending = 1;
> >
> > diff --git a/xen/common/time.c b/xen/common/time.c
> > index 58fa9abc40..938226c7b1 100644
> > --- a/xen/common/time.c
> > +++ b/xen/common/time.c
> > @@ -99,6 +99,9 @@ void update_domain_wallclock_time(struct domain *d)
> > uint32_t *wc_version;
> > uint64_t sec;
> >
> > + if ( d->is_dying )
> > + return;
>
> This is another instance where I think the use of d->is_dying is not
> safe. I looked at how other places in Xen dealt with d->is_dying.
>
> We don't seem to have a common way to deal with it:
> 1) It may be checked under domain_lock() -> No issue with that
> 2) It may be checked under d->page_alloc_lock (e.g. assign_pages()).
> The assign_pages() case is fine because it will act as a full barrier.
> So if the call happens after relinquish_memory() then we will surely
> have observed d->is_dying. I haven't checked the others.
>
> Some of the instances use neither of the two locks above. We probably ought
> to investigate them (I will add a TODO in my list).
>
> Regarding the two cases here, domain_lock() might be the solution. If we
> are concerned about the contention (it is a spinlock), then we could
> switch the domain_lock() from spinlock to rwlock.
>
I'll see if there is any other suitable lock guaranteed to be taken during
domain_kill(), but it may indeed have to be domain_lock().
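
Something like this, perhaps (sketch only, assuming whatever sets is_dying
and frees shared_info ends up synchronising on domain_lock() too):

    void update_domain_wallclock_time(struct domain *d)
    {
        ...
        domain_lock(d);

        if ( d->is_dying )
        {
            /* shared_info may already have been freed; nothing to do. */
            domain_unlock(d);
            return;
        }

        /* ... existing wallclock update via shared_info ... */

        domain_unlock(d);
    }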
Paul
> Cheers,
>
> --
> Julien Grall