Re: [Xen-devel] [PATCH] x86/hvm: don't rely on shared ioreq state for completion handling
>>> On 11.08.15 at 17:49, <Paul.Durrant@xxxxxxxxxx> wrote:
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> Sent: 11 August 2015 16:46
>> To: Paul Durrant
>> Cc: Andrew Cooper; xen-devel@xxxxxxxxxxxxxxxxxxxx; Keir (Xen.org)
>> Subject: RE: [PATCH] x86/hvm: don't rely on shared ioreq state for
>> completion handling
>>
>> >>> On 11.08.15 at 17:32, <Paul.Durrant@xxxxxxxxxx> wrote:
>> >> -----Original Message-----
>> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> >> Sent: 11 August 2015 16:20
>> >> To: Paul Durrant
>> >> Cc: Andrew Cooper; xen-devel@xxxxxxxxxxxxxxxxxxxx; Keir (Xen.org)
>> >> Subject: Re: [PATCH] x86/hvm: don't rely on shared ioreq state for
>> >> completion handling
>> >>
>> >> >>> On 31.07.15 at 17:34, <paul.durrant@xxxxxxxxxx> wrote:
>> >> > Both hvm_io_pending() and hvm_wait_for_io() use the shared (with
>> >> > emulator) ioreq structure to determine whether there is a pending
>> >> > I/O. The latter will misbehave if the shared state is driven to
>> >> > STATE_IOREQ_NONE by the emulator, or when the shared ioreq page is
>> >> > cleared for re-insertion into the guest P2M when the ioreq server
>> >> > is disabled (STATE_IOREQ_NONE == 0), because it will terminate its
>> >> > wait without calling hvm_io_assist() to adjust Xen's internal I/O
>> >> > emulation state. This may then lead to an I/O completion handler
>> >> > finding incorrect internal emulation state and calling
>> >> > domain_crash().
>> >> >
>> >> > This patch fixes the problem by adding a pending flag to the ioreq
>> >> > server's per-vcpu structure which cannot be directly manipulated
>> >> > by the emulator and thus can be used to determine whether an I/O
>> >> > is actually pending for that vcpu on that ioreq server. If an I/O
>> >> > is pending and the shared state is seen to go to STATE_IOREQ_NONE
>> >> > then it can be treated as an abnormal completion of emulation
>> >> > (hence the data placed in the shared structure is not used) and
>> >> > the internal state is adjusted as for a normal completion. Thus,
>> >> > when a completion handler subsequently runs, the internal state is
>> >> > as expected and domain_crash() will not be called.
>> >> >
>> >> > Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
>> >> > Reported-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
>> >> > Tested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> >>
>> >> I realize this went in already, but ...
>> >>
>> >> > --- a/xen/arch/x86/hvm/hvm.c
>> >> > +++ b/xen/arch/x86/hvm/hvm.c
>> >> > @@ -412,44 +412,57 @@ bool_t hvm_io_pending(struct vcpu *v)
>> >> >                            &d->arch.hvm_domain.ioreq_server.list,
>> >> >                            list_entry )
>> >> >      {
>> >> > -        ioreq_t *p = get_ioreq(s, v);
>> >> > +        struct hvm_ioreq_vcpu *sv;
>> >> >
>> >> > -        if ( p->state != STATE_IOREQ_NONE )
>> >> > -            return 1;
>> >> > +        list_for_each_entry ( sv,
>> >> > +                              &s->ioreq_vcpu_list,
>> >> > +                              list_entry )
>> >> > +        {
>> >> > +            if ( sv->vcpu == v && sv->pending )
>> >> > +                return 1;
>> >> > +        }
>> >>
>> >> ... while from the review of the original series I recall that doing
>> >> the outer loop without any lock is fine (due to using domain_pause()
>> >> when registering servers), I'm not convinced this extends to the
>> >> inner loop. Can you clarify please? (There are a couple more such
>> >> loops that I can't immediately see being protected by a lock.)
>> >
>> > Yes, I think you are right. If a cpu were to disappear then the list
>> > walk would be compromised. It should either be locked or use RCU in
>> > all places.
>>
>> I don't think we need to be concerned about vCPUs disappearing, since
>> that doesn't happen during the lifetime of a VM. And the
>> hvm_do_resume() path is used only for domains still alive. Of course,
>> if any of the lockless loops sit on paths reachable after a domain got
>> marked dying, that would need fixing.
>>
>> What I'm more concerned about are list manipulations behind the back
>> of a list-traversing CPU. Or do those happen only upon vCPU
>> creation/destruction?
>
> Theoretically we could do vcpu hot remove, couldn't we? That's the case
> I was thinking of.

But that only removes the vCPU from what the guest sees. The hypervisor
never de-allocates a vCPU prior to domain death.

Jan
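[Editor's note: the commit message quoted above describes the wait side of
the fix only in prose. Below is a minimal sketch of what hvm_wait_for_io()
looks like once its loop is gated on the Xen-private sv->pending flag
rather than on the shared page: a transition to STATE_IOREQ_NONE observed
while pending is still set is treated as an abnormal completion. The exact
helper signatures (hvm_io_assist() taking the completion data, the ~0
dummy value) are assumptions for illustration and may differ from the
committed patch.]

static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
{
    /* The loop is gated on Xen-private state, not on the shared page. */
    while ( sv->pending )
    {
        switch ( p->state )
        {
        case STATE_IOREQ_NONE:
            /*
             * Abnormal completion: the emulator (or server teardown)
             * drove the shared state to STATE_IOREQ_NONE while an I/O
             * was still pending. The shared data cannot be trusted, so
             * complete with a dummy value.
             */
            hvm_io_assist(sv, ~0ul);
            break;

        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
            rmb(); /* see IORESP_READY /then/ read contents of ioreq */
            p->state = STATE_IOREQ_NONE;
            hvm_io_assist(sv, p->data); /* clears sv->pending */
            break;

        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
        case STATE_IOREQ_INPROCESS:
            /* Sleep until the emulator makes forward progress. */
            wait_on_xen_event_channel(sv->ioreq_evtchn,
                                      p->state != STATE_IOREQ_READY &&
                                      p->state != STATE_IOREQ_INPROCESS);
            break;

        default:
            domain_crash(sv->vcpu->domain); /* unrecognised state */
            return 0;
        }
    }

    return 1;
}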
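[Editor's note: Paul's "locked or RCU" remark can be made concrete. An
unlocked list_for_each_entry() is only safe if entries cannot be added or
removed while another CPU walks the list; otherwise the walker needs
either a spinlock shared with the writers or RCU-style deferred freeing.
A sketch of the locked variant of the inner loop follows; the helper name
ioreq_server_has_pending_io() is made up for this sketch, and treat the
per-server lock placement as illustrative rather than as the code that
was committed.]

static bool_t ioreq_server_has_pending_io(struct hvm_ioreq_server *s,
                                          const struct vcpu *v)
{
    struct hvm_ioreq_vcpu *sv;
    bool_t found = 0;

    /*
     * Taking a per-server lock serialises this walk against the code
     * paths that add and remove entries on ioreq_vcpu_list, assuming
     * those writers take the same lock.
     */
    spin_lock(&s->lock);

    list_for_each_entry ( sv, &s->ioreq_vcpu_list, list_entry )
    {
        if ( sv->vcpu == v && sv->pending )
        {
            found = 1;
            break;
        }
    }

    spin_unlock(&s->lock);

    return found;
}

As Jan notes at the end of the thread, the hypervisor never de-allocates
a vCPU before domain death, so in practice entries on ioreq_vcpu_list
only come and go at vCPU creation and ioreq-server setup/teardown; that
is what makes the unlocked walk defensible, with RCU as the alternative
if concurrent removal were ever possible.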