Re: [Xen-devel] [RFC PATCH v3 06/12] xen-blkfront: add callbacks for PM suspend and hibernation

On Fri, Feb 21, 2020 at 12:49:18AM +0000, Anchal Agarwal wrote:
> On Thu, Feb 20, 2020 at 10:01:52AM -0700, Durrant, Paul wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > > Sent: 20 February 2020 16:49
> > > To: Durrant, Paul <pdurrant@xxxxxxxxxxxx>
> > > Cc: Agarwal, Anchal <anchalag@xxxxxxxxxx>; Valentin, Eduardo
> > > <eduval@xxxxxxxxxx>; len.brown@xxxxxxxxx; peterz@xxxxxxxxxxxxx;
> > > benh@xxxxxxxxxxxxxxxxxxx; x86@xxxxxxxxxx; linux-mm@xxxxxxxxx;
> > > pavel@xxxxxx; hpa@xxxxxxxxx; tglx@xxxxxxxxxxxxx; sstabellini@xxxxxxxxxx;
> > > fllinden@xxxxxxxxxx; Kamata, Munehisa <kamatam@xxxxxxxxxx>;
> > > mingo@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx; Singh, Balbir
> > > <sblbir@xxxxxxxxxx>; axboe@xxxxxxxxx; konrad.wilk@xxxxxxxxxx;
> > > bp@xxxxxxxxx; boris.ostrovsky@xxxxxxxxxx; jgross@xxxxxxxx;
> > > netdev@xxxxxxxxxxxxxxx; linux-pm@xxxxxxxxxxxxxxx; rjw@xxxxxxxxxxxxx;
> > > linux-kernel@xxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx; davem@xxxxxxxxxxxxx;
> > > Woodhouse, David <dwmw@xxxxxxxxxxxx>
> > > Subject: Re: [Xen-devel] [RFC PATCH v3 06/12] xen-blkfront: add callbacks
> > > for PM suspend and hibernation
> > > For example one necessary difference will be that xenbus initiated
> > > suspend won't close the PV connection, in case suspension fails. On PM
> > > suspend you seem to always close the connection beforehand, so you
> > > will always have to re-negotiate on resume even if suspension failed.
> > >
> I don't get what you mean: 'suspension failure' while disconnecting the
> frontend from the backend? [as in this case we mark the frontend closed and
> then wait for completion]
> Or do you mean a suspension failure in general, after the backend is
> disconnected from the frontend for blkfront?

I don't think you strictly need to disconnect from the backend when
suspending. Just waiting for all requests to finish should be enough.
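
To make the idea concrete, here is a toy Python model (not kernel code) of
freezing the queue and draining the ring while leaving the connection up.
Every name in it (ToyFrontend, suspend, thaw, backend_complete_all) is
made up for illustration and does not correspond to a real blkfront symbol.

```python
from collections import deque


class ToyFrontend:
    """Toy model of a PV block frontend; not the real xen-blkfront code."""

    def __init__(self):
        self.connected = True      # PV connection to the backend is up
        self.frozen = False        # True while new requests are held back
        self.pending = deque()     # requests held back while frozen
        self.in_flight = deque()   # requests placed on the shared ring
        self.completed = []        # requests the backend has finished

    def submit(self, req):
        # New I/O goes on the ring, or waits if the queue is frozen.
        (self.pending if self.frozen else self.in_flight).append(req)

    def backend_complete_all(self):
        # Stand-in for the backend answering every request on the ring.
        while self.in_flight:
            self.completed.append(self.in_flight.popleft())

    def suspend(self):
        # Freeze the queue and drain the ring; the connection stays open.
        self.frozen = True
        self.backend_complete_all()

    def thaw(self):
        # Recovery after a failed suspend: unfreeze and push held-back I/O.
        self.frozen = False
        while self.pending:
            self.in_flight.append(self.pending.popleft())


fe = ToyFrontend()
fe.submit("read-1")
fe.submit("write-2")
fe.suspend()            # ring is drained, connection untouched
fe.submit("write-3")    # arrives while frozen, so it is held back
fe.thaw()               # suspension failed: carry on without renegotiating
print(fe.connected, fe.completed, list(fe.in_flight))
```

In this model `connected` never changes across suspend() and thaw(), so
there is nothing to renegotiate on the failure path.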

This has the benefit of not having to renegotiate if the suspension
fails, and thus you can recover from suspension faster in case of
failure. Since you haven't closed the connection with the backend just
unfreezing the queues should get you working again, and avoids all the

> In the latter case, if anything fails after dpm_suspend(),
> things need to be thawed or set back up, so it should be ok to always
> re-negotiate just to avoid errors.
> > > What I'm mostly worried about is the different approach to ring
> > > draining. Ie: either xenbus is changed to freeze the queues and drain
> > > the shared rings, or PM uses the already existing logic of not
> > > flushing the rings and re-issuing in-flight requests on resume.
> > > 
> > 
> > Yes, that needs consideration. I don’t think the same semantic can be 
> > suitable for both. E.g. in a xen-suspend we need to freeze with as little 
> > processing as possible to avoid dirtying RAM late in the migration cycle, 
> > and we know that in-flight data can wait. But in a transition to S4 we need 
> > to make sure that at least all the in-flight blkif requests get completed, 
> > since they probably contain bits of the guest's memory image and that's not 
> > going to get saved any other way.
> > 
> >   Paul
> I agree with Paul here. Just so you know, I did try a hacky way
> to re-queue requests in the past and failed miserably.

Well, it works AFAIK for xenbus initiated suspension, so I would be
interested to know why it doesn't work with PM suspension.
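
For comparison, the re-issue approach can be pictured with a similar toy
Python model: the frontend keeps a private copy of every request it has
issued until the response arrives, and on resume it re-issues whatever is
still unanswered. This is only a sketch of the idea; ToyRequeueFrontend
and its methods are hypothetical names, not blkfront functions.

```python
class ToyRequeueFrontend:
    """Toy model of re-issuing in-flight requests on resume."""

    def __init__(self):
        self.shadow = {}    # request id -> request, kept until answered
        self.next_id = 0

    def issue(self, req):
        # Put a request on the (imaginary) ring and remember a copy of it.
        rid = self.next_id
        self.next_id += 1
        self.shadow[rid] = req
        return rid

    def complete(self, rid):
        # A response arrived, so the private copy is no longer needed.
        del self.shadow[rid]

    def resume(self):
        # The old ring is gone: re-issue every request that never got a
        # response, in the order it was originally issued.
        unanswered = [self.shadow[rid] for rid in sorted(self.shadow)]
        self.shadow.clear()
        return [self.issue(req) for req in unanswered]


rq = ToyRequeueFrontend()
ids = [rq.issue(r) for r in ("write-A", "write-B", "write-C")]
rq.complete(ids[1])          # only write-B is answered before suspend
new_ids = rq.resume()        # write-A and write-C are issued again
print(new_ids, [rq.shadow[i] for i in new_ids])
```

Nothing is drained before suspend here; the unanswered requests only
reach the disk after a successful resume.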

> I doubt [just from my experimentation] that re-queuing the requests will
> work for PM hibernation, for the same reason Paul mentioned above, unless
> you give me a pressing reason why it should work.

My main reason is that I don't want to maintain two different
approaches to suspend/resume without a technical argument for it. I'm
not happy to take a bunch of new code just because the current one
doesn't seem to work in your use-case.

That being said, if there's a justification for doing it differently
it needs to be stated clearly in the commit. From the current commit
message I didn't grasp that there was a reason for not using the
current xenbus suspend/resume logic.

> Also, won't it affect the migration time if we start waiting for all the
> in-flight requests to complete [last-minute page faults]?

Well, it's going to dirty pages that would have to be re-sent to the
destination side.
