|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]
Hello,
On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
>
>
> On Tue, May 19, 2020 at 11:27:50PM +0000, Anchal Agarwal wrote:
> > From: Munehisa Kamata <kamatam@xxxxxxxxxx>
> >
> > S4 power transition states are much different than xen
> > suspend/resume. Former is visible to the guest and frontend drivers
> should
> > be aware of the state transitions and should be able to take appropriate
> > actions when needed. In transition to S4 we need to make sure that at
> least
> > all the in-flight blkif requests get completed, since they probably
> contain
> > bits of the guest's memory image and that's not going to get saved any
> > other way. Hence, re-issuing of in-flight requests as in case of xen
> resume
> > will not work here. This is in contrast to xen-suspend where we need to
> > freeze with as little processing as possible to avoid dirtying RAM late
> in
> > the migration cycle and we know that in-flight data can wait.
> >
> > Add freeze, thaw and restore callbacks for PM suspend and hibernation
> > support. All frontend drivers that needs to use
> PM_HIBERNATION/PM_SUSPEND
> > events, need to implement these xenbus_driver callbacks. The freeze
> handler
> > stops block-layer queue and disconnect the frontend from the backend
> while
> > freeing ring_info and associated resources. Before disconnecting from
> the
> > backend, we need to prevent any new IO from being queued and wait for
> existing
> > IO to complete. Freeze/unfreeze of the queues will guarantee that there
> are no
> > requests in use on the shared ring. However, for sanity we should check
> > state of the ring before disconnecting to make sure that there are no
> > outstanding requests to be processed on the ring. The restore handler
> > re-allocates ring_info, unquiesces and unfreezes the queue and
> re-connect to
> > the backend, so that rest of the kernel can continue to use the block
> device
> > transparently.
> >
> > Note:For older backends,if a backend doesn't have commit'12ea729645ace'
> > xen/blkback: unmap all persistent grants when frontend gets
> disconnected,
> > the frontend may see massive amount of grant table warning when freeing
> > resources.
> > [ 36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
> > [ 36.855089] xen:grant_table: WARNING:e.g. 0x112 still in use!
> >
> > In this case, persistent grants would need to be disabled.
> >
> > [Anchal Changelog: Removed timeout/request during blkfront freeze.
> > Reworked the whole patch to work with blk-mq and incorporate upstream's
> > comments]
>
> Please tag versions using vX and it would be helpful if you could list
> the specific changes that you performed between versions. There where
> 3 RFC versions IIRC, and there's no log of the changes between them.
>
> I will elaborate on "upstream's comments" in my changelog in my next round of
> patches.
Sorry for being picky, but can you please make sure your email client
properly quotes previous emails on reply. Note the lack of '>' added
to the quoted parts of your reply.
> > + }
> > +
> > break;
> > + }
> > +
> > + /*
> > + * We may somehow receive backend's Closed again while
> thawing
> > + * or restoring and it causes thawing or restoring to
> fail.
> > + * Ignore such unexpected state regardless of the backend
> state.
> > + */
> > + if (info->connected == BLKIF_STATE_FROZEN) {
>
> I think you can join this with the previous dev->state ==
> XenbusStateClosed?
>
> Also, won't the device be in the Closed state already if it's in state
> frozen?
> Yes but I think this mostly due to a hypothetical case if during thawing
> backend switches to Closed state.
> I am not entirely sure if that could happen. Could use some expertise here.
I think the frontend seeing the backend in the closed state during
restore would be a bug that should prevent the frontend from
resuming.
> > + /* Kick the backend to disconnect */
> > + xenbus_switch_state(dev, XenbusStateClosing);
> > +
> > + /*
> > + * We don't want to move forward before the frontend is
> diconnected
> > + * from the backend cleanly.
> > + */
> > + timeout =
> wait_for_completion_timeout(&info->wait_backend_disconnected,
> > + timeout);
> > + if (!timeout) {
> > + err = -EBUSY;
>
> Note err is only used here, and I think could just be dropped.
>
> This err is what's being returned from the function. Am I missing anything?
Just 'return -EBUSY;' directly, and remove the top level variable. You
can also use -EBUSY directly in the xenbus_dev_error call. Anyway, not
that important.
> > + xenbus_dev_error(dev, err, "Freezing timed out;"
> > + "the device may become inconsistent
> state");
>
> Leaving the device in this state is quite bad, as it's in a closed
> state and with the queues frozen. You should make an attempt to
> restore things to a working state.
>
> You mean if backend closed after timeout? Is there a way to know that? I
> understand it's not good to
> leave it in this state however, I am still trying to find if there is a good
> way to know if backend is still connected after timeout.
> Hence the message " the device may become inconsistent state". I didn't see
> a timeout not even once on my end so that's why
> I may be looking for an alternate perspective here. may be need to thaw
> everything back intentionally is one thing I could think of.
You can manually force this state, and then check that it will behave
correctly. I would expect that on a failure to disconnect from the
backend you should switch the frontend to the 'Init' state in order to
try to reconnect to the backend when possible.
> > + }
> > +
> > + return err;
> > +}
> > +
> > +static int blkfront_restore(struct xenbus_device *dev)
> > +{
> > + struct blkfront_info *info = dev_get_drvdata(&dev->dev);
> > + int err = 0;
> > +
> > + err = talk_to_blkback(dev, info);
> > + blk_mq_unquiesce_queue(info->rq);
> > + blk_mq_unfreeze_queue(info->rq);
> > + if (!err)
> > + blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
>
> Bad indentation. Also shouldn't you first update the queues and then
> unfreeze them?
> Please correct me if I am wrong, blk_mq_update_nr_hw_queues freezes the queue
> So I don't think the order could be reversed.
Regardless of what blk_mq_update_nr_hw_queues does, I don't think it's
correct to unfreeze the queues without having updated them. Also the
freezing/unfreezing uses a refcount, so I think it's perfectly fine to
call blk_mq_update_nr_hw_queues first and then unfreeze the queues.
Also note that talk_to_blkback returning an error should likely
prevent any unfreezing, as the queues won't be updated to match the
parameters of the backend.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |