Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation

To: Anchal Agarwal <anchalag@xxxxxxxxxx>
From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
Date: Tue, 30 Jun 2020 10:30:06 +0200
Cc: "Valentin, Eduardo" <eduval@xxxxxxxxxx>, "len.brown@xxxxxxxxx" <len.brown@xxxxxxxxx>, "peterz@xxxxxxxxxxxxx" <peterz@xxxxxxxxxxxxx>, "benh@xxxxxxxxxxxxxxxxxxx" <benh@xxxxxxxxxxxxxxxxxxx>, "x86@xxxxxxxxxx" <x86@xxxxxxxxxx>, "linux-mm@xxxxxxxxx" <linux-mm@xxxxxxxxx>, "pavel@xxxxxx" <pavel@xxxxxx>, "hpa@xxxxxxxxx" <hpa@xxxxxxxxx>, "tglx@xxxxxxxxxxxxx" <tglx@xxxxxxxxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, "Kamata, Munehisa" <kamatam@xxxxxxxxxx>, "mingo@xxxxxxxxxx" <mingo@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Singh, Balbir" <sblbir@xxxxxxxxxx>, "axboe@xxxxxxxxx" <axboe@xxxxxxxxx>, "konrad.wilk@xxxxxxxxxx" <konrad.wilk@xxxxxxxxxx>, "bp@xxxxxxxxx" <bp@xxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, "jgross@xxxxxxxx" <jgross@xxxxxxxx>, "netdev@xxxxxxxxxxxxxxx" <netdev@xxxxxxxxxxxxxxx>, "linux-pm@xxxxxxxxxxxxxxx" <linux-pm@xxxxxxxxxxxxxxx>, "rjw@xxxxxxxxxxxxx" <rjw@xxxxxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, "vkuznets@xxxxxxxxxx" <vkuznets@xxxxxxxxxx>, "davem@xxxxxxxxxxxxx" <davem@xxxxxxxxxxxxx>, "Woodhouse, David" <dwmw@xxxxxxxxxxxx>
Delivery-date: Tue, 30 Jun 2020 08:30:38 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Jun 29, 2020 at 07:20:35PM +0000, Anchal Agarwal wrote:
> On Fri, Jun 26, 2020 at 11:12:39AM +0200, Roger Pau Monné wrote:
> > So the frontend should do:
> > 
> > - Switch to Closed state (and cleanup everything required).
> > - Wait for backend to switch to Closed state (must be done
> >   asynchronously, handled in blkback_changed).
> > - Switch frontend to XenbusStateInitialising, that will in turn force
> >   the backend to switch to XenbusStateInitWait.
> > - After that it should just follow the normal connection procedure.
> > 
> > I think the part that's missing is the frontend doing the state change
> > to XenbusStateInitialising when the backend switches to the Closed
> > state.
> > 
> > > I was of the view we may just want to mark frontend closed which should do
> > > the job of freeing resources and then following the same flow as
> > > blkfront_restore. That does not seem to work correctly 100% of the time.
> > 
> > I think the missing part is that you must wait for the backend to
> > switch to the Closed state, or else the switch to
> > XenbusStateInitialising won't be picked up correctly by the backend
> > (because it's still doing its cleanup).
> > 
> > Using blkfront_restore might be an option, but you need to assert the
> > backend is in the initial state before using that path.
> >
> Yes, I agree, and I make sure that XenbusStateInitialising only triggers
> on the frontend once the backend is disconnected. msleep in a loop is not
> that graceful, but it works.
> Frontend only switches to XenbusStateInitialising once it sees backend
> as Closed. The issue here, which may require more debugging, is:
> 1. Hibernate instance->Closing failed, artificially created situation by not
> marking frontend Closed in the first place during freezing.
> 2. System comes back up fine restored to 'backend connected'.

I'm not sure I follow what is happening here. What should happen, IMO,
is that the backend eventually reaches the Closed state, i.e. the
frontend has initiated the disconnection from the backend by setting
the Closing state, and the backend will eventually have to reach the
Closed state.

At that point the frontend can initiate a reconnection by switching to
the Initialising state.
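
To illustrate the ordering, here is a rough userspace sketch. It is not
blkfront code: struct fe_model, fe_start_reconnect() and
fe_backend_changed() are made-up names for illustration, and only the
enum values are meant to mirror enum xenbus_state. The point is that the
switch to Initialising is driven by the backend state change callback,
not by polling:

```c
/* Userspace model of the handshake ordering; NOT actual driver code. */
#include <assert.h>
#include <stdbool.h>

/* Values mirror enum xenbus_state in xen/interface/io/xenbus.h. */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6,
};

/* Hypothetical stand-in for per-device state (not struct blkfront_info). */
struct fe_model {
    enum xenbus_state frontend;
    bool reconnect_pending; /* disconnect was requested in order to reconnect */
};

/* Hypothetical: the frontend initiates the disconnection. */
static void fe_start_reconnect(struct fe_model *fe)
{
    fe->frontend = XenbusStateClosing;
    fe->reconnect_pending = true;
}

/* Hypothetical analogue of a backend-changed callback. */
static void fe_backend_changed(struct fe_model *fe, enum xenbus_state backend)
{
    switch (backend) {
    case XenbusStateClosing:
        /* Backend is still cleaning up: too early to reconnect. */
        break;
    case XenbusStateClosed:
        /* Only now is it safe to start a new connection attempt. */
        if (fe->reconnect_pending) {
            fe->reconnect_pending = false;
            fe->frontend = XenbusStateInitialising;
        }
        break;
    case XenbusStateInitWait:
        /* Normal connection procedure from here on. */
        if (fe->frontend == XenbusStateInitialising)
            fe->frontend = XenbusStateInitialised;
        break;
    case XenbusStateConnected:
        if (fe->frontend == XenbusStateInitialised)
            fe->frontend = XenbusStateConnected;
        break;
    default:
        break;
    }
}
```

In this model a backend still in Closing leaves the frontend untouched,
and the frontend only moves to Initialising once it is notified that the
backend is Closed.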

> 3. Re-run (1) again without reboot
> 4. (4) fails to recover basically freezing does not fail at all which is weird
>    because it should timeout as it passes through same path. It hits a BUG in
>    talk_to_blkback() and instance crashes.

It's hard to tell exactly. I guess you would have to figure out what
makes the frontend not get stuck at the same place as in the first
attempt.

Roger.
