
Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback



On Tue, Mar 19, 2019 at 08:50:05PM -0700, Munehisa Kamata wrote:
> On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
> > +Amazon
> > pls see inline
> Hi Oleksandr,
> 
> Let me add some comments as the original author of the series.
> 
> > 
> > On 3/14/19 9:00 PM, Julien Grall wrote:
> >> Hi,
> >>
> >> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
> >>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
> >>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
> >>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
> >>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
> >>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
> >>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> >>>>>>>>
> >>>>>>>> Currently on driver resume we remove all the network queues and
> >>>>>>>> destroy the shared Tx/Rx rings, leaving the driver in its current
> >>>>>>>> state and never signaling the backend of this frontend's state
> >>>>>>>> change. This leads to a number of consequences:
> >>>>>>>> - when the frontend withdraws the granted references to the rings
> >>>>>>>>   etc., it cannot be done cleanly as the backend still holds them
> >>>>>>>>   (it was not told to free the resources)
> >>>>>>>> - it is not possible to resume driver operation as all the
> >>>>>>>>   communication means with the backend were destroyed by the
> >>>>>>>>   frontend, thus making the frontend appear to the guest OS as
> >>>>>>>>   functional, but not really.
> >>>>>>> What do you mean? Are you saying that after resume you lose
> >>>>>>> connectivity?
> >>>>>> Exactly. If you take a look at the .resume callback as it is now,
> >>>>>> it destroys the rings etc. and never notifies the backend of that,
> >>>>>> i.e. the device stays in, say, connected state while the
> >>>>>> communication channels are destroyed. It never goes into any other
> >>>>>> xenbus state, so there is no way its state machine can help with
> >>>>>> recovering.
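
For reference, a rough sketch of the .resume path being discussed, loosely
based on drivers/net/xen-netfront.c of that time (helper names approximate):
the callback tears down the frontend side but never touches the xenbus state,
so the backend is not told that anything changed.

static int netfront_resume(struct xenbus_device *dev)
{
	struct netfront_info *info = dev_get_drvdata(&dev->dev);

	/*
	 * Frees the queues and ends foreign access to the ring grants on
	 * the frontend side only; there is no xenbus_switch_state() here,
	 * so the backend keeps believing the connection is still up.
	 */
	xennet_disconnect_backend(info);
	return 0;
}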
> >>>>>
> >>>>> My tree is about a month old so perhaps there is some sort of regression,
> >>>>> but this certainly works for me. After resume netfront gets
> >>>>> XenbusStateInitWait from the backend, which causes xennet_connect().
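
In the xl save/restore case the reconnect is driven by the backend's state
change rather than by .resume itself. Roughly (again only a sketch, following
the shape of netback_changed() in xen-netfront.c of that era):

static void netback_changed(struct xenbus_device *dev,
			    enum xenbus_state backend_state)
{
	struct netfront_info *np = dev_get_drvdata(&dev->dev);
	struct net_device *netdev = np->netdev;

	switch (backend_state) {
	case XenbusStateInitWait:
		/* Backend re-initialised after the restore: rebuild the
		 * rings and event channels, then report ourselves as
		 * connected. */
		if (dev->state != XenbusStateInitialising)
			break;
		if (xennet_connect(netdev) != 0)
			break;
		xenbus_switch_state(dev, XenbusStateConnected);
		break;
	/* ... handling of the other states elided ... */
	default:
		break;
	}
}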
> >>>> Ah, the difference may be in the way we get the guest to enter
> >>>> the suspend state. I am making my guest suspend with:
> >>>> echo mem > /sys/power/state
> >>>> And then I use an interrupt to the guest (this is test code)
> >>>> to wake it up.
> >>>> Could you please share your exact use-case: when does the guest enter
> >>>> suspend and what do you do to resume it?
> >>>
> >>>
> >>> xl save / xl restore
> >>>
> >>>> I can see no way the backend would enter XenbusStateInitWait in my
> >>>> use-case, as it simply doesn't know we want it to.
> >>>
> >>>
> >>> Yours looks like the ACPI path; I don't know how well it was tested, TBH.
> >>
> >> I remember a series from Amazon [1] that plays around with suspend and 
> >> hibernation. The patch [2] leads me to think that guest-triggered 
> >> suspend/resume does not work properly. It looks like the series has never 
> >> been fully reviewed. Not sure why...
> > Julien, thanks a lot for bringing these patches to our attention; we 
> > obviously missed them.
> >>
> >> Anyway, from my understanding this series may solve Oleksandr's issue. 
> >> However, it would only address the common code side. AFAIK Oleksandr is 
> >> targeting the Arm platform. If so, I think this would require more work 
> >> than this series: the Arm code is still missing a few bits of the 
> >> arch-specific suspend/resume code (see [2]).
> >>
> >> I have a branch on my git to track the series. However, the patches have 
> >> never been resent after Ian Campbell left Citrix. I would be happy to 
> >> review them if someone wants to pick them up and repost them.
> >>
> > First of all, let me make it clear that we are interested in hibernation 
> > long term, so it would be desirable to re-use as much work from 
> > suspend/resume as we can. But we see it as step-by-step work, e.g. first 
> > S2RAM and later on hibernation.
> > Let me clarify the immediate use-case that we have, so it is easier to 
> > understand what we want and what we don't at the moment. We are about to 
> > continue the work started by Mirela/Xilinx on Suspend-to-RAM for ARM [3] 
> > and we made a number of assumptions:
> > 1. We are talking about *system* suspend, i.e. the goal is to suspend all 
> > the components of the system and Xen itself at once. Think of this as a 
> > fast-boot and/or energy-saving feature if you will.
> > 2. With suspend/resume there is no intention to migrate VMs to any other 
> > host.
> > 3. Most probably the configuration of the back/front won't change between 
> > suspend/resume.
> > But long term we are also thinking of supporting suspend/resume in its 
> > broader meaning, which is probably what you mean by suspend/resume.
> AFAIK the .suspend and .resume callbacks in frontend drivers are
> specifically for the xl save/restore case rather than the normal "system"
> suspend. I.e. the former is Boris' case and what I called "Xen suspend"
> in the patch series; the latter should be your interest, is called the
> "ACPI path" here, and is what I referred to as "PM suspend". They are
> very different code paths; see drivers/xen/manage.c for the details of
> Xen suspend.
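
For completeness, a condensed sketch of that Xen suspend path in
drivers/xen/manage.c (error handling, xenstore suspend and arch hooks
stripped); note that it reuses the PM freeze/thaw/restore events internally:

static void do_suspend(void)
{
	struct suspend_info si;

	freeze_processes();

	/* Quiesce all devices, frontends included, using the FREEZE event. */
	dpm_suspend_start(PMSG_FREEZE);
	dpm_suspend_end(PMSG_FREEZE);

	si.cancelled = 1;
	/* The actual suspend; cancelled == 1 means we woke up in the same
	 * domain (checkpoint), 0 means we were restored elsewhere. */
	stop_machine(xen_suspend, &si, cpumask_of(0));

	dpm_resume_start(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
	dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE);

	thaw_processes();
}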
> 
> > Given that, we think we don't need Xen support for saving grants, page 
> > tables and other VM context on suspend, at least at the first stage, as we 
> > are implementing not a fully blown suspend/resume but only the S2RAM part 
> > of it, which is much simpler than a generic suspend implementation. We 
> > only need the changes to the Linux kernel frontend drivers from [1] - the 
> > piece that we miss is the suspend/resume implementation in the netfront 
> > driver. What is more, as we are not changing the back/front configuration, 
> > we can even live with empty .resume/.suspend frontend callbacks because 
> > event channels, rings etc. are "statically" allocated in our use-case at 
> > the first system start (cold boot). And indeed, tests show that waking 
> > domains in the right order does allow that.
> > So, frankly, from [3] we are immediately interested in implementing 
> > .resume/.suspend, not
> If you just (re)implement .suspend and .resume without taking care
> of Xen suspend, you can easily break the existing functionality. The
> patch series introduced .freeze and .restore callbacks for both PM
> suspend and hibernation, and kept .suspend (not implemented in most
> frontends, though) and .resume unchanged for Xen suspend.
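
Hypothetically, a frontend driver with that series applied would register
something along these lines; the .freeze/.restore members come from the
series itself and are not upstream, so the exact names may differ:

static struct xenbus_driver netfront_driver = {
	.ids = netfront_ids,
	.probe = netfront_probe,
	.remove = xennet_remove,
	/* Kept as-is for Xen suspend (xl save/restore). */
	.resume = netfront_resume,
	/* Added by the series for PM suspend / hibernation (hypothetical). */
	.freeze = netfront_freeze,
	.restore = netfront_restore,
	.otherend_changed = netback_changed,
};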
> 
> Note that xenbus has mapped the freeze/thaw/restore events to the suspend,
> resume and cancel callbacks to handle the "checkpoint" case [4]. This was a
> bit tricky and led me to the design of having a separate set of
> callbacks at the frontend driver level [5]. You might need to consider
> a similar approach even if your immediate interest at the moment is PM
> suspend.
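
The mapping being referred to sits in the xenbus core's dev_pm_ops (see the
commit at [4]); roughly:

/* From drivers/xen/xenbus/xenbus_probe.c (approximate): PM events are
 * funnelled into the existing xenbus suspend/resume plus a cancel hook. */
static const struct dev_pm_ops xenbus_pm_ops = {
	.suspend = xenbus_dev_suspend,
	.resume  = xenbus_dev_resume,
	.freeze  = xenbus_dev_suspend,	/* checkpoint/hibernate: reuse suspend */
	.thaw    = xenbus_dev_cancel,	/* checkpoint continued in the same domain */
	.restore = xenbus_dev_resume,	/* restored into a new domain: reuse resume */
};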
> 
> > even freeze/thaw/restore callbacks: if Amazon has the will and capacity to 
> > continue working on [3], then once that gets upstream it also solves our 
> > S2RAM use-case; but if not, then we can probably re-work the netfront 
> > patch and only provide the .resume/.suspend callbacks which we need for 
> > now (remember our very specific use-case, which can survive suspend 
> > without callbacks implemented).
> > IMO, the patches at [2] seem to be useful when implementing generic 
> > suspend/resume and can be postponed for the S2RAM case.
> > 
> > Julien/Juergen/Boris/Amazon - could you please express your view on the 
> > above?
> > Is it acceptable that for now we only take the re-worked netfront patch 
> > from [3], with the full implementation in mind for later (we reuse the 
> > code for .resume/.suspend)?
> In fact, Anchal has taken over my initial work and she may want to chime
> in here.
> 
> That said, I'd be very happy to review patches if you come up with your
> own ones, so feel free to add me in that case.
Yes, I am working on those patches and plan to re-post them in an effort to
upstream them. I agree with Munehisa here on considering the patches that are
already out there, as I plan to keep the same model of distinguishing PM
suspend and PM hibernation from Xen suspend and resume.
> 
> >> Cheers,
> >>
> >> [1] 
> >> https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
> >>
> >> [2] 
> >> http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
> >>
> > [3] 
> > https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
> 
> [4] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
> [5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html
> 
> >>>
> >>>
> >>> -boris
> >>>
> >>
> > Thank you,
> > Oleksandr
> 
> Thanks,
> Munehisa

Thanks,
Anchal
