[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback



On Wed, Mar 27, 2019 at 08:40:20AM +0200, Oleksandr Andrushchenko wrote:
> On 3/25/19 7:30 PM, Anchal Agarwal wrote:
> >On Fri, Mar 22, 2019 at 10:44:33AM +0000, Oleksandr Andrushchenko wrote:
> >>On 3/20/19 5:50 AM, Munehisa Kamata wrote:
> >>>On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
> >>>>+Amazon
> >>>>pls see inline
> >>>Hi Oleksandr,
> >>>
> >>>Let me add some comments as the original author of the series.
> >>Thank you for your work!
> >Hi Oleksandr,
> >>>>On 3/14/19 9:00 PM, Julien Grall wrote:
> >>>>>Hi,
> >>>>>
> >>>>>On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
> >>>>>>On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
> >>>>>>>On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
> >>>>>>>>On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
> >>>>>>>>>On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
> >>>>>>>>>>On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
> >>>>>>>>>>>From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> >>>>>>>>>>>
> >>>>>>>>>>>Currently on driver resume we remove all the network queues and
> >>>>>>>>>>>destroy shared Tx/Rx rings leaving the driver in its current state
> >>>>>>>>>>>and never signaling the backend of this frontend's state change.
> >>>>>>>>>>>This leads to the number of consequences:
> >>>>>>>>>>>- when frontend withdraws granted references to the rings etc. it
> >>>>>>>>>>>cannot
> >>>>>>>>>>>  ???????? be cleanly done as the backend still holds those (it 
> >>>>>>>>>>> was not
> >>>>>>>>>>>told to
> >>>>>>>>>>>  ???????? free the resources)
> >>>>>>>>>>>- it is not possible to resume driver operation as all the
> >>>>>>>>>>>communication
> >>>>>>>>>>>  ???????? means with the backned were destroyed by the frontend, 
> >>>>>>>>>>> thus
> >>>>>>>>>>>  ???????? making the frontend appear to the guest OS as 
> >>>>>>>>>>> functional, but
> >>>>>>>>>>>  ???????? not really.
> >>>>>>>>>>What do you mean? Are you saying that after resume you lose
> >>>>>>>>>>connectivity?
> >>>>>>>>>Exactly, if you take a look at the .resume callback as it is now
> >>>>>>>>>what it does it destroys the rings etc. and never notifies the 
> >>>>>>>>>backend
> >>>>>>>>>of that, e.g. it stays in, say, connected state with communication
> >>>>>>>>>channels destroyed. It never goes into any other Xen bus state, so
> >>>>>>>>>there is
> >>>>>>>>>no way its state machine can help recovering.
> >>>>>>>>My tree is about a month old so perhaps there is some sort of 
> >>>>>>>>regression
> >>>>>>>>but this certainly works for me. After resume netfront gets
> >>>>>>>>XenbusStateInitWait from backend which causes xennet_connect().
> >>>>>>>Ah, the difference can be of the way we get the guest enter
> >>>>>>>the suspend state. I am making my guest to suspend with:
> >>>>>>>echo mem > /sys/power/state
> >>>>>>>And then I use an interrupt to the guest (this is a test code)
> >>>>>>>to wake it up.
> >>>>>>>Could you please share your exact use-case when the guest enters 
> >>>>>>>suspend
> >>>>>>>and what you do to resume it?
> >>>>>>xl save / xl restore
> >>>>>>
> >>>>>>>I can see no way backend may want enter XenbusStateInitWait in my
> >>>>>>>use-case
> >>>>>>>as it simply doesn't know we want him to.
> >>>>>>Yours looks like ACPI path, I don't know how well it was tested TBH.
> >>>>>I remember a series from amazon [1] that plays around suspend and 
> >>>>>hibernation. The patch [2] leads me to think that guest triggered 
> >>>>>suspend/resume does not work properly. It looks like the series has 
> >>>>>never been fully reviewed. Not sure why...
> >>>>Julien, thanks a lot for bringing these patches to our attention which we 
> >>>>obviously missed.
> >>>>>Anyway, from my understanding this series may solve Oleksandr issue. 
> >>>>>However, this would only address the common code side. AFAIK Oleksandr 
> >>>>>is targeting Arm platform. If so, I think this would require more work 
> >>>>>than this series. Arm code still miss few bits properly suspend/resume 
> >>>>>arch specific code (see [2]).
> >>>>>
> >>>>>I have a branch on my git to track the series. However, they never have 
> >>>>>been resent after Ian Campbell left Citrix. I would be happy to review 
> >>>>>them if someone wants to pick them up and repost them.
> >>>>>
> >>>>First of all, let me make it clear that we are interested in hibernation 
> >>>>long term, so it would be
> >>>>desirable to re-use as much work form resume/suspend as we can. But, we 
> >>>>see it as a step by
> >>>>step work, e.g. first S2RAM and later on hibernation.
> >>>>Let me clarify the immediate use-case that we have, so it is easier to 
> >>>>understand what we want
> >>>>and what we don't at the moment. We are about to continue work started by 
> >>>>Mirela/Xilinx on
> >>>>Suspend-to-RAM for ARM [3] and we made number of assumptions:
> >>>>1. We are talking about *system* suspend, e.g. the goal is to suspend all 
> >>>>the components
> >>>>of the system and Xen itself at once. Think about this as fast-boot 
> >>>>and/or energy saving
> >>>>feature if you will.
> >>>>2. With suspend/resume there is no intention to migrate VMs to any other 
> >>>>host.
> >>>>3. Most probably configuration of the back/front won't change between 
> >>>>suspend/resume.
> >>>>But long term we are also thinking for supporting suspend/resume in its 
> >>>>broader meaning,
> >>>>e.g. what is probably what you mean by suspend/resume.
> >>>AFAIK .suspend and .resume callbacks in frontend drivers are
> >>>specifically for xl save/restore case rather than the normal "system"
> >>>suspend. i.e. The former is Boris' case and something I called "Xen
> >>>suspend" in the patch series, the latter should be your interest and
> >>>called "ACPI path" here, and I referred to as "PM suspend". They are
> >>>very different code paths, see drivers/xen/manage.c for details of
> >>>Xen suspend.
> >>Yes, I saw that code, thank you
> >>>>Given that, we think that we don't need Xen support to save grants, page 
> >>>>tables and other
> >>>>VM's context on suspend at least at the first stage as we are 
> >>>>implementing not a fully
> >>>>blown suspend/resume, but only S2RAM part of it which is much more 
> >>>>simpler than a generic
> >>>>suspend implementation. We only need changes to Linux kernel frontend 
> >>>>drivers from [1] - the
> >>>>piece that we miss is suspend/resume implementation in the netfront 
> >>>>driver. What is more, as
> >>>>we are not changing back/front configuration, we can even live with empty 
> >>>>.resume/.suspend
> >>>>frontend's callbacks because event channels, rings etc. are "statically" 
> >>>>allocated in our
> >>>>use-case at the first system start (cold boot). And indeed, tests show 
> >>>>that waking domains
> >>>>in the right order do allow that.
> >>>>So, frankly, from [3] we are immediately interested in implementing 
> >>>>.resume/.suspend, not
> >>>If you just (re)implement .suspend and .resume so without taking care
> >>>of Xen suspend, you can easily break the existing functionality. The
> >>>patch series introduced .freeze and .restore callbacks for both PM
> >>>suspend and hibernation, and kept .suspend (not implemented in most
> >>>frontend though) and .resume with no changes for Xen suspend.
> >>>
> >>>Note that xenbus has mapped freeze/thaw/restore events to suspend,
> >>>resume and cancel callbacks to handle "checkpoint" case[4]. This was a
> >>>bit tricky and led me to the design to have the separate set of
> >>>callbacks at each frontend driver level[5]. You might need to consider
> >>>a similar approach even if your immediate interest at the moment is PM
> >>>suspend.
> >>For the immediate task we have at the moment we think we can re-use
> >>your work and implement .suspend/.resume based on it (we are targeting
> >>S2RAM as the first stage).
> >>But long term - we do support the idea of fully implemented
> >>suspend and *hibernate* functionality as you describe it.
> >>So, yes, we are also thinking about that.
> >>>>even freeze/thaw/restore callbacks: if Amazon has will and capacity to 
> >>>>continue working on [3]
> >>>>then once that gets into the upstream it also solves our S2RAM use-case, 
> >>>>but if not then we
> >>>>can probably re-work netfront patch and only provide .resume/.suspend 
> >>>>callbacks which we need
> >>>>for now (remember our very specific use-case which can survive suspend 
> >>>>without callbacks
> >>>>implemented).
> >>>>IMO, patches at [2] seem to be useful while implementing generic 
> >>>>suspend/resume and can
> >>>>be postponed for S2RAM.
> >>>>
> >>>>Julien/Juergen/Boris/Amazon - could you please express your view on the 
> >>>>above?
> >>>>Is it acceptable that for now we only take re-worked netfront patch from 
> >>>>[3] with full
> >>>>implementation in mind for later (we reuse code for .resume/.suspend)?
> >>>In fact, Anchal has taken over my initial work and she may want to chime
> >>>in here.
> >>Great, could you please let us know what is the progress and further plans
> >>on that, so we do not work on the same code and can coordinate our
> >>efforts somehow? Anchal, could you please shed some light on this?
> >Looks like my previous email did not make it to mailing list. May be some 
> >issues with my
> >email server settings. Giving it another shot.
> >Yes, I am working on those patches and plan to re-post them in an effort to 
> >upstream.
> This is really great, looking forward to it: any date in your mind
> when this can happen?
Not a specific date but may be in few weeks. I am currently swamped at work.
> >I agree with Munehisa here on considering the patches that are already out 
> >there as
> >I plan to keep the same model to distinguish PM SUSPEND and PM HIBERNATION 
> >from xen
> >suspend and resume. There may be minor fixes here and there however, the 
> >overall
> >idea will still remain the same.
> Ok, so I'll plan my efforts accordingly
> >  As the previous patches there will be support for
> >only xen-blkfront and xen-netfront in the initial patchset.
> >>>That said, I'd be very happy to review patches if you come up with your
> >>>own ones, so feel free to add me in that case.
> >>Sure, thank you!
> >>>>>Cheers,
> >>>>>
> >>>>>[1] 
> >>>>>https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
> >>>>>
> >>>>>[2] 
> >>>>>http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
> >>>>>
> >>>>[3] 
> >>>>https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
> >>>[4] 
> >>>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
> >>>[5] 
> >>>https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html
> >>>
> >>>>>>-boris
> >>>>>>
> >>>>>>_______________________________________________
> >>>>>>Xen-devel mailing list
> >>>>>>Xen-devel@xxxxxxxxxxxxxxxxxxxx
> >>>>>>https://lists.xenproject.org/mailman/listinfo/xen-devel
> >>>>>>
> >>>>Thank you,
> >>>>Oleksandr
> >>>Thanks,
> >>>Munehisa
> >Thanks,
> >Anchal
> Thank you,
> Oleksandr

Thanks,
Anchal

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.