[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback
On Wed, Mar 27, 2019 at 08:40:20AM +0200, Oleksandr Andrushchenko wrote: > On 3/25/19 7:30 PM, Anchal Agarwal wrote: > >On Fri, Mar 22, 2019 at 10:44:33AM +0000, Oleksandr Andrushchenko wrote: > >>On 3/20/19 5:50 AM, Munehisa Kamata wrote: > >>>On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote: > >>>>+Amazon > >>>>pls see inline > >>>Hi Oleksandr, > >>> > >>>Let me add some comments as the original author of the series. > >>Thank you for your work! > >Hi Oleksandr, > >>>>On 3/14/19 9:00 PM, Julien Grall wrote: > >>>>>Hi, > >>>>> > >>>>>On 3/14/19 3:40 PM, Boris Ostrovsky wrote: > >>>>>>On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote: > >>>>>>>On 3/14/19 5:02 PM, Boris Ostrovsky wrote: > >>>>>>>>On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote: > >>>>>>>>>On 3/14/19 4:47 PM, Boris Ostrovsky wrote: > >>>>>>>>>>On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote: > >>>>>>>>>>>From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> > >>>>>>>>>>> > >>>>>>>>>>>Currently on driver resume we remove all the network queues and > >>>>>>>>>>>destroy shared Tx/Rx rings leaving the driver in its current state > >>>>>>>>>>>and never signaling the backend of this frontend's state change. > >>>>>>>>>>>This leads to the number of consequences: > >>>>>>>>>>>- when frontend withdraws granted references to the rings etc. it > >>>>>>>>>>>cannot > >>>>>>>>>>> ???????? be cleanly done as the backend still holds those (it > >>>>>>>>>>> was not > >>>>>>>>>>>told to > >>>>>>>>>>> ???????? free the resources) > >>>>>>>>>>>- it is not possible to resume driver operation as all the > >>>>>>>>>>>communication > >>>>>>>>>>> ???????? means with the backned were destroyed by the frontend, > >>>>>>>>>>> thus > >>>>>>>>>>> ???????? making the frontend appear to the guest OS as > >>>>>>>>>>> functional, but > >>>>>>>>>>> ???????? not really. > >>>>>>>>>>What do you mean? Are you saying that after resume you lose > >>>>>>>>>>connectivity? > >>>>>>>>>Exactly, if you take a look at the .resume callback as it is now > >>>>>>>>>what it does it destroys the rings etc. and never notifies the > >>>>>>>>>backend > >>>>>>>>>of that, e.g. it stays in, say, connected state with communication > >>>>>>>>>channels destroyed. It never goes into any other Xen bus state, so > >>>>>>>>>there is > >>>>>>>>>no way its state machine can help recovering. > >>>>>>>>My tree is about a month old so perhaps there is some sort of > >>>>>>>>regression > >>>>>>>>but this certainly works for me. After resume netfront gets > >>>>>>>>XenbusStateInitWait from backend which causes xennet_connect(). > >>>>>>>Ah, the difference can be of the way we get the guest enter > >>>>>>>the suspend state. I am making my guest to suspend with: > >>>>>>>echo mem > /sys/power/state > >>>>>>>And then I use an interrupt to the guest (this is a test code) > >>>>>>>to wake it up. > >>>>>>>Could you please share your exact use-case when the guest enters > >>>>>>>suspend > >>>>>>>and what you do to resume it? > >>>>>>xl save / xl restore > >>>>>> > >>>>>>>I can see no way backend may want enter XenbusStateInitWait in my > >>>>>>>use-case > >>>>>>>as it simply doesn't know we want him to. > >>>>>>Yours looks like ACPI path, I don't know how well it was tested TBH. > >>>>>I remember a series from amazon [1] that plays around suspend and > >>>>>hibernation. The patch [2] leads me to think that guest triggered > >>>>>suspend/resume does not work properly. It looks like the series has > >>>>>never been fully reviewed. Not sure why... > >>>>Julien, thanks a lot for bringing these patches to our attention which we > >>>>obviously missed. > >>>>>Anyway, from my understanding this series may solve Oleksandr issue. > >>>>>However, this would only address the common code side. AFAIK Oleksandr > >>>>>is targeting Arm platform. If so, I think this would require more work > >>>>>than this series. Arm code still miss few bits properly suspend/resume > >>>>>arch specific code (see [2]). > >>>>> > >>>>>I have a branch on my git to track the series. However, they never have > >>>>>been resent after Ian Campbell left Citrix. I would be happy to review > >>>>>them if someone wants to pick them up and repost them. > >>>>> > >>>>First of all, let me make it clear that we are interested in hibernation > >>>>long term, so it would be > >>>>desirable to re-use as much work form resume/suspend as we can. But, we > >>>>see it as a step by > >>>>step work, e.g. first S2RAM and later on hibernation. > >>>>Let me clarify the immediate use-case that we have, so it is easier to > >>>>understand what we want > >>>>and what we don't at the moment. We are about to continue work started by > >>>>Mirela/Xilinx on > >>>>Suspend-to-RAM for ARM [3] and we made number of assumptions: > >>>>1. We are talking about *system* suspend, e.g. the goal is to suspend all > >>>>the components > >>>>of the system and Xen itself at once. Think about this as fast-boot > >>>>and/or energy saving > >>>>feature if you will. > >>>>2. With suspend/resume there is no intention to migrate VMs to any other > >>>>host. > >>>>3. Most probably configuration of the back/front won't change between > >>>>suspend/resume. > >>>>But long term we are also thinking for supporting suspend/resume in its > >>>>broader meaning, > >>>>e.g. what is probably what you mean by suspend/resume. > >>>AFAIK .suspend and .resume callbacks in frontend drivers are > >>>specifically for xl save/restore case rather than the normal "system" > >>>suspend. i.e. The former is Boris' case and something I called "Xen > >>>suspend" in the patch series, the latter should be your interest and > >>>called "ACPI path" here, and I referred to as "PM suspend". They are > >>>very different code paths, see drivers/xen/manage.c for details of > >>>Xen suspend. > >>Yes, I saw that code, thank you > >>>>Given that, we think that we don't need Xen support to save grants, page > >>>>tables and other > >>>>VM's context on suspend at least at the first stage as we are > >>>>implementing not a fully > >>>>blown suspend/resume, but only S2RAM part of it which is much more > >>>>simpler than a generic > >>>>suspend implementation. We only need changes to Linux kernel frontend > >>>>drivers from [1] - the > >>>>piece that we miss is suspend/resume implementation in the netfront > >>>>driver. What is more, as > >>>>we are not changing back/front configuration, we can even live with empty > >>>>.resume/.suspend > >>>>frontend's callbacks because event channels, rings etc. are "statically" > >>>>allocated in our > >>>>use-case at the first system start (cold boot). And indeed, tests show > >>>>that waking domains > >>>>in the right order do allow that. > >>>>So, frankly, from [3] we are immediately interested in implementing > >>>>.resume/.suspend, not > >>>If you just (re)implement .suspend and .resume so without taking care > >>>of Xen suspend, you can easily break the existing functionality. The > >>>patch series introduced .freeze and .restore callbacks for both PM > >>>suspend and hibernation, and kept .suspend (not implemented in most > >>>frontend though) and .resume with no changes for Xen suspend. > >>> > >>>Note that xenbus has mapped freeze/thaw/restore events to suspend, > >>>resume and cancel callbacks to handle "checkpoint" case[4]. This was a > >>>bit tricky and led me to the design to have the separate set of > >>>callbacks at each frontend driver level[5]. You might need to consider > >>>a similar approach even if your immediate interest at the moment is PM > >>>suspend. > >>For the immediate task we have at the moment we think we can re-use > >>your work and implement .suspend/.resume based on it (we are targeting > >>S2RAM as the first stage). > >>But long term - we do support the idea of fully implemented > >>suspend and *hibernate* functionality as you describe it. > >>So, yes, we are also thinking about that. > >>>>even freeze/thaw/restore callbacks: if Amazon has will and capacity to > >>>>continue working on [3] > >>>>then once that gets into the upstream it also solves our S2RAM use-case, > >>>>but if not then we > >>>>can probably re-work netfront patch and only provide .resume/.suspend > >>>>callbacks which we need > >>>>for now (remember our very specific use-case which can survive suspend > >>>>without callbacks > >>>>implemented). > >>>>IMO, patches at [2] seem to be useful while implementing generic > >>>>suspend/resume and can > >>>>be postponed for S2RAM. > >>>> > >>>>Julien/Juergen/Boris/Amazon - could you please express your view on the > >>>>above? > >>>>Is it acceptable that for now we only take re-worked netfront patch from > >>>>[3] with full > >>>>implementation in mind for later (we reuse code for .resume/.suspend)? > >>>In fact, Anchal has taken over my initial work and she may want to chime > >>>in here. > >>Great, could you please let us know what is the progress and further plans > >>on that, so we do not work on the same code and can coordinate our > >>efforts somehow? Anchal, could you please shed some light on this? > >Looks like my previous email did not make it to mailing list. May be some > >issues with my > >email server settings. Giving it another shot. > >Yes, I am working on those patches and plan to re-post them in an effort to > >upstream. > This is really great, looking forward to it: any date in your mind > when this can happen? Not a specific date but may be in few weeks. I am currently swamped at work. > >I agree with Munehisa here on considering the patches that are already out > >there as > >I plan to keep the same model to distinguish PM SUSPEND and PM HIBERNATION > >from xen > >suspend and resume. There may be minor fixes here and there however, the > >overall > >idea will still remain the same. > Ok, so I'll plan my efforts accordingly > > As the previous patches there will be support for > >only xen-blkfront and xen-netfront in the initial patchset. > >>>That said, I'd be very happy to review patches if you come up with your > >>>own ones, so feel free to add me in that case. > >>Sure, thank you! > >>>>>Cheers, > >>>>> > >>>>>[1] > >>>>>https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html > >>>>> > >>>>>[2] > >>>>>http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2 > >>>>> > >>>>[3] > >>>>https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html > >>>[4] > >>>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b > >>>[5] > >>>https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html > >>> > >>>>>>-boris > >>>>>> > >>>>>>_______________________________________________ > >>>>>>Xen-devel mailing list > >>>>>>Xen-devel@xxxxxxxxxxxxxxxxxxxx > >>>>>>https://lists.xenproject.org/mailman/listinfo/xen-devel > >>>>>> > >>>>Thank you, > >>>>Oleksandr > >>>Thanks, > >>>Munehisa > >Thanks, > >Anchal > Thank you, > Oleksandr Thanks, Anchal _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |