
Re: [Xen-devel] [PATCH] xen/arm: introduce vwfi parameter



On Mon, 20 Feb 2017, George Dunlap wrote:
> On 20/02/17 18:43, Stefano Stabellini wrote:
> > On Mon, 20 Feb 2017, Dario Faggioli wrote:
> >> On Sun, 2017-02-19 at 21:34 +0000, Julien Grall wrote:
> >>> Hi Stefano,
> >>>
> >>> I have CCed another ARM person who has more knowledge than me on 
> >>> scheduling/power.
> >>>
> >> Ah, when I saw this, I thought you were Cc-ing my friend Juri, who
> >> also works there and is doing that stuff. :-)
> >>
> >>>> In both cases the vcpu is not run until the next slot, so I don't
> >>>> think it should make performance worse in multi-vcpu scenarios.
> >>>> But I can do some tests to double-check.
> >>>
> >>> Looking at your answer, I think it would be important that everyone
> >>> in 
> >>> this thread understand the purpose of WFI and how it differs with
> >>> WFE.
> >>>
> >>> The two instructions provide a way to tell the processor to go into
> >>> a low-power state. This means the processor can turn off power to
> >>> some parts (e.g. units, pipelines...) to save energy.
> >>>
> >> [snip]
> >>>
> >>> For both instructions it is normal to have a higher latency when
> >>> receiving an interrupt. When software uses them, it knows there
> >>> will be an impact, but overall it expects some power to be saved.
> >>> Whether the current numbers are acceptable is another question.
> >>>
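For concreteness, a guest idle loop built on WFI typically looks
something like this (a minimal sketch, not taken from any particular
kernel; do_pending_work() is a made-up placeholder):

    /* Minimal sketch of a guest OS idle loop using WFI. The core halts
     * in a low-power state until an interrupt becomes pending; under
     * Xen, whether this instruction traps to the hypervisor is exactly
     * what the proposed vwfi parameter would control. */
    static void cpu_idle_loop(void)
    {
        for ( ;; )
        {
            /* Complete outstanding memory accesses before sleeping. */
            asm volatile("dsb sy" ::: "memory");
            /* Wait For Interrupt: stay in a low-power state until an
             * interrupt (or other wake event) is pending. */
            asm volatile("wfi" ::: "memory");
            /* Woken up: handle whatever is pending, then loop. */
            do_pending_work();
        }
    }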
> >> Ok, thanks for this useful information. I think I understand the
> >> idea behind these two instructions/mechanisms now.
> >>
> >> What (I think) Stefano is proposing is providing the user (of Xen on
> >> ARM) with a way of making them behave differently.
> > 
> > That's right. It's not always feasible to change the code of the guest
> > the user is running. Maybe she cannot, or maybe she doesn't want to for
> > other reasons. Keep in mind that the developer of the operating system
> > in this example might have had very different expectations of irq
> > latency, given that, even with wfi, it is much lower on native.
> > 
> > When irq latency is way more important than power consumption to the
> > user (think of a train, or an industrial machine that needs to move
> > something in a given amount of time), this option provides value to her
> > at very little maintenance cost on our side.
> > 
> > Of course, even if we introduce this option, by no means should we stop
> > improving the irq latency in the normal cases.
> > 
> > 
> >> Whether good or bad, I've expressed my thoughts, and it's your call in
> >> the end. :-)
> >> George also has a fair point, though. Using yield is a quick and *most
> >> likely* effective way of achieving Linux's "idle=poll", but at the same
> >> time a rather risky one, as it basically means the final behavior
> >> would rely on how yield() behaves in the specific scheduler the user
> >> is using, which may vary.
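To make the two strategies concrete, here is a rough sketch of how a
trapped WFI could be handled in Xen's ARM trap path.
vcpu_block_unless_event_pending(), vcpu_yield() and advance_pc() exist
in the tree today, but the vwfi_poll knob and the exact shape of the
code are illustrative, not the posted patch:

    /* Two ways to handle a trapped WFI: blocking puts the vcpu to
     * sleep until an event arrives (today's behavior); yielding keeps
     * it runnable, idle=poll style, at the cost of burning CPU time
     * and relying on scheduler-specific yield semantics. */
    case HSR_EC_WFI_WFE:
        if ( vwfi_poll )                  /* hypothetical knob */
            vcpu_yield();                 /* stay runnable */
        else
            vcpu_block_unless_event_pending(current);
        advance_pc(regs, hsr);
        break;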
> >>
> >>> Now, regarding what you said. Let's imagine the scheduler
> >>> deschedules the vCPU until the next slot: it will then run the vCPU
> >>> even if no interrupt has been received.
> >>>
> >> There really are no slots. There sort of are in Credit1, but preemption
> >> can happen inside a "slot", so I wouldn't call them that there either.
> >>
> >>> This is a real waste of power,
> >>> and it becomes worse if no interrupt arrives for multiple slots.
> >>>
> >> Undeniable. :-)
> > 
> > Of course. But if your app needs less than 3000ns of latency, then it's
> > the only choice.
> > 
> > 
> >>> In the case of multi-vcpu, the guest using wfi will use more slots
> >>> than it did before. This means fewer slots for vCPUs that actually
> >>> have real work to do.
> >>>
> >> No, because it continuously yields. So, yes indeed there will be higher
> >> scheduling overhead, but no stealing of otherwise useful computation
> >> time. Not with the yield() implementations we have right now in the
> >> code.
> >>
> >> But I'm starting to think that we had better take a step back from
> >> deep within the scheduler and think, first, about whether or not
> >> having something similar to Linux's idle=poll is something we want, if
> >> only for testing, debugging, or very specific use cases.
> >>
> >> And only then, if the answer is yes, decide how to actually implement
> >> it, whether or not to use yield, etc.
> > 
> > I think we want it, if the implementation is small and unintrusive.
> 
> But surely we want it to be per-domain, not system-wide?

Yes, per-domain would be ideal, but I thought that system-wide would be
good enough. I admit I got lazy :-)

I can plumb it through libxl/xl if that's the consensus.
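For reference, the system-wide knob is tiny; something along these
lines (a sketch, with illustrative names, not necessarily what the
final patch will look like):

    /* Sketch of a system-wide Xen command line option, e.g. booting
     * with "vwfi=native" to stop trapping WFI/WFE altogether. */
    static enum { VWFI_TRAP, VWFI_NATIVE } vwfi = VWFI_TRAP;

    static void __init parse_vwfi(const char *s)
    {
        if ( !strcmp(s, "native") )
            vwfi = VWFI_NATIVE;
    }
    custom_param("vwfi", parse_vwfi);

Per-domain granularity would then be the libxl/xl plumbing mentioned
above, on top of the same mechanism.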

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

