xen-ppc-devel
Re: [XenPPC] Profiling in xen – ppc considerations
Jimi Xenidis wrote:
On Mar 14, 2007, at 6:17 AM, Christian Ehrhardt wrote:
Jimi Xenidis wrote:
Christian, nice summary.
One question I have is: does Xen allow a domain to extract its own
oprofile information as Linux would without Xen, or does Xen provide
some transport that eases the collection?
Short Answer: "some transport" - shared buffers (Xen/guests) plus Xen
virtual interrupts, which are received in the Linux oprofile driver
for Xen.
Long Answer: every explanation would be almost identical to
http://xenoprof.sourceforge.net/xenoprof_2.0.txt, so I recommend
reading it; section one explains the background mechanisms.
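Very roughly, the guest side of that transport looks like the sketch
below (structure and function names are from my memory of the xenoprof
interface and should be treated as illustrative, not authoritative):

/* Illustrative sketch only: Xen appends samples to a buffer shared
 * with the domain and then sends a virtual interrupt; the Linux
 * oprofile driver drains the buffer in the interrupt handler. */
#include <linux/interrupt.h>
#include <linux/oprofile.h>

struct event_log {
        unsigned long eip;      /* sampled instruction address */
        unsigned char mode;     /* user / kernel / xen context */
        unsigned char event;
};
struct xenoprof_buf {
        int event_head, event_tail, event_size;
        struct event_log event_log[1];
};
static struct xenoprof_buf *shared_buf;    /* mapped, shared with Xen */

static irqreturn_t xenoprof_virq_handler(int irq, void *dev_id)
{
        int head = shared_buf->event_head;
        int tail = shared_buf->event_tail;

        while (tail != head) {
                struct event_log *s = &shared_buf->event_log[tail];

                /* hand the sample to the generic oprofile layer */
                oprofile_add_pc(s->eip, s->mode, s->event);
                tail = (tail + 1) % shared_buf->event_size;
        }
        shared_buf->event_tail = tail;

        return IRQ_HANDLED;
}
/* bound once at setup time, something like:
 * bind_virq_to_irqhandler(VIRQ_XENOPROF, cpu, xenoprof_virq_handler,
 *                         0, "xenoprof", NULL); */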
Ahh, ok.
So active domains collect their own information, and Xen collects the
information for the passive domains?
Yes, active domains collect their own information. Among the active
domains, one is treated as the primary domain (the one that
starts/coordinates/stops everything via the oprofile userspace tools).
This primary domain additionally collects the samples for the passive
domains that are sampled. So we have the following things in action:
Xen: sets up/handles performance interrupts, delegates to domains via
virtual interrupt
Active domains: collect their sample data passed via virtual interrupt
One of the active domains is the primary one: the master "controller",
which additionally collects the passive domain samples
Passive domains: do nothing, because they are not aware of what is
going on; btw, paranoid people consider this a security breach
My thought was that a hypervisor could ease the burden of data
collection by providing a communication buffer and collecting the
information "out of body", especially information about low-level
operations, but I guess they did not do that.
Anyway... let's stick with oprofiling the domains only.
There is really very little Linux work to do here. We need:
1. An hcall that turns the performance monitor on for the domain (a
minimal sketch of such an hcall follows below).
2. Save and restore of the relevant registers for any domain that has
it turned on.
3. Turning it off for domains that have it disabled.
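A minimal sketch of item 1 (the hcall name, the arch.perf flag and the
register macros are illustrative assumptions, not code from the tree):

#include <xen/sched.h>          /* struct domain, current */

/* Sketch: a domain asks Xen to turn the performance monitor on (or
 * off) for itself.  The flag is what context_switch() then checks to
 * decide whether to save/restore the MMCRs and PMCs for this domain. */
static long do_enable_perfmon(unsigned long enable)
{
        struct domain *d = current->domain;

        d->arch.perf = !!enable;

        if (!enable)
                /* freeze all counters for this domain right away */
                mtspr(SPRN_MMCR0, mfspr(SPRN_MMCR0) | MMCR0_FC);

        return 0;
}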
So you suggest freezing the counters while the hypervisor or other
domains are running, via the "Performance Monitor Mark" bit MSR(PMM)
and the "Freeze counters while Mark" bits MMCR0(FCM0/1), combined with
keeping the interrupt handler in the Linux code?
Again, sticking with oprofiling "active" domains only.
Yes, so MMCR0[FCH]=1 (freeze counters while MSR[HV]=1) always.
Then whenever we perform a domain switch (context_switch()) we:
if (prev->domain->perf) {
    save the MMCRs and counters
} else if ((prev->regs->msr & MSR_PMM) || MMCR0[FC] == 0) {
    warning("bad domain!");
}
MMCR0[FC] = 1;   // turn off all counters in case a domain is being bad
....
if (next->domain->perf) {
    restore the MMCRs and counters
}
Of course we can be a little smarter with the on/off logic.
NOTE: the MSR[PMM] bit will get saved and restored with the other MSR
bits, so no special handling is required.
-> Is it allowed in Xen to direct an interrupt directly to a Linux
guest? (If there were too much latency between the IRQ and the read of
the samples, we might block a lot of the occurring perf interrupts
before we reset MMCR0(PMAE).)
Otherwise it would end up very similar to the xenoprof approach, which
handles the interrupt in Xen, puts the data into a shared buffer and
then passes the information about "new data" to the appropriate domain
with a virtual interrupt.
Ahh, the point is, there is no way to direct performance interrupts to
Xen while the domain is running; all performance interrupts go
directly to the domain, so really the only thing you can/should
virtualize is MMCR0 through a single hcall.
NOTE: When Xen is running MSR[HV]=1, and when a domain is running
MSR[HV]=0. So if an interrupt (like the performance interrupt) does
not affect the value of MSR[HV], then the interrupt will not cause the
processor to switch from the domain to Xen.
There is a table in Book 3, "Interrupt Definitions", that describes
how each interrupt affects the MSR (and yes, we run with LPES[1]=1).
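For illustration, a minimal sketch of what that single hcall could look
like on the Xen side (the hcall name, the mmcr0 field in the vcpu and
the MMCR0_FCH macro are assumptions, not code from the tree):

#include <xen/sched.h>          /* struct vcpu, current */

/* Sketch only: the domain sets MMCR0 through Xen, and Xen sanitizes
 * the value so the domain can never unfreeze the counters while the
 * hypervisor runs (MMCR0[FCH] stays set).  Xen keeps a copy so the
 * value can be restored on every switch back to this vcpu. */
static long do_set_mmcr0(unsigned long val)
{
        struct vcpu *v = current;

        val |= MMCR0_FCH;               /* never count while MSR[HV]=1 */
        v->arch.mmcr0 = val;            /* restored in context_switch() */
        mtspr(SPRN_MMCR0, val);

        return 0;
}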
"and yes we run with LPES[1]=1" - This is it - that's the thing missing
in my considerations up to now, I did neither know nor expect that
xen-ppc is implemented that way. I just read quickly over the logical
partitioning section and there was a big "I see" ;) I'll read that
section and referenced ones again more in depth and may mail/chat/.. you
directly to clarify the rest.
-> Is there a single or at least a limited number of transition points
between domains and Xen (e.g. in the scheduler) where we could place a
hook to change MSR(PMM) as we need it, or would this state transition
need to be spread all over the code?
yes, context_switch() should be the only place.
I must also add that this would not solve the described issue that
another domain could write to the performance monitor registers and
interfere with a profiling session.
Since you are performing save and restore of these registers, a domain
can only hurt itself.
You do bring up an excellent point in that the domain can mess with
MMCR0, so in that case we would have to make sure that MMCR0[FCH]=1
whenever we enter the HV in exceptions.S (see the EXCEPTION_HEAD
macro, and beware: if you make it too big you will get assembler
errors and we'll have to cook up some magic).
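Expressed in C rather than in the actual exceptions.S assembly, purely
to illustrate the idea (the SPR and bit macros are assumptions; the
real check has to live in the EXCEPTION_HEAD macro):

/* Idea only: on every entry into the hypervisor, make sure a
 * misbehaving domain has not cleared the "freeze counters while
 * MSR[HV]=1" bit behind our back. */
static inline void enforce_fch_on_hv_entry(void)
{
        unsigned long mmcr0 = mfspr(SPRN_MMCR0);

        if (!(mmcr0 & MMCR0_FCH))
                mtspr(SPRN_MMCR0, mmcr0 | MMCR0_FCH);
}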
If we decide that this issue is negligible for the moment, we could
also continue with the xenoprof-oriented approach, which would provide
us with profiling of multiple domains and of Xen itself.
I believe that the work of profiling active domains is easy, maybe 2-3
weeks' worth of work, since the domains collect their own information;
the only LinuxPPC work would be to add any Xen-specific events.
Depending on your expertise you can contact other members to help you
with the assembly in exceptions.S (that file is a PITA).
Sounds good; in view of the (for me) new information that the
performance interrupt can be handled in Linux, I now also think this is
a way to get active domain sampling running.
I can't really estimate whether it would be worthwhile to implement
this domain-only profiling first and then switch/extend it - do you
think it is that much easier to implement this in the first place?
Yes, because we know it works. Don't forget, LinuxPPC already runs on
a hypervisor, most of this stuff has been done so we get a lot of
stuff for free (including lots of bad stuff, but this is good stuff).
My hope is that this experience would allow you to then profile
passive domains, which will be disappointing because the profiling
will be limited to events that cause you to enter Xen and will
probably have nothing to do with the counters. This step might not
make sense at all but we should explore it.
Then the next step would be to profile Xen, which can also be staged
into chewable pieces which should probably take about 3 weeks to get
some profiling and then iteratively get closer and closer to covering
more and more events.
I thought we should think about and decide on the privilege issue I
described, especially in view of the fact that it applies to both
approaches.
I think we nailed and described the privilege issues, do you have more
in mind?
-JX
It is a good point to break the item into chunks that are easier to
handle; no other issues in mind currently - I just revoke some wrong
assumptions about the interrupt handling and will re-read the docs *g*
Thanks for the clarification - Christian
You can see the hcall being set up here:
arch/powerpc/platforms/pseries/setup.c, pSeries_setup_arch(), line 322:
ppc_md.enable_pmcs = pseries_lpar_enable_pmcs;
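That hook boils down to a single H_PERFMON hcall; from memory it is
roughly the following (check the file above for the authoritative
version):

/* arch/powerpc/platforms/pseries/setup.c (quoted from memory, so only
 * roughly): the LPAR asks the underlying hypervisor to enable the
 * performance monitor function for this partition; the same shape of
 * interface is what Xen-PPC would need to offer. */
static void pseries_lpar_enable_pmcs(void)
{
        unsigned long set, reset;

        set = 1UL << 63;        /* enable the PM function */
        reset = 0;
        plpar_hcall_norets(H_PERFMON, set, reset);
}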
Here is the spec:
Let's try to go deep with this and then think about how to oprofile
Xen itself.
-JX
On Mar 13, 2007, at 1:11 PM, Christian Ehrhardt wrote:
Hi Folks,
Over the last two weeks I analyzed the oprofile/xenoprof code and
tried to do a simple-minded PowerPC mapping. As came up in a phone
call on Monday, I overlooked some possible issues arising out of the
simple mapping of xenoprof to the Power architecture. In this mail I
briefly describe some background as well as my considerations so far.
I'm not sure if I got all the Power and x86 specifics right, so feel
free to correct me - I'm open to any comments and ideas - and I hope
together we can reach a realizable plan for if and how this could be
implemented.
-- Background I - oprofile basic principles --
Oprofile is a common profiling tool in the Linux world. It consists of
two layers: the first is the kernel-space driver, which contains a
generic infrastructure and management part as well as an
architecture-dependent part that handles the hardware-specific tasks;
the second is the userspace component, which controls the kernel part
and turns the collected data into different reports.
-- Background II - xenoprof approach --
To use oprofile (http://oprofile.sourceforge.net/about/) in the Xen
environment it was extended to xenoprof
(http://xenoprof.sourceforge.net/), which adds a third layer in the
Xen hypervisor. The Linux kernel-space driver now supports a new
"architecture" that represents Xen. This implementation uses
hypercalls instead of hardware-specific code. The data that is
usually reported by interrupts is now reported to Xen by the hardware.
Xen evaluates some parameters and reports the data chunk to the
profiling domain via the virtual interrupt event notification provided
by Xen. This gets more complex with multiple domains etc.; for more,
read the docs on the xenoprof web page.
The hardware-specific code that was once in the oprofile kernel
drivers is now located (adapted to the new environment) in the Xen
source, where the new hypercalls are mapped to the real hardware.
-- Mapping xenoprof to Power - simple approach --
This approach tries to reuse as much of the initial xenoprof
architecture as possible by mapping the Power implementation onto the
technically x86-oriented xenoprof architecture. This would ease the
implementation but brings some risks, which I try to list here (the
list is not complete; there may be more issues I have not yet
realized).
The basic principle of these profiling implementations is a
performance counter (real time, cycles, special events, ...) that
triggers an interrupt. The interrupt handler then saves information
about the current point of execution. The oprofile implementation for
Power works in a similar scheme, so I thought this should be the
easiest way.
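As a very condensed sketch of that scheme on Power (simplified, with
register/bit names per the ISA and the oprofile calls as in the Linux
driver; treat the details as illustrative):

#include <linux/oprofile.h>
#include <asm/page.h>           /* is_kernel_addr() */
#include <asm/reg.h>            /* mfspr/mtspr, SPRN_* */

static unsigned long pmc_reset_value;   /* set when the counters were programmed */

/* The performance monitor interrupt fires on counter overflow; the
 * handler records where the sampled instruction was, then reloads the
 * counter and re-enables the next performance monitor alert. */
static void perfmon_interrupt(struct pt_regs *regs)
{
        unsigned long pc = mfspr(SPRN_SIAR);    /* sampled instruction address */
        int is_kernel = is_kernel_addr(pc);

        /* event 0 is just a placeholder for the configured counter/event */
        oprofile_add_ext_sample(pc, regs, 0, is_kernel);

        mtspr(SPRN_PMC1, pmc_reset_value);
        /* re-enable the PM alert; macro name assumed for MMCR0[PMAE] */
        mtspr(SPRN_MMCR0, mfspr(SPRN_MMCR0) | MMCR0_PMAE);
}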
-- Possible issues and their background --
Please take a look at this graphic before/while reading the
following details
(https://ltc.linux.ibm.com/wiki/XenPPC/profilingdiscussion) – it
might also be useful to have a PowerISA doc to read about special
registers and bits
(http://www.power.org/news/articles/new_brand/#isa).
Setting up the hardware elements used in the x86 implementation needs
ring 0 afaik, and the Dom kernel runs in ring 1; because of that it
can't interfere with the NMI programming done by Xen in ring 0. In the
Power architecture there are three privilege levels, and the Linux
kernel usually runs in the second level. Afaik the Dom Linux kernel
also runs in this level in the Xen-PPC implementation; because of that
we could set up the performance monitor registers in the right way in
Xen, but could not really be sure that a Dom kernel does not change
the related registers without "asking" the hypervisor.
-> Is there a way, still unknown to me, to protect those registers?
-- Other possible approaches --
After consulting the current Power ISA documents again, I found some
points that may allow other implementations of profiling in Xen.
a) Because the Dom kernel seems to be able to set up the performance
profiling without invoking the hypervisor, it could be possible to let
a domain just do the profiling on its own. But there are other issues
with this approach too, e.g. in which way would samples of other
domains show up, and would this be a security breach?
b) The Power architecture provides a very potent performance monitor
with features that allow freezing the counters, e.g. freezing them
while execution is in hypervisor mode (MSR[HV,PR] = 0b10). But such
features would only help to distinguish vertically in the graphic
referenced above; only the hypervisor is in a position to
differentiate horizontally between different domains (see the sketch
below).
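To make the vertical/horizontal distinction concrete, an illustrative
fragment (the freeze-bit names follow the ISA, everything else is an
assumption):

/* "Vertical" selection: count only while the guest kernel itself runs,
 * by freezing the counters in hypervisor state (MSR[HV]=1) and in
 * problem state (MSR[PR]=1).  The "horizontal" separation between
 * domains cannot be expressed here at all; it can only come from the
 * hypervisor saving/restoring the MMCRs and PMCs on every domain switch. */
static void count_guest_kernel_only(void)
{
        unsigned long mmcr0 = mfspr(SPRN_MMCR0);

        mmcr0 &= ~MMCR0_FC;     /* unfreeze globally               */
        mmcr0 |= MMCR0_FCH;     /* freeze while in hypervisor mode */
        mmcr0 |= MMCR0_FCP;     /* freeze while in problem state   */
        mtspr(SPRN_MMCR0, mmcr0);
}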
I'm planning to move the illustration I used to the public wiki
after the first round of review and keep the planned design up to
date there.
More, but not yet mature, thoughts & ideas about this in mind,
Christian
--
Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
+49 7031/16-3385
Ehrhardt@xxxxxxxxxxxxxxxxxxx
Ehrhardt@xxxxxxxxxx
IBM Deutschland Entwicklung GmbH
Chairman of the Supervisory Board: Johann Weihen
Management: Herbert Kircher
Registered office: Böblingen
Register court: Amtsgericht Stuttgart, HRB 243294
_______________________________________________
Xen-ppc-devel mailing list
Xen-ppc-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ppc-devel