Re: [XenPPC] Profiling in xen – ppc considerations
Jimi Xenidis wrote:
 Yes, active domains collect their own information. In the group of active domains there is one that is treated as the primary domain (the one that starts/coordinates/stops everything via the oprofile userspace tools). This primary domain also collects the samples for the passive domains that are sampled. So we have the following things in action:
Xen: sets up/handles performance interrupts and delegates them to the domains via virtual interrupt
On Mar 14, 2007, at 6:17 AM, Christian Ehrhardt wrote:
 
Jimi Xenidis wrote:
 Short Answer: "some transport" - shared buffers (Xen/guests) plus Xen virtual interrupts, which are received in the Linux oprofile driver for Xen.
Long Answer: Every explanation would be almost identical to http://xenoprof.sourceforge.net/xenoprof_2.0.txt, so I recommend reading that; section one explains the background mechanisms.
Christian, nice summary.
One question I have is: does Xen allow the domain to extract domain oprofile information as Linux would without Xen, or does Xen provide some transport that eases the collection?
 
Ahh, ok.
So active domains collect their own information, and Xen collects the 
information for the passive domains?
 
Active domains: collect their own sample data, delivered via virtual interrupt
One of the active domains is the primary one: the master "controller", which additionally collects the passive domain samples
Passive domains: do nothing, because they are not aware of what's going on (btw, paranoid people consider this a security breach)

My thought was that a hypervisor could ease the burden of data collection by providing a communication buffer and collecting the information "out of body", especially information about low-level operations, but I guess they did not do that.
"and yes we run with LPES[1]=1" - This is it - that's the thing missing 
in my considerations up to now, I did neither know nor expect that 
xen-ppc is implemented that way. I just read quickly over the logical 
partitioning section and there was a big "I see" ;) I'll read that 
section and referenced ones again more in depth and may mail/chat/.. you 
directly to clarify the rest.
 So you suggest freezing the counters while the hypervisor or other domains are running, via the "Performance Monitor Mark" bit MSR(PMM) and the "Freeze counters while Mark" bits MMCR0(FCM0/1) - combined with keeping the interrupt handler in the Linux code?
Anyway.. Let's stick with oprofiling the domains only.
There is really very little Linux work to do here.  We need:
 1. An hcall that turns the performance monitor on for the domain.
 2. Save and restore of the relevant registers for any domain that has it turned on.
 3. Turn it off for domains that have it disabled.
 
Again, sticking with oprofiling "active" domains only.
Yes, so MMCR0[FCH]=1 (freeze counters while MSR[HV]=1) always.
Then whenever we perform a domain switch (context_switch()) we:
  if (prev->domain->perf) {
    save the MMCRs and counters;
  } else if ((prev->regs->msr & MSR_PMM) || MMCR0[FC] == 0) {
    warning("bad domain!");
  }
  MMCR0[FC] = 1;  // turn off all counters in case a domain is being bad
  ....
  if (next->domain->perf) {
    restore the MMCRs and counters;
  }
Of course we can be a little smarter with the on/off logic.
NOTE: the MSR[PMM] bit will get saved and restored with the other MSR bits, so no special handling is required.
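Spelled out a little more concretely, the save/restore step of the sketch above could look roughly like the C below. This is a minimal sketch, not the actual Xen-PPC code: struct pmu_state, the per-domain perf flag and the pmu field are hypothetical names, and mfspr()/mtspr() with the SPRN_*/MMCR0_* definitions are assumed to exist along the lines of Linux's asm/reg.h.

    /* Hypothetical per-domain PMU context; SPRN_* and MMCR0_FC are assumed to
     * be defined as in the Power ISA / Linux asm/reg.h. */
    struct pmu_state {
        unsigned long mmcr0, mmcr1, mmcra;
        unsigned long pmc[6];
    };

    static void pmu_save(struct pmu_state *s)
    {
        s->mmcr0  = mfspr(SPRN_MMCR0);
        s->mmcr1  = mfspr(SPRN_MMCR1);
        s->mmcra  = mfspr(SPRN_MMCRA);
        s->pmc[0] = mfspr(SPRN_PMC1);
        s->pmc[1] = mfspr(SPRN_PMC2);
        /* ... PMC3..PMC6 analogously ... */
    }

    static void pmu_restore(struct pmu_state *s)
    {
        mtspr(SPRN_MMCR1, s->mmcr1);
        mtspr(SPRN_MMCRA, s->mmcra);
        mtspr(SPRN_PMC1,  s->pmc[0]);
        mtspr(SPRN_PMC2,  s->pmc[1]);
        /* ... PMC3..PMC6 analogously ... */
        mtspr(SPRN_MMCR0, s->mmcr0);   /* restore MMCR0 last: it may unfreeze */
    }

    /* Would be called from context_switch(); the perf flag and pmu field are
     * hypothetical additions to the domain structure. */
    static void pmu_switch(struct domain *prev, struct domain *next)
    {
        if (prev->perf)
            pmu_save(&prev->pmu);
        /* Freeze all counters while Xen runs, in case a domain is being bad. */
        mtspr(SPRN_MMCR0, mfspr(SPRN_MMCR0) | MMCR0_FC);
        if (next->perf)
            pmu_restore(&next->pmu);
    }

Restoring MMCR0 last keeps the counters frozen until the MMCR1/MMCRA/PMC values of the incoming domain are already in place.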
 -> Is it allowed to direct an interrupt directly to a Linux guest in Xen? (If there were too much latency between the IRQ and the reading of the samples, we might block a lot of occurring perf interrupts before we reset MMCR0(PMAE).)
Otherwise it would end up very similar to the xenoprof approach, which handles the interrupt in Xen, puts the data into a shared buffer, and then passes the information about "new data" to the appropriate domain via a virtual interrupt.
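For reference, the xenoprof transport mentioned here is essentially a sample ring buffer shared between Xen and the domain, plus a virtual IRQ to signal new data. The following is a simplified sketch of that idea; the struct layout and field names are illustrative, not the exact public xenoprof interface (oprofile_add_pc() is the real Linux oprofile helper for feeding samples):

    /* One sample as the hypervisor would record it into the shared page. */
    struct sample {
        unsigned long eip;      /* sampled program counter */
        unsigned char mode;     /* user / kernel / xen */
        unsigned char event;    /* which counter overflowed */
    };

    /* Shared ring: Xen is the producer, the domain's oprofile driver the consumer. */
    struct sample_buf {
        unsigned int head;      /* written by Xen */
        unsigned int tail;      /* written by the domain */
        unsigned int size;      /* number of slots in log[] */
        struct sample log[];    /* fills the rest of the shared page */
    };

    /* Domain-side VIRQ handler: drain the ring and hand samples to oprofile. */
    static void drain_samples(struct sample_buf *buf)
    {
        while (buf->tail != buf->head) {
            struct sample *s = &buf->log[buf->tail % buf->size];
            oprofile_add_pc(s->eip, s->mode, s->event);
            buf->tail++;
        }
    }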
 
Ahh, the point is, there is no way to direct performance interrupts to Xen while the domain is running; all performance interrupts go directly to the domain, so really the only thing you can/should virtualize is MMCR0 through a single hcall.
NOTE: When Xen is running, MSR[HV]=1, and when a domain is running, MSR[HV]=0.  So if an interrupt (like the performance interrupt) does not affect the value of MSR[HV], then the interrupt will not cause the processor to switch from the domain to Xen.
There is a table in Book 3, "Interrupt Definitions", that describes how each interrupt affects the MSR (and yes, we run with LPES[1]=1).
 
 Sounds good. In view of the (for me) new information that the performance interrupt can be handled in Linux, I now also think this is a way to get active domain sampling running.

-> Is there a single, or at least a limited, number of transition points between domains (e.g. in the scheduler) and Xen where we could place a hook to change MSR(PMM) as we need it, or will this state transition need to be spread all over the code?
 
yes, context_switch() should be the only place.
 
I must also add that this would not solve the described issue that another domain could write to the performance monitor registers and interfere with a profiling session.
 
Since you are performing save and restore of these registers, a domain can only hurt itself.
You do bring up an excellent point in that the domain can mess with MMCR0, so in that case we would have to make sure that MMCR0[FCH]=1 whenever we enter the HV in exceptions.S (see the EXCEPTION_HEAD macro, and beware: if you make it too big you will get assembler errors and we'll have to cook up some magic).
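To illustrate the defensive step described here: conceptually, the hypervisor entry path needs the equivalent of the small helper below, so that counters stay frozen whenever MSR[HV]=1 regardless of what a domain wrote into MMCR0. This is only a sketch; MMCR0_FCH is assumed to be a define for the "freeze counters in hypervisor state" bit per the Power ISA (the exact symbol name differs between headers), and in exceptions.S itself this would be a few lines of assembly rather than C.

    /* Ensure counters stay frozen while in hypervisor state, even if the
     * domain cleared the bit behind our back. */
    static inline void pmu_freeze_in_hv(void)
    {
        unsigned long mmcr0 = mfspr(SPRN_MMCR0);

        if (!(mmcr0 & MMCR0_FCH))
            mtspr(SPRN_MMCR0, mmcr0 | MMCR0_FCH);
    }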
 If we decide that this issue is negligible for the moment, we could also continue with the xenoprof-oriented approach, which would provide us with profiling of multiple domains and of Xen itself.
 
I believe that the work of profiling active domains is easy, maybe 2-3 weeks worth of work, since the domains collect their own information; the only LinuxPPC work would be to add any Xen-specific events.  Depending on your expertise you can contact other members to help you with the assembly in exceptions.S (that file is a PITA).
 It is a good point to break the item into chunks that are easier to handle; no other issues in mind currently - I just revoke some wrong assumptions about the interrupt handling and will re-read the docs *g*
I can't really estimate whether it would be a worthwhile task to implement this domain-only profiling first and then switch/extend it; do you think it is that much easier to implement this first?
 
Yes, because we know it works.  Don't forget, LinuxPPC already runs on a hypervisor; most of this stuff has been done, so we get a lot of stuff for free (including lots of bad stuff, but this is good stuff).
My hope is that this experience would allow you to then profile passive domains, which will be disappointing because the profiling will be limited to events that cause you to enter Xen and will probably have nothing to do with the counters. This step might not make sense at all, but we should explore it.
Then the next step would be to profile Xen itself, which can also be staged into chewable pieces; it should probably take about 3 weeks to get some profiling and then iteratively get closer and closer to covering more and more events.
 I thought we should think about/decide on the privilege issue I described, especially in view of the fact that it applies to both approaches.
 
I think we have nailed down and described the privilege issues; do you have more in mind?
-JX
 
Thanks for the clarification - Christian
 
 
You can see the hcall being set up here:
arch/powerpc/platforms/pseries/setup.c, pSeries_setup_arch(), line 322:
ppc_md.enable_pmcs = pseries_lpar_enable_pmcs;
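For orientation, that hook boils down to a single hcall. Roughly (from memory, and simplified; check the actual pseries code) it does the following, where H_PERFMON and plpar_hcall_norets() are the existing pSeries hypercall interfaces:

    /* Ask the hypervisor to give this partition access to the
     * performance monitor facilities (approximate sketch). */
    static void pseries_lpar_enable_pmcs(void)
    {
        unsigned long set = 1UL << 63;   /* enable performance monitoring */
        unsigned long reset = 0;

        plpar_hcall_norets(H_PERFMON, set, reset);
    }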
Here is the spec:
Let's try to go deep with this and then think about how to oprofile Xen itself.
-JX
On Mar 13, 2007, at 1:11 PM, Christian Ehrhardt wrote:
 
Hi Folks,
I analyzed the oprofile/xenoprof code and tried to do a simple-minded PowerPC mapping over the last two weeks. As came up in a phone call on Monday, I overlooked some possible issues arising out of the simple mapping of xenoprof to the Power architecture. In this mail I briefly describe some background as well as my considerations so far.
I'm not sure if I got all Power and x86 specifics right, so feel free to correct me - I'm open to any comments and ideas - I hope together we reach a realizable plan for whether and how this could be implemented.
-- Background I - oprofile basic principles --
Oprofile is a common profiling tool in the Linux world. It consists of two layers. The first is the kernel-space driver, which contains a generic infrastructure and management part as well as an architecture-dependent part that handles the hardware-specific tasks. The second is the userspace component that controls the kernel part and turns the collected data into the various reports.
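The boundary between the generic and the architecture-dependent kernel part is a small set of hooks that each architecture (or "model") registers with the generic driver. A rough, simplified sketch of that shape (field names are illustrative and not necessarily identical to the real struct oprofile_operations):

    /* Simplified view of what an architecture-dependent oprofile part
     * provides to the generic driver. */
    struct oprofile_arch_ops {
        int  (*setup)(void);    /* program the counter registers */
        int  (*start)(void);    /* unfreeze counters / enable PM interrupts */
        void (*stop)(void);     /* freeze counters again */
        void (*shutdown)(void); /* release the hardware */
        const char *cpu_type;   /* tells userspace which event list to use */
    };

As described below, xenoprof replaces the hardware-poking implementations of these hooks with hypercalls into Xen.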
-- Background II - xenoprof approach --
To use oprofile (http://oprofile.sourceforge.net/about/) in the Xen environment it was extended to xenoprof (http://xenoprof.sourceforge.net/), which adds a third layer in the Xen hypervisor. The Linux kernel-space driver now supports a new "architecture" that represents Xen. This implementation uses a hypercall instead of hardware-specific code. The data that is usually reported by interrupts is now reported to Xen by the hardware. Xen sorts out some parameters and reports the data chunk to the profiling domain via the virtual interrupt event notification provided by Xen. This gets more complex with multiple domains etc.; for more, read the docs on the xenoprof web page.
The hardware-specific code that was once in the oprofile kernel drivers is now located (adapted to the new environment) in the Xen source, where the new hypercalls are mapped to the real hardware.
-- Mapping xenoprof to Power - simple approach --
This approach tries to reuse as much of the initial xenoprof architecture as possible by mapping the Power implementation onto the technically x86-oriented xenoprof architecture. This would ease the implementation but spawns some risks I try to list here (the list is not complete; there may be more issues not yet realized).
The basic principle of these profiling implementations is a performance counter (real time, cycles, special events, ...) that triggers an interrupt. The interrupt handler then tries to save information about the current point of execution. The oprofile implementation for Power works in a similar scheme, so I thought this should be the easiest way.
-- Possible issues and their background --
Please take a look at this graphic before/while reading the following details (https://ltc.linux.ibm.com/wiki/XenPPC/profilingdiscussion) - it might also be useful to have a Power ISA doc at hand to read about the special registers and bits (http://www.power.org/news/articles/new_brand/#isa).
Setting up the hardware elements used in the x86 implementation needs ring 0 afaik, and the Dom kernel runs in ring 1, so it cannot interfere with the NMI programming done by Xen in ring 0. In the Power architecture there are three privilege levels, and the Linux kernel usually runs in the second level. Afaik the Dom Linux kernel also runs in this level in the xen-ppc implementation; because of that we could set the performance monitor registers up correctly in Xen, but could not really be sure that a Dom kernel does not change the related registers without "asking" the hypervisor.
-> Is there a way, still unknown to me, to protect those registers?
-- Other possible approaches --
After consulting the current Power ISA documents again, I found some points that may allow other implementations of profiling in Xen.
a) Because the Dom kernel seems to be able to set up the performance profiling without invoking the hypervisor, it could be possible to let a domain just do the profiling on its own. But there are other issues with this approach too, e.g. in which way would samples of other domains show up, and would this be a security breach?
b) The Power architecture provides a very potent performance monitor with features that allow freezing the counters, e.g. freezing them while execution is in hypervisor mode (MSR[HV,PR]=0b10). But such features would only help to distinguish vertically in the graphic referenced above; only the hypervisor is in a position to distinguish horizontally between different domains (see the sketch below).
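As an illustration of point b), the "vertical" filtering could in principle be set up with the freeze bits alone, along the lines of the sketch below; the MMCR0_* names are assumed to be the usual Power ISA bit definitions (freeze always / in hypervisor state / in supervisor state / in problem state), and the "horizontal" separation between domains would still require the hypervisor to swap or mark the PMU state on every domain switch.

    /* Count only while this partition runs in supervisor or problem state;
     * freeze whenever the hypervisor (MSR[HV]=1) is executing. */
    static void pmu_count_this_partition_only(void)
    {
        unsigned long mmcr0 = mfspr(SPRN_MMCR0);

        mmcr0 |=  MMCR0_FCH;                /* freeze while in hypervisor state  */
        mmcr0 &= ~(MMCR0_FCS | MMCR0_FCP);  /* count supervisor + problem state  */
        mmcr0 &= ~MMCR0_FC;                 /* ...and unfreeze the counters      */
        mtspr(SPRN_MMCR0, mmcr0);
    }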
I'm planning to move the illustration I used to the public wiki after the first round of review and to keep the planned design up to date there.
More, but not yet mature, thoughts & ideas about that in mind,
Christian
 
 
 
--
Grüsse / regards, 
Christian Ehrhardt 
IBM Linux Technology Center, Open Virtualization
+49 7031/16-3385
Ehrhardt@xxxxxxxxxxxxxxxxxxx
Ehrhardt@xxxxxxxxxx
IBM Deutschland Entwicklung GmbH
Chairman of the Supervisory Board: Johann Weihen
Management: Herbert Kircher
Registered office: Böblingen
Registry court: Amtsgericht Stuttgart, HRB 243294