[XenPPC] Profiling support in xen-ppc - step2 - informations/pla

This is the initial mail as information and start of discussion about step 2 of the xen-ppc profiling support. The targets are:

1. Sampling xen by passing xen samples to a profiling domain

2. Passing results of postponed perfmon interrupts to the appropriate domain (currently these samples are ignored)

 a) by using a mechanism compatible with the one we need for 1. anyway

 b) or by mangling the vcpu structure to emulate the occurred irq and let the domain handle it

3. "Context switch" the PMC status in all transitions between domains and xen while sampling xen

== Background I - PMC counters need to be reset after the perfmon interrupt occurred ==

First a very compact description of how the performance monitoring is set up and what has to happen in/after a performance monitor interrupt - for the real details read the appropriate cpu user manual.

An operating system may set up MMCR0, MMCR1 and MMCRA in a way that, for example, PMC2 counts cycles, and it registers a handler for the perfmon irq. A PMC contains a signed 32 bit value (bit 0 is the CTR_NEG bit). When the PMC wraps to a negative value, the current instruction/data address is written to the SIAR/SDAR SPRs - this is a performance monitor exception. For example, if you want to trace every 0x10000000 cycles you set the PMC to 0x70000000 (0x7fffffff + 0x1 = 0x80000000 is the first negative value).

If now the condition occurs that a performance monitor exception exists (a PMC is negative) AND external interrupts are enabled (MSR[EE]=1) AND performance monitor exceptions are enabled (MMCR0[PMXE]=1), the performance monitor interrupt occurs and the handler can read SIAR/SDAR.

The interrupt itself disables subsequent perfmon irqs by setting MMCR0[PMXE] to zero; the irq handler has to do the rest. It has to reset the PMC to a non-negative value - in our scenario we wanted to sample every 0x10000000 cycles, so the handler has to set PMC2 back to 0x70000000. It also has to re-enable perfmon interrupts by setting MMCR0[PMXE] back to 1.

Oprofile resets the PMC values in the kernel perfmon interrupt handler, which knows the values (sysfs) and uses them to reset the PMCs properly.
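To make the arithmetic above concrete, here is a minimal sketch in C. The helper names pmc_reset_value and pmc_is_negative are made up for illustration and are not taken from xen or linux; the code only models the 32 bit counter value, it does not touch any SPR.

```c
#include <assert.h>
#include <stdint.h>

/* First value with the most significant bit (CTR_NEG) set. */
#define PMC_NEGATIVE 0x80000000u

/* Hypothetical helper: start value so that the PMC turns negative
 * after 'period' counted events. */
static uint32_t pmc_reset_value(uint32_t period)
{
    assert(period > 0 && period <= PMC_NEGATIVE);
    return PMC_NEGATIVE - period;
}

/* A PMC signals the exception condition once CTR_NEG is set. */
static int pmc_is_negative(uint32_t pmc)
{
    return (pmc & PMC_NEGATIVE) != 0;
}
```

With a period of 0x10000000 this gives 0x70000000, the value used in the example above.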


== Issue I - postponed perfmon irq's ==

Sometimes it may happen that perfmon interrupts belonging to a domain occur in xen space. Currently the performance monitor is always set up with MMCR0[FCH], so hypervisor privilege level can't be the original source. Also the values of MMCRA[SAMPHV] and MMCRA[SAMPPR] show that the sample was taken in domain space. It may happen sometimes that this sample is reported "postponed" into xen because we do not run completely with MSR[EE]=0. The current perfmon handler in xen just ignores those samples - they are few enough that this is currently a negligible issue (small loss of accuracy).

The handler needs to be there, although we do not (yet) sample xen space, to re-enable MMCR0[PMXE] and to reset the PMC values. Otherwise the performance monitor would stop working after the first of these postponed irqs, because without MMCR0[PMXE]=1 no further perfmon irq would happen.

As described above, linux knows the values to which it should reset the wrapped PMCs in its handler, but xen does not. Combined with the issue of postponed perfmon irqs that belong to a domain but occur in xen, we have a perfmon irq handler in xen that does not know how to reset wrapped PMC values properly.

The current implementation of the xen perfmon handler resets the PMC values to the defaults of oprofile, but to let profiling work properly, either

a) xen needs to be aware of the values a domain would reset the PMCs to

b) or pass the sample to the domain so that it can consume it (and also do the reset/re-enable part for the domain)
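Option a) could look roughly like the following sketch. Everything here is an assumption for illustration: the struct pmc_reset_info, the function reset_wrapped_pmcs and the PMC count of 8 are not existing xen code, and the PMCs are modelled as a plain array instead of SPR accesses.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PMCS     8            /* assumption: number of PMCs on the cpu */
#define PMC_NEGATIVE 0x80000000u  /* CTR_NEG, the most significant bit */

/* Hypothetical per-domain record of the values the guest would write. */
struct pmc_reset_info {
    uint32_t reset[NUM_PMCS];
};

/* Reset every wrapped (negative) PMC to the value the domain asked for.
 * Returns the number of PMCs that were reset. */
static int reset_wrapped_pmcs(uint32_t pmc[NUM_PMCS],
                              const struct pmc_reset_info *info)
{
    int i, n = 0;

    assert(pmc && info);
    for (i = 0; i < NUM_PMCS; i++) {
        if (pmc[i] & PMC_NEGATIVE) {
            pmc[i] = info->reset[i];
            n++;
        }
    }
    return n;
}
```

The open question with this variant is how the domain would tell xen about its reset values in the first place; the sketch leaves that out.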


== Background II - why PHYP might not need to know about PMC reset values ==

This is my current assumption after a lot of chat discussions and document reviews - I welcome every comment that makes this clearer.

As a XenPPC developer you can look at PHYP as a black box and know that "whatever" has to work non-paravirtualized because it works that way in PHYP. This is the case for the PMC handling described above - so why does it work without passing the wanted PMC reset values to the hypervisor explicitly?

The basic assumption is that PHYP runs completely with MSR[EE]=0; in this case our kind of postponed perfmon interrupts just do not occur. Starting from the point where MSR[EE] is set back to one in the domain, the perfmon exception gets reported as an interrupt - in the domain. Because in this scenario the domain gets every perfmon interrupt, it can handle samples and reset the PMC counters properly without any issue.

Even if PHYP had the issue of postponed perfmon irqs because it may not be running fully with MSR[EE]=0, they might work around this by altering whatever they have as an equivalent to our vcpu struct. They could "emulate" the interrupt by altering all registers as the irq would do it. The SIAR/SDAR is valid until the next performance monitor exception occurs, so after returning to the domain it would continue with the handler, read SIAR/SDAR and reset the PMCs ... properly.


== Issue II - MMCR0[PMAO] polling needed to sample xen ? ==

As written above, we are neither always nor never running with MSR[EE]=1 in xen-ppc. While MSR[EE]=0 all the time would be nice to defer the irq back to the domain, it is also a problem to have MSR[EE]=0 for the intention of sampling the hypervisor itself.

We can't assume that sampling xen works with the interrupt based mechanism, because MSR[EE] is 0 "too often/incalculable". So we additionally need a polling based mechanism to be at least a little accurate (it can only be as good as the frequency of the poll actions). Polling would need one or more good places in xen to check via MMCR0[PMAO] if a perfmon exception occurred (if we have enough frequently hit places that activate MSR[EE], this would do the job too).
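A poll hook of that kind might be sketched as below. This is only a model: perfmon_poll and struct poll_stats are invented names, the MMCR0 value is passed in as a parameter (real code would read it with mfspr), and the PMAO bit mask is illustrative and has to be checked against the cpu user manual.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit mask; take the real position from the cpu manual. */
#define MMCR0_PMAO 0x00000080u

/* Hypothetical counters to judge how useful a poll location is. */
struct poll_stats {
    unsigned long polls;  /* how often the hook ran */
    unsigned long hits;   /* how often an exception was pending */
};

/* Returns 1 if a perfmon exception is pending, 0 otherwise.
 * 'mmcr0' would come from an SPR read in real code. */
static int perfmon_poll(uint32_t mmcr0, struct poll_stats *st)
{
    assert(st);
    st->polls++;
    if (mmcr0 & MMCR0_PMAO) {
        st->hits++;
        return 1;
    }
    return 0;
}
```

The hits/polls ratio per call site would give a first idea of whether a place is polled often enough to be worth keeping.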

== Issue III - emulating perfmon irq in the domain by altering vcpu prohibited ? ==

The intention to profile xen prohibits us from implementing the handling of the postponed irqs with the "emulate the perfmon irq" workaround mentioned above, because new perfmon exceptions belonging to xen would overwrite SIAR/SDAR before the domain can read their results.

To continue the plan to sample xen we will need an event channel to report xen samples to a profiling domain (similar to the xenoprof approach) and an event channel (maybe the same) to report samples of postponed perfmon irqs. This way linux, which knows the PMC reset values, can reset them, which saves us from implementing the PMC reset value awareness for the postponed irqs. But xen will need to get a (kind of/mechanism/trick) "pmu setup interface" which defines the PMC reset values and all the other pmu related registers for the part of profiling xen itself.
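If both kinds of samples share one channel, each record needs to say where it came from. The following is a rough sketch of such a sample buffer, not the xenoprof layout: struct pmu_sample, struct sample_ring, ring_put, ring_get and the ring size are all assumptions, and the code is single-threaded (no memory barriers, no event channel notification).

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 64u  /* assumption: a power of two */

enum sample_origin { SAMPLE_XEN, SAMPLE_POSTPONED_DOMAIN };

/* Hypothetical sample record handed to the profiling domain. */
struct pmu_sample {
    uint64_t siar;       /* sampled instruction address */
    uint64_t sdar;       /* sampled data address */
    uint16_t domain_id;  /* owner of a postponed sample */
    uint8_t  origin;     /* enum sample_origin */
};

struct sample_ring {
    uint32_t head;       /* advanced by the producer (xen) */
    uint32_t tail;       /* advanced by the consumer (domain) */
    struct pmu_sample slot[RING_SIZE];
};

/* Returns 1 on success, 0 if the ring is full and the sample is dropped. */
static int ring_put(struct sample_ring *r, const struct pmu_sample *s)
{
    assert(r && s);
    if (r->head - r->tail == RING_SIZE)
        return 0;
    r->slot[r->head % RING_SIZE] = *s;
    r->head++;
    return 1;
}

/* Returns 1 and fills 'out' if a sample was available, 0 if empty. */
static int ring_get(struct sample_ring *r, struct pmu_sample *out)
{
    assert(r && out);
    if (r->head == r->tail)
        return 0;
    *out = r->slot[r->tail % RING_SIZE];
    r->tail++;
    return 1;
}
```

For a postponed sample the consuming domain would still have to do the PMC reset and re-enable part itself, as described above.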


== Background III - how to switch PMC context on xen<->domain transitions ==

I currently think that the domain should still sample with MMCR0[FCH]=1, so in the transition into the hypervisor we have frozen counters until the PMU_SAVE_STATE in exceptions.S has saved the domain perfmon settings & PMCs and restored the ones of xen. At last it sets MMCR0[FCH]=0 so the profiling continues with the xen configuration. On the way back to a domain it first sets MMCR0[FCH]=1 and then saves the xen / restores the domain perfmon status. This would profile all of xen except the small slice between PMU_SAVE_STATE and the involved domain.
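The ordering described above can be modelled in plain C as follows. This is a sketch of the sequence only, not the PMU_SAVE_STATE assembly: the struct pmu_state, the two functions, the PMC count and the FCH bit mask are assumptions for illustration, and the "hardware" is just a struct.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PMCS  8            /* assumption: number of PMCs on the cpu */
#define MMCR0_FCH 0x00000001u  /* illustrative mask, check the cpu manual */

struct pmu_state {
    uint32_t mmcr0;
    uint32_t pmc[NUM_PMCS];
};

/* Domain -> xen: counters are frozen (FCH=1) on entry, so the domain
 * values can be saved consistently; FCH is cleared as the last step. */
static void pmu_enter_xen(struct pmu_state *hw, struct pmu_state *dom,
                          const struct pmu_state *xen)
{
    assert(hw->mmcr0 & MMCR0_FCH);
    *dom = *hw;                       /* save domain settings and PMCs */
    *hw = *xen;                       /* restore xen settings and PMCs */
    hw->mmcr0 &= ~MMCR0_FCH;          /* last: xen profiling continues */
}

/* Xen -> domain: freeze first, then save xen and restore the domain. */
static void pmu_leave_xen(struct pmu_state *hw, struct pmu_state *xen,
                          const struct pmu_state *dom)
{
    hw->mmcr0 |= MMCR0_FCH;           /* first: stop counting in xen */
    *xen = *hw;                       /* save xen settings and PMCs */
    *hw = *dom;                       /* restore domain state */
    hw->mmcr0 |= MMCR0_FCH;           /* domain keeps sampling with FCH=1 */
}
```

The part of xen that runs before pmu_enter_xen has finished and after pmu_leave_xen has started is the unprofiled slice mentioned above.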

One thing is for sure: these issues make the implementation to sample xen more complex than I thought initially :-(

The complexity of this may render the text capable of being misunderstood - if anything is confusing for someone, please ask me to improve my description ;-)

--

Grüsse / regards,
Christian Ehrhardt


IBM Linux Technology Center, Open Virtualization
+49 7031/16-3385
Ehrhardt@xxxxxxxxxxxxxxxxxxx
Ehrhardt@xxxxxxxxxx

IBM Deutschland Entwicklung GmbH

Chairman of the Supervisory Board: Johann Weihen
Management: Herbert Kircher
Registered office: Böblingen
Registration court: Amtsgericht Stuttgart, HRB 243294


_______________________________________________
Xen-ppc-devel mailing list
Xen-ppc-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ppc-devel
