
Re: [Xen-devel] [RFC PATCH 0/7] Intel Cache Monitoring: Current Status and Future Opportunities



On Wed, 2015-04-08 at 12:27 +0100, George Dunlap wrote:
> On 04/07/2015 11:27 AM, Andrew Cooper wrote:

> > There seem to be several areas of confusion indicated in your document. 
> > I am unsure whether this is a side effect of the way you have written
> > it, but here are (hopefully) some words of clarification.  To the best
> > of my knowledge:
> > 
> > PSR CMT works by tagging cache lines with the currently-active RMID. 
> > The cache utilisation is a count of the number of lines which are tagged
> > with a specific RMID.  MBM on the other hand counts the number of cache
> > line fills and cache line evictions tagged with a specific RMID.
> 
> For an actual counter, like MBM, we don't actually need different RMIDs*
> to implement a per-vcpu counter: we could just read the value on every
> context switch, compare it to the last value, and store the difference in
> the vcpu struct.  Having extra RMIDs just makes it easier -- is that right?
> 
I'm not sure I'm following.

As per Andrew's description, both are counters. And in fact, if
sampling-and-subtracting at every context switch is an option, both CMT
and MBM stats for a particular instance of execution of a vcpu can be
collected, I think, using just one RMID per pCPU.

I'm not sure what your 'last value' refers to, though. The last value of
what? I mean, the last value of the counter associated with which RMID?
What entity were you thinking of associating an RMID with: a vcpu? A
pCPU? A domain? And were you thinking of a static or a dynamic kind of
association?

Anyway, sampling at every context switch means one MSR write and one MSR
read (to get one sample), which, as you say yourself below, may not be
that cheap.

> I haven't thought about it in detail, but it seems like for that having
> an LRU algorithm for allocating MBM RMIDs might work.
> 
> * Are they called RMIDs for MBM?  If not, replace "RMID" in this
> paragraph with the appropriate term.
> 
They are called RMIDs, and they are the same for both MBM and CMT,
AFAIUI. I mean, once you have associated logical entity X with RMID y,
you can monitor both X's cache occupancy and memory bandwidth via RMID
y. OTOH, it is not possible to associate RMID y with X for CMT and RMID
z for MBM.

> For CMT, we could imagine setting the RMID as giving the pcpu a
> paintbrush with a specific color of paint, with which it paints that
> color on the wall (which would represent the L3 cache).  If we give Red
> to Andy and Blue to Dario, then after a while we can look at the red and
> blue portions of the wall and know which belongs to which.  But if we
> then give the red one to Konrad, we'll never be *really* sure how much
> of the red on the wall was put there by Konrad and how much was put
> there by Andy.  If Dario is a mad painter just painting over everything,
> then within a relatively short period of time we can assume that
> whatever red there is belongs to Konrad; but if Dario is more
> constrained, Andy's paint may stay there indefinitely.
> 
> But what we *can* say, I suppose, is that Konrad's "footprint" is
> certainly *less than* the amount of red paint on the wall; and that any
> *increase* in the amount of red paint since we gave the brush to Konrad
> certainly belongs to him.
> 
> So we could probably "bracket" the usage by any given vcpu: if the
> original RMID occupancy was O, and the current RMID occupancy is N, then
> the actual occupancy is between [N-O] and N.
> 
> Hmm, although I guess that's not true either -- a vcpu may still have
> occupancy from all previous RMIDs that it's used.
> 
This is about the problem of 'recycling' RMIDs but, having not
understood how you are thinking of allocating them, I'm not getting the
recycling part either. :-)

It seems that you're suggesting some kind of dynamic RMID to vcpu
allocation scheme, is that the case?

> > As far as MSRs themselves go, an extra MSR write in the context switch
> > path is likely to pale into the noise.  However, querying the data is an
> > indirect MSR read (write to the event select MSR, read from  the data
> > MSR).  Furthermore there is no way to atomically read all data at once
> > which means that activity on other cores can interleave with
> > back-to-back reads in the scheduler.
> 
> I don't think it's a given that an MSR write will be cheap.  Back when I
> was doing my thesis (10 years ago now), logging some performance
> counters on context switch (which was just an MSR read) added about 7%
> to the overhead of a kernel build, IIRC.
> 
> Processors have changed quite a bit in that time, and we can hope that
> Intel would have tried to make writing the IDs pretty fast.  But before
> we enabled anything by default I think we'd want to make sure and take a
> look at the overhead first.
> 
>  -George
>
Thanks and Regards,
Dario


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

