Re: [MirageOS-devel] Tracing and profiling blog post
On 30 Oct 2014, at 16:29, Thomas Leonard <talex5@xxxxxxxxx> wrote:
> On 30 October 2014 14:20, Richard Mortier
> <Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:
>>
>> would it make sense for note_{suspend,resume} to be string -> unit (or some
>> more opaque type than string even, though perhaps of fixed size) so that the
>> programmer can indicate reasons for the suspend/resume?
>
> This name is perhaps confusing, but it's for block_domain/poll/select.
> On Xen, mirage-platform's main.ml is the only thing that calls it.
> The reason for suspending is always that there isn't any work to do
> (exactly what we're waiting for is indicated by the sleeping event
> channel threads at that moment).
>
> If we had a more general version, it could perhaps be used for GC
> pauses too, but there's a separate entry point for that using
> Callback, because it's called from C code. Actual suspend-to-disk
> could be another reason.
>
> Are there any more types?
ah-- see my comment on the other mail, i guess. it seems better to
parameterise this rather than bake it into the api, doesn't it? (i may be
missing something obvious about types and ocaml here :)
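something like this interface sketch, say (names purely illustrative, not
the actual mirage-profile api):

    type suspend_reason =
      | Block_domain      (* no work to do; waiting on event channels *)
      | Gc_pause          (* currently a separate entry point via Callback *)
      | Suspend_to_disk

    val note_suspend : suspend_reason -> unit
    val note_resume : suspend_reason -> unit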
>> can labels on threads be changed over their lifetime? can labels overlap or
>> are they unique? if unique, within what context?
>
> Originally there was one label per thread, but now they're essentially
> just log messages that get attached to the active thread. They can be
> used to label a thread, but also to note interesting events, so
> perhaps a different name would be useful here (Trace.log?
> Trace.note?). There should probably be a printf version too.
>
> Actual labelling more often happens with named_wait, named_task, etc now.
ah right; i guess i'm talking about an api that subsumes lwt tracing and
supports more general tracing throughout many libraries.
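e.g. a general note/log call any library could use, hypothetically something
like (nothing here is settled, including the names):

    module Trace : sig
      (* attach a free-form note to the currently-running thread *)
      val note : string -> unit

      (* printf-style variant, e.g. Trace.notef "read %d bytes" n *)
      val notef : ('a, Format.formatter, unit, unit) format4 -> 'a
    end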
>
>> trace_enabled.mli:
>>
>> how do i interact with the buffer other than to snapshot it?
>
> What kind of interactions did you have in mind?
one thing ETW allowed which was nice was having real-time consumers of the
tracing buffers. that would allow this kind of infrastructure to plug in to
something doing more dynamic resource management for unikernels across (e.g.) a
datacenter.
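e.g. something along these lines (entirely hypothetical; neither the event
record nor the subscribe function exists in the current code):

    type event = {
      timestamp : int64;   (* monotonic time the event was recorded *)
      thread_id : int;     (* lwt thread it is attached to *)
      payload : string;    (* label / note / counter data *)
    }

    (* call [f] on each event as it is written, rather than only
       snapshotting the ring buffer; returns an unsubscribe function *)
    val subscribe : (event -> unit) -> (unit -> unit)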
>> ...and what's counter for? (ie., how general/widely used is it intended to
>> be?)
>
> In the examples, I used counters for:
>
> - Number of active grant refs
> - Number of block reads completed
> - Bytes written to console
> - IP packets sent
> - TCP bytes submitted
> - TCP bytes ack'd
>
> Measuring stuff can get complicated quickly. The last monitoring
> system I worked on had many different types of "metric" (instantaneous
> measurements, cumulative usage, on-going rates of increase, etc). You
> could efficiently query for e.g. average response latency between any
> two points in time, allowing for real-time display of "average latency
> over the last 5 min" or "number of requests since midnight", etc.
>
> The counters were also arranged in a hierarchy. For example, you could
> have a segments-acked counter for each TCP stream, which would then
> also get aggregated as totals for that VM, and then further aggregated
> both per-customer (across multiple VMs), and per resource pool. You
> could see graphs of aggregated data and then drill down to see what
> had contributed to it.
>
> Some of the metrics were shared with customers[*], who treated them as
> extra monitoring data for their own (outsourced) resource pools.
>
> I don't know whether we want to go down that route just yet, though.
> It took a while to explain everything ;-)
:)
i guess there are two orthogonal things here:
metrics, as you describe above, which to my mind sound like (e.g.) SNMP MIBs --
most useful for understanding the aggregate performance of a system; and
event tracing, as i've been implicitly assuming, which permits more detailed
cuts through system performance at the cost of added complexity (per magpie).
both are useful i think, though you ought to be able to build the former on the
latter (though that might be more complex than seems reasonable) -- a sketch
below.
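to illustrate building the former on the latter, a minimal runnable sketch
(all names illustrative):

    type metric = { name : string; mutable total : int64 }

    let make name = { name; total = 0L }

    (* fold "counter increased by [delta]" trace events into a running
       total that a monitoring layer could sample or aggregate upwards *)
    let on_counter_event m delta = m.total <- Int64.add m.total delta

    let () =
      let acked = make "tcp-segments-acked" in
      on_counter_event acked 3L;
      Printf.printf "%s = %Ld\n" acked.name acked.total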
>> agree to some extent -- though if some components wish to control tracing in
>> other components as a result of observation of their own behaviour, the
>> control API may become more pervasively used than the dumping/display api i
>> guess.
>
> Perhaps. I suspect we'd have the libraries just produce events and
> have the logic for responding to them in the unikernel config, rather
> than having libraries reconfiguring the profiling directly. That
> sounds confusing!
heh-- having dynamic control of tracing was something we discussed with magpie
but never implemented. the idea was that a datacenter operator could notice an
issue and then "turn up" the tracing to get more detailed models, to the point
where they could diagnose the problem.
but as i said, we never actually did that. (though ETW does allow dynamic
control over tracing levels from a command-line tool.)
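e.g. a toy sketch of such a run-time knob (purely illustrative; no such
thing exists in the code under discussion):

    type level = Quiet | Coarse | Detailed

    let current = ref Coarse          (* default verbosity *)

    (* raised at run time by an operator tool, ETW-style *)
    let set_level l = current := l

    (* only record events at or below the current verbosity *)
    let trace l msg = if l <= !current then print_endline msg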
--
Cheers,
R.