
Re: [MirageOS-devel] Tracing and profiling blog post

On 30 Oct 2014, at 16:29, Thomas Leonard <talex5@xxxxxxxxx> wrote:

> On 30 October 2014 14:20, Richard Mortier
> <Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:
>> would it make sense for note_{suspend,resume} to be string -> unit (or some 
>> more opaque type than string even, though perhaps of fixed size) so that the 
>> programmer can indicate reasons for the suspend/resume?
> This name is perhaps confusing, but it's for block_domain/poll/select.
> On Xen, mirage-platform's main.ml is the only thing that calls it.
> The reason for suspending is always that there isn't any work to do
> (exactly what we're waiting for is indicated by the sleeping event
> channel threads at that moment).
> If we had a more general version, it could perhaps be used for GC
> pauses too, but there's a separate entry point for that using
> Callback, because it's called from C code. Actual suspend-to-disk
> could be another reason.
> Are there any more types?

ah-- see my comment on the other mail i guess. it seems likely to be better to 
parameterise this rather than bake it into the api, doesn't it?  (i may be 
missing something obvious about types and ocaml here :)

>> can labels on threads be changed over their lifetime?  can labels overlap or 
>> are they unique?  if unique, within what context?
> Originally there was one label per thread, but now they're essentially
> just log messages that get attached to the active thread. They can be
> used to label a thread, but also to note interesting events, so
> perhaps a different name would be useful here (Trace.log?
> Trace.note?). There should probably be a printf version too.
> Actual labelling more often happens with named_wait, named_task, etc now.
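a rough sketch of what the renamed api discussed above might look like -- `Trace.note` and the printf-style `notef` are hypothetical names from this thread, not the actual mirage-profile interface:

```ocaml
(* Hypothetical sketch only: a thread-annotation API along the lines
   discussed above (Trace.note plus a printf variant). Not the real
   mirage-profile interface. *)
module Trace : sig
  val note : string -> unit
  (* attach a log message / label to the currently-active thread *)

  val notef : ('a, unit, string, unit) format4 -> 'a
  (* printf-style variant of [note] *)
end = struct
  (* stand-in backend: a real implementation would write into the
     trace buffer rather than stdout *)
  let note msg = print_endline ("trace: " ^ msg)
  let notef fmt = Printf.ksprintf note fmt
end

let () =
  Trace.note "connected";
  Trace.notef "read %d bytes" 512
```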

ah right; i guess i'm talking about an api that subsumes lwt tracing and 
supports more general tracing throughout many libraries.

>> trace_enabled.mli:
>> how do i interact with the buffer other than to snapshot it?
> What kind of interactions did you have in mind?

one thing ETW allowed which was nice was to have real-time consumers of the 
tracing buffers. that would allow this kind of infrastructure to plug in to 
something doing more dynamic resource management for unikernels across (e.g.) a 

>> ...and what's counter for?  (ie., how general/widely used is it intended to 
>> be?)
> In the examples, I used counters for:
> - Number of active grant refs
> - Number of block reads completed
> - Bytes written to console
> - IP packets sent
> - TCP bytes submitted
> - TCP bytes ack'd
> Measuring stuff can get complicated quickly. The last monitoring
> system I worked on had many different types of "metric" (instantaneous
> measurements, cumulative usage, on-going rates of increase, etc). You
> could efficiently query for e.g. average response latecy between any
> two points in time, allowing for real-time display of "average latency
> over the last 5 min" or "number of requests since midnight", etc.
> The counters were also arranged in a hierarchy. For example, you could
> have a segments-acked counter for each TCP stream, which would then
> also get aggregated as totals for that VM, and then further aggregated
> both per-customer (across multiple VMs), and per resource pool. You
> could see graphs of aggregated data and then drill down to see what
> had contributed to it.
> Some of the metrics were shared with customers[*], who treated them as
> extra monitoring data for their own (outsourced) resource pools.
> I don't know whether we want to go down that route just yet, though.
> It took a while to explain everything ;-)
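the hierarchy you describe could be sketched as counters that propagate increments to their ancestors, so a per-stream count rolls up into per-VM (and higher) totals -- names and structure below are purely illustrative, not any existing MirageOS library:

```ocaml
(* Illustrative sketch of hierarchical counters: incrementing a leaf
   also bumps every ancestor, so per-TCP-stream counts aggregate into
   per-VM (and per-customer, per-pool) totals. Not a real API. *)
type counter = {
  name : string;
  mutable value : int;
  parent : counter option;
}

let root name = { name; value = 0; parent = None }
let child parent name = { name; value = 0; parent = Some parent }

let rec incr_by c n =
  c.value <- c.value + n;
  match c.parent with
  | Some p -> incr_by p n
  | None -> ()

let () =
  let vm = root "vm" in
  let stream1 = child vm "tcp-stream-1" in
  let stream2 = child vm "tcp-stream-2" in
  incr_by stream1 100;  (* 100 segments acked on stream 1 *)
  incr_by stream2 50;   (* 50 segments acked on stream 2 *)
  Printf.printf "%s total: %d\n" vm.name vm.value  (* vm total: 150 *)
```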


i guess there are two orthogonal things here.

metrics as you describe above, which to my mind sound like (e.g.) SNMP MIBs. 
most useful for understanding aggregate performance of a system.

event tracing as i've been implicitly assuming, which permits more detailed 
cuts through system performance at the cost of added complexity (per magpie).

both are useful i think, though you ought to be able to build the former on the 
latter (though that might be more complex than seems reasonable).

>> agree to some extent -- though if some components wish to control tracing in 
>> other components as a result of observation of their own behaviour, the 
>> control API may become  more pervasively used than the dumping/display api i 
>> guess.
> Perhaps. I suspect we'd have the libraries just produce events and
> have the logic for responding to them in the unikernel config, rather
> than having libraries reconfiguring the profiling directly. That
> sounds confusing!

heh-- having dynamic control of tracing was something we discussed with magpie 
but never implemented. the idea would've been something like: a datacenter 
operator could notice an issue, and then "turn up" the tracing to get more 
detailed models, to the point where they could diagnose the problem.

but as i said, we never actually did that. (though ETW does allow dynamic 
control over tracing levels from a command line tool.)


