
Re: [MirageOS-devel] Tracing and profiling blog post

On 30 October 2014 14:20, Richard Mortier
<Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:
> On 30 Oct 2014, at 09:36, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>> Here's what I have so far:
>> https://github.com/talex5/mirage-profile/blob/new-api/lib/trace_stubs.mli
>> https://github.com/talex5/mirage-profile/blob/new-api/lib/counter.mli
>> There's not much here, but it would be good to keep this API stable as
>> pretty much all mirage libraries will be using it.
> some quick thoughts:
> trace_stubs.mli:
> would it make sense for note_{suspend,resume} to be string -> unit (or some 
> more opaque type than string even, though perhaps of fixed size) so that the 
> programmer can indicate reasons for the suspend/resume?

This name is perhaps confusing, but it's for block_domain/poll/select.
On Xen, mirage-platform's main.ml is the only thing that calls it.
The reason for suspending is always that there isn't any work to do
(exactly what we're waiting for is indicated by the sleeping event
channel threads at that moment).

If we had a more general version, it could perhaps be used for GC
pauses too, but there's a separate entry point for that using
Callback, because it's called from C code. Actual suspend-to-disk
could be another reason.
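
To illustrate where these calls sit, here is a minimal sketch of a main
loop bracketing its blocking call. The Trace module is a stand-in that
records events in a list, and wait_for_work and its block argument are
hypothetical names for this example, not the mirage-profile API.

```ocaml
(* Hypothetical stand-in for the tracing stubs: records events in memory. *)
module Trace = struct
  type event = Suspend | Resume

  let events : event list ref = ref []
  let note_suspend () = events := Suspend :: !events
  let note_resume () = events := Resume :: !events
end

(* [block] stands for block_domain/poll/select: it returns once there is
   work to do. The only "reason" for suspending is that nothing is runnable,
   so neither call takes an argument. *)
let wait_for_work ~(block : unit -> unit) =
  Trace.note_suspend ();
  block ();
  Trace.note_resume ()

let () =
  (* A dummy blocking function that returns immediately. *)
  wait_for_work ~block:(fun () -> ())
```

A more general version would presumably pass a reason (idle, GC pause,
suspend-to-disk) to both calls.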

Are there any more types?

> can labels on threads be changed over their lifetime?  can labels overlap or 
> are they unique?  if unique, within what context?

Originally there was one label per thread, but now they're essentially
just log messages that get attached to the active thread. They can be
used to label a thread, but also to note interesting events, so
perhaps a different name would be useful here (Trace.log?
Trace.note?). There should probably be a printf version too.
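
A printf version could be a thin wrapper over the string one. This is
only a sketch: label here is a stand-in that appends to an in-memory
log, and the name labelf is an assumption. Printf.ksprintf is from the
OCaml standard library.

```ocaml
(* Hypothetical stand-in: the real function would attach the message to
   the active thread. Here it just appends to an in-memory log. *)
let log : string list ref = ref []
let label (msg : string) : unit = log := msg :: !log

(* printf-style wrapper: Printf.ksprintf formats the arguments into a
   string and then passes that string to [label]. *)
let labelf fmt = Printf.ksprintf label fmt

let () = labelf "read %d bytes from %s" 512 "blkfront"
```

Note that in this sketch the string is always built, even if nothing is
listening for trace events.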

Actual labelling now more often happens with named_wait, named_task, etc.

> trace_enabled.mli:
> how do i interact with the buffer other than to snapshot it?

What kind of interactions did you have in mind?

> ...and what's counter for?  (ie., how general/widely used is it intended to 
> be?)

In the examples, I used counters for:

- Number of active grant refs
- Number of block reads completed
- Bytes written to console
- IP packets sent
- TCP bytes submitted
- TCP bytes ack'd
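
As a rough idea of how a library might use one of these, here is a
minimal sketch for the console case. The Counter module below is a
hypothetical stand-in (create, increase and value are assumed names, not
necessarily what counter.mli exposes), and write_console is made up for
the example.

```ocaml
(* Hypothetical counter: a named integer that only changes by deltas. *)
module Counter = struct
  type t = { name : string; mutable value : int }

  let create ~name = { name; value = 0 }

  (* [n] may be negative, e.g. when a grant ref is released. *)
  let increase t n = t.value <- t.value + n
  let value t = t.value
end

let console_bytes = Counter.create ~name:"console-bytes-written"

let write_console (s : string) =
  (* ... the actual console write is elided ... *)
  Counter.increase console_bytes (String.length s)

let () =
  write_console "hello\n";
  write_console "world\n"
```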

Measuring stuff can get complicated quickly. The last monitoring
system I worked on had many different types of "metric" (instantaneous
measurements, cumulative usage, on-going rates of increase, etc). You
could efficiently query for e.g. average response latency between any
two points in time, allowing for real-time display of "average latency
over the last 5 min" or "number of requests since midnight", etc.
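
One common way to make such queries cheap (not necessarily how that
system did it) is to store only cumulative totals at each sample time
and subtract. A minimal sketch, with a hypothetical sample type:

```ocaml
(* Two cumulative totals, sampled at a given time. *)
type sample = {
  time : float;           (* seconds since some epoch *)
  total_latency : float;  (* sum of all response latencies so far *)
  requests : int;         (* number of requests completed so far *)
}

(* Average latency of the requests completed between two samples.
   Returns None if no requests completed in the interval. *)
let average_latency ~from ~until =
  let n = until.requests - from.requests in
  if n <= 0 then None
  else Some ((until.total_latency -. from.total_latency) /. float_of_int n)

let s0 = { time = 0.0; total_latency = 1.0; requests = 10 }
let s1 = { time = 300.0; total_latency = 6.0; requests = 20 }
let last_5_min = average_latency ~from:s0 ~until:s1
```

Any two stored samples can be compared this way, so "since midnight" and
"last 5 min" use the same data.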

The counters were also arranged in a hierarchy. For example, you could
have a segments-acked counter for each TCP stream, which would then
also get aggregated as totals for that VM, and then further aggregated
both per-customer (across multiple VMs), and per resource pool. You
could see graphs of aggregated data and then drill down to see what
had contributed to it.
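
The aggregation idea can be sketched as a counter with an optional
parent, where every increase is also applied to each ancestor. The type
and function names are hypothetical, chosen for this example only.

```ocaml
(* Hypothetical hierarchical counter. *)
type counter = {
  name : string;
  mutable value : int;
  parent : counter option;
}

let create ?parent name = { name; value = 0; parent }

(* Apply the delta to this counter, then walk up to the root. *)
let rec increase c n =
  c.value <- c.value + n;
  match c.parent with
  | Some p -> increase p n
  | None -> ()

let vm = create "vm/segments-acked"
let stream1 = create ~parent:vm "tcp-stream-1/segments-acked"
let stream2 = create ~parent:vm "tcp-stream-2/segments-acked"

let () =
  increase stream1 3;
  increase stream2 4
```

Per-customer and per-pool totals would just be further levels above vm.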

Some of the metrics were shared with customers[*], who treated them as
extra monitoring data for their own (outsourced) resource pools.

I don't know whether we want to go down that route just yet, though.
It took a while to explain everything ;-)

>> The API for controlling the tracing, dumping out events, etc is much
>> less critical and can be changed later, as it only matters to the
>> developer profiling their unikernel.
> agree to some extent -- though if some components wish to control tracing in 
> other components as a result of observation of their own behaviour, the 
> control API may become more pervasively used than the dumping/display api i 
> guess.

Perhaps. I suspect we'd have the libraries just produce events and
have the logic for responding to them in the unikernel config, rather
than having libraries reconfiguring the profiling directly. That
sounds confusing!

[*] This was a research project, so not real customers.

Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

MirageOS-devel mailing list


