
Re: [MirageOS-devel] Tracing and profiling blog post



Here's more information about Go's use of the Chromium trace viewer: https://docs.google.com/document/d/1FP5apqzBgr7ahCCgFO-yoVhk4YZrNIDNf9RybngBc14/pub.

That link doesn't seem like it adds a ton of information, but it may be interesting to some.

--Mark

Mark Thurman
mthurman@xxxxxxxxx

On Thu, Oct 30, 2014 at 11:26 AM, Richard Mortier <Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:

On 30 Oct 2014, at 16:29, Thomas Leonard <talex5@xxxxxxxxx> wrote:

> On 30 October 2014 14:20, Richard Mortier
> <Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:
>>
>> would it make sense for note_{suspend,resume} to be string -> unit (or some more opaque type than string even, though perhaps of fixed size) so that the programmer can indicate reasons for the suspend/resume?
>
> This name is perhaps confusing, but it's for block_domain/poll/select.
> On Xen, mirage-platform's main.ml is the only thing that calls it.
> The reason for suspending is always that there isn't any work to do
> (exactly what we're waiting for is indicated by the sleeping event
> channel threads at that moment).
>
> If we had a more general version, it could perhaps be used for GC
> pauses too, but there's a separate entry point for that using
> Callback, because it's called from C code. Actual suspend-to-disk
> could be another reason.
>
> Are there any more types?

ah-- see my comment on the other mail i guess. seems likely to be better to parameterise this rather than bake it into the api, doesn't it? (i may be missing something obvious about types and ocaml here :)
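
e.g. a variant might do it -- purely a sketch, with invented names, not the actual mirage-profile api:

  (* hypothetical: parameterise the suspend reason as a variant rather
     than baking each case into the api *)
  type suspend_reason =
    | Block_domain        (* no runnable threads; waiting on event channels *)
    | Gc_pause            (* currently reported separately via Callback from C *)
    | Suspend_to_disk     (* whole-VM suspend *)
    | Other of string     (* escape hatch for reasons we haven't thought of *)

  let note_suspend (_ : suspend_reason) = ()  (* stub: would record an event *)
  let note_resume (_ : suspend_reason) = ()

  let () =
    note_suspend Block_domain;
    (* block_domain/poll/select happens here *)
    note_resume Block_domain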

>> can labels on threads be changed over their lifetime? can labels overlap or are they unique? if unique, within what context?
>
> Originally there was one label per thread, but now they're essentially
> just log messages that get attached to the active thread. They can be
> used to label a thread, but also to note interesting events, so
> perhaps a different name would be useful here (Trace.log?
> Trace.note?). There should probably be a printf version too.
>
> Actual labelling more often happens with named_wait, named_task, etc now.

ah right; i guess i'm talking about an api that subsumes lwt tracing and supports more general tracing throughout many libraries.
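
something like this signature, perhaps (just a sketch; none of these names are from mirage-profile):

  (* hypothetical library-neutral tracing signature: labels are just
     log events attached to the current thread, with a printf variant *)
  module type TRACE = sig
    val note : string -> unit
    val notef : ('a, Format.formatter, unit, unit) format4 -> 'a
  end

  (* no-op implementation, for when tracing is disabled *)
  module Null_trace : TRACE = struct
    let note _ = ()
    let notef fmt = Format.ifprintf Format.std_formatter fmt
  end

  let () = Null_trace.notef "read %d bytes from %s" 512 "xvda"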

>
>> trace_enabled.mli:
>>
>> how do i interact with the buffer other than to snapshot it?
>
> What kind of interactions did you have in mind?

one thing ETW allowed which was nice was having real-time consumers of the tracing buffers. that would allow this kind of infrastructure to plug in to something doing more dynamic resource management for unikernels across (e.g.) a datacenter.
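
e.g. roughly this shape (entirely hypothetical -- not the current trace_enabled.mli):

  (* hypothetical real-time consumer hook: in addition to snapshotting
     the ring buffer, a monitor subscribes to events as they're written *)
  type event = { timestamp : float; label : string }

  let consumers : (event -> unit) list ref = ref []

  let subscribe f = consumers := f :: !consumers

  (* on the hot path, so consumers must be cheap or hand off to a queue *)
  let emit ev = List.iter (fun f -> f ev) !consumers

  let () =
    subscribe (fun ev ->
      if ev.label = "out-of-memory" then
        print_endline "tell the resource manager to rebalance");
    emit { timestamp = Sys.time (); label = "out-of-memory" }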

>> ...and what's counter for? (ie., how general/widely used is it intended to be?)
>
> In the examples, I used counters for:
>
> - Number of active grant refs
> - Number of block reads completed
> - Bytes written to console
> - IP packets sent
> - TCP bytes submitted
> - TCP bytes ack'd
>
> Measuring stuff can get complicated quickly. The last monitoring
> system I worked on had many different types of "metric" (instantaneous
> measurements, cumulative usage, on-going rates of increase, etc). You
> could efficiently query for e.g. average response latency between any
> two points in time, allowing for real-time display of "average latency
> over the last 5 min" or "number of requests since midnight", etc.
>
> The counters were also arranged in a hierarchy. For example, you could
> have a segments-acked counter for each TCP stream, which would then
> also get aggregated as totals for that VM, and then further aggregated
> both per-customer (across multiple VMs), and per resource pool. You
> could see graphs of aggregated data and then drill down to see what
> had contributed to it.
>
> Some of the metrics were shared with customers[*], who treated them as
> extra monitoring data for their own (outsourced) resource pools.
>
> I don't know whether we want to go down that route just yet, though.
> It took a while to explain everything ;-)

:)

i guess there are two orthogonal things here.

metrics, as you describe above, which to my mind sound like (e.g.) SNMP MIBs; most useful for understanding aggregate performance of a system.

event tracing as i've been implicitly assuming, which permits more detailed cuts through system performance at the cost of added complexity (per magpie).

both are useful i think, though you ought to be able to build the former on the latter (though that might be more complex than seems reasonable).
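
as a sketch of what "the former on the latter" might look like (invented names again), counters become just a fold over the event stream:

  (* hypothetical: each counter increment is a plain trace event, and
     an aggregator folds the event stream into per-counter totals *)
  type event = { counter : string; delta : int }

  let aggregate events =
    let tbl = Hashtbl.create 16 in
    List.iter
      (fun { counter; delta } ->
         let cur = try Hashtbl.find tbl counter with Not_found -> 0 in
         Hashtbl.replace tbl counter (cur + delta))
      events;
    Hashtbl.fold (fun k v acc -> (k, v) :: acc) tbl []

  let () =
    [ { counter = "tcp-bytes-acked"; delta = 1460 };
      { counter = "ip-packets-sent"; delta = 1 };
      { counter = "tcp-bytes-acked"; delta = 1460 } ]
    |> aggregate
    |> List.iter (fun (k, v) -> Printf.printf "%s = %d\n" k v)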

>> agree to some extent -- though if some components wish to control tracing in other components as a result of observation of their own behaviour, the control API may become more pervasively used than the dumping/display api i guess.
>
> Perhaps. I suspect we'd have the libraries just produce events and
> have the logic for responding to them in the unikernel config, rather
> than having libraries reconfiguring the profiling directly. That
> sounds confusing!

heh-- having dynamic control of tracing was something we discussed with magpie but never implemented. the idea would've been that a datacenter operator could notice an issue, and then "turn up" the tracing to get more detailed models, to the point where they could diagnose the problem.

but as i said, we never actually did that. (though ETW does allow dynamic control over tracing levels from a command line tool.)
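
something along these lines, i imagine (hypothetical again; not any existing api):

  (* hypothetical etw-style trace levels, adjustable at runtime *)
  type level = Off | Coarse | Detailed

  let rank = function Off -> 0 | Coarse -> 1 | Detailed -> 2

  let current = ref Coarse

  let set_level l = current := l  (* e.g. driven by an operator's control channel *)

  let note ~level label =
    if rank level <= rank !current then prerr_endline label

  let () =
    note ~level:Detailed "per-packet event";  (* dropped while at Coarse *)
    set_level Detailed;
    note ~level:Detailed "per-packet event"   (* now recorded *)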


--
Cheers,

R.





_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel

