
Re: [MirageOS-devel] Tracing and profiling blog post

On 30 Oct 2014, at 16:29, Thomas Leonard <talex5@xxxxxxxxx> wrote:

> On 30 October 2014 14:20, Richard Mortier
> <Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:
>> would it make sense for note_{suspend,resume} to be string -> unit (or some 
>> more opaque type than string even, though perhaps of fixed size) so that the 
>> programmer can indicate reasons for the suspend/resume?
> This name is perhaps confusing, but it's for block_domain/poll/select.
> On Xen, mirage-platform's main.ml is the only thing that calls it.
> The reason for suspending is always that there isn't any work to do
> (exactly what we're waiting for is indicated by the sleeping event
> channel threads at that moment).
> If we had a more general version, it could perhaps be used for GC
> pauses too, but there's a separate entry point for that using
> Callback, because it's called from C code. Actual suspend-to-disk
> could be another reason.
> Are there any more types?

ah-- see my comment on the other mail i guess. it seems likely to be better to 
parameterise this rather than bake it into the api, doesn't it?  (i may be 
missing something obvious about types and ocaml here :)

>> can labels on threads be changed over their lifetime?  can labels overlap or 
>> are they unique?  if unique, within what context?
> Originally there was one label per thread, but now they're essentially
> just log messages that get attached to the active thread. They can be
> used to label a thread, but also to note interesting events, so
> perhaps a different name would be useful here (Trace.log?
> Trace.note?). There should probably be a printf version too.
> Actual labelling more often happens with named_wait, named_task, etc now.
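a rough sketch of what the renamed api discussed above might look like -- `Trace.note` and the printf-style `notef` are hypothetical names from this thread, not the actual mirage-profile interface:

```ocaml
(* Hypothetical sketch only: a thread-annotation API along the lines
   discussed above (Trace.note plus a printf variant). Not the real
   mirage-profile interface. *)
module Trace : sig
  val note : string -> unit
  (* attach a log message / label to the currently-active thread *)

  val notef : ('a, unit, string, unit) format4 -> 'a
  (* printf-style variant of [note] *)
end = struct
  (* stand-in backend: a real implementation would write into the
     trace buffer rather than stdout *)
  let note msg = print_endline ("trace: " ^ msg)
  let notef fmt = Printf.ksprintf note fmt
end

let () =
  Trace.note "connected";
  Trace.notef "read %d bytes" 512
```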

ah right; i guess i'm talking about an api that subsumes lwt tracing and 
supports more general tracing throughout many libraries.

>> trace_enabled.mli:
>> how do i interact with the buffer other than to snapshot it?
> What kind of interactions did you have in mind?

one thing ETW allowed which was nice was to have real-time consumers of the 
tracing buffers. that would allow this kind of infrastructure to plug in to 
something doing more dynamic resource management for unikernels across (e.g.) a 

>> ...and what's counter for?  (ie., how general/widely used is it intended to 
>> be?)
> In the examples, I used counters for:
> - Number of active grant refs
> - Number of block reads completed
> - Bytes written to console
> - IP packets sent
> - TCP bytes submitted
> - TCP bytes ack'd
> Measuring stuff can get complicated quickly. The last monitoring
> system I worked on had many different types of "metric" (instantaneous
> measurements, cumulative usage, on-going rates of increase, etc). You
> could efficiently query for e.g. average response latecy between any
> two points in time, allowing for real-time display of "average latency
> over the last 5 min" or "number of requests since midnight", etc.
> The counters were also arranged in a hierarchy. For example, you could
> have a segments-acked counter for each TCP stream, which would then
> also get aggregated as totals for that VM, and then further aggregated
> both per-customer (across multiple VMs), and per resource pool. You
> could see graphs of aggregated data and then drill down to see what
> had contributed to it.
> Some of the metrics were shared with customers[*], who treated them as
> extra monitoring data for their own (outsourced) resource pools.
> I don't know whether we want to go down that route just yet, though.
> It took a while to explain everything ;-)
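the hierarchy you describe could be sketched as counters that propagate increments to their ancestors, so a per-stream count rolls up into per-VM (and higher) totals -- names and structure below are purely illustrative, not any existing MirageOS library:

```ocaml
(* Illustrative sketch of hierarchical counters: incrementing a leaf
   also bumps every ancestor, so per-TCP-stream counts aggregate into
   per-VM (and per-customer, per-pool) totals. Not a real API. *)
type counter = {
  name : string;
  mutable value : int;
  parent : counter option;
}

let root name = { name; value = 0; parent = None }
let child parent name = { name; value = 0; parent = Some parent }

let rec incr_by c n =
  c.value <- c.value + n;
  match c.parent with
  | Some p -> incr_by p n
  | None -> ()

let () =
  let vm = root "vm" in
  let stream1 = child vm "tcp-stream-1" in
  let stream2 = child vm "tcp-stream-2" in
  incr_by stream1 100;  (* 100 segments acked on stream 1 *)
  incr_by stream2 50;   (* 50 segments acked on stream 2 *)
  Printf.printf "%s total: %d\n" vm.name vm.value  (* vm total: 150 *)
```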


i guess there are two orthogonal things here.

metrics as you describe above, which to my mind sound like (e.g.) SNMP MIBs. 
most useful for understanding aggregate performance of a system.

event tracing as i've been implicitly assuming, which permits more detailed 
cuts through system performance at the cost of added complexity (per magpie).

both are useful i think, though you ought to be able to build the former on the 
latter (though that might be more complex than seems reasonable).

>> agree to some extent -- though if some components wish to control tracing in 
>> other components as a result of observation of their own behaviour, the 
>> control API may become  more pervasively used than the dumping/display api i 
>> guess.
> Perhaps. I suspect we'd have the libraries just produce events and
> have the logic for responding to them in the unikernel config, rather
> than having libraries reconfiguring the profiling directly. That
> sounds confusing!

heh-- having dynamic control of tracing was something we discussed with magpie 
but never implemented. the idea would've been something like: a datacenter 
operator could notice an issue, and then "turn up" the tracing to get more 
detailed models, to the point where they could diagnose the problem.

but as i said, we never actually did that. (though ETW does allow dynamic 
control over tracing levels from a command line tool.)


