Xen project Mailing List

Re: [MirageOS-devel] Tracing and profiling blog post

To: Richard Mortier <Richard.Mortier@xxxxxxxxxxxxxxxx>

From: Thomas Leonard <talex5@xxxxxxxxx>

Date: Thu, 30 Oct 2014 16:29:43 +0000

Cc: "mirageos-devel@xxxxxxxxxxxxxxxxxxxx" <mirageos-devel@xxxxxxxxxxxxxxxxxxxx>, Anil Madhavapeddy <anil@xxxxxxxxxx>

Delivery-date: Thu, 30 Oct 2014 16:29:50 +0000

List-id: Developer list for MirageOS <mirageos-devel.lists.xenproject.org>

On 30 October 2014 14:20, Richard Mortier
<Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:
>
> On 30 Oct 2014, at 09:36, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>
>> Here's what I have so far:
>>
>> https://github.com/talex5/mirage-profile/blob/new-api/lib/trace_stubs.mli
>> https://github.com/talex5/mirage-profile/blob/new-api/lib/counter.mli
>>
>> There's not much here, but it would be good to keep this API stable as
>> pretty much all mirage libraries will be using it.
>
> some quick thoughts:
>
> trace_stubs.mli:
>
> would it make sense for note_{suspend,resume} to be string -> unit (or some
> more opaque type than string even, though perhaps of fixed size) so that the
> programmer can indicate reasons for the suspend/resume?

The name is perhaps confusing, but it's for block_domain/poll/select.
On Xen, mirage-platform's main.ml is the only thing that calls it. The
reason for suspending is always that there isn't any work to do
(exactly what we're waiting for is indicated by the sleeping event
channel threads at that moment).

If we had a more general version, it could perhaps be used for GC
pauses too, but there's a separate entry point for that using
Callback, because it's called from C code. Actual suspend-to-disk
could be another reason. Are there any more types?

> can labels on threads be changed over their lifetime? can labels overlap or
> are they unique? if unique, within what context?

Originally there was one label per thread, but now they're essentially
just log messages that get attached to the active thread. They can be
used to label a thread, but also to note interesting events, so
perhaps a different name would be useful here (Trace.log?
Trace.note?). There should probably be a printf version too.

Actual labelling more often happens with named_wait, named_task, etc
now.

> trace_enabled.mli:
>
> how do i interact with the buffer other than to snapshot it?

What kind of interactions did you have in mind?

> ...and what's counter for? (ie., how general/widely used is it intended to
> be?)

In the examples, I used counters for:

- Number of active grant refs
- Number of block reads completed
- Bytes written to console
- IP packets sent
- TCP bytes submitted
- TCP bytes ack'd

Measuring stuff can get complicated quickly. The last monitoring
system I worked on had many different types of "metric" (instantaneous
measurements, cumulative usage, on-going rates of increase, etc). You
could efficiently query for e.g. average response latency between any
two points in time, allowing for real-time display of "average latency
over the last 5 min" or "number of requests since midnight", etc.

The counters were also arranged in a hierarchy. For example, you could
have a segments-acked counter for each TCP stream, which would then
also get aggregated as totals for that VM, and then further aggregated
both per-customer (across multiple VMs), and per resource pool. You
could see graphs of aggregated data and then drill down to see what
had contributed to it. Some of the metrics were shared with
customers[*], who treated them as extra monitoring data for their own
(outsourced) resource pools.

I don't know whether we want to go down that route just yet, though.
It took a while to explain everything ;-)

>> The API for controlling the tracing, dumping out events, etc is much
>> less critical and can be changed later, as it only matters to the
>> developer profiling their unikernel.
>
> agree to some extent -- though if some components wish to control tracing in
> other components as a result of observation of their own behaviour, the
> control API may become more pervasively used than the dumping/display api i
> guess.

Perhaps. I suspect we'd have the libraries just produce events and
have the logic for responding to them in the unikernel config, rather
than having libraries reconfiguring the profiling directly. That
sounds confusing!

[*] This was a research project, so not real customers.
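
To make the hierarchy idea above a bit more concrete, here is a rough OCaml sketch. It is not the mirage-profile Counter API, and not how the monitoring system mentioned above was built; the `counter` type, `make` and `increase` are hypothetical names invented for illustration. It also assumes each counter has at most one parent, so the "per-customer and per resource pool" case (two parents) is left out.

```ocaml
(* Hypothetical sketch only: NOT the mirage-profile Counter API. *)
type counter = {
  name : string;
  mutable value : int;
  parent : counter option;  (* the aggregate this counter feeds into, if any *)
}

(* Create a counter starting at zero, optionally attached to a parent. *)
let make ?parent name = { name; value = 0; parent }

(* Add [n] to [c] and to every ancestor, so aggregated totals stay in sync. *)
let rec increase c n =
  c.value <- c.value + n;
  match c.parent with
  | None -> ()
  | Some p -> increase p n

let () =
  let pool = make "pool-1" in
  let vm = make ~parent:pool "vm-1" in
  let stream_a = make ~parent:vm "tcp-stream-a/segments-acked" in
  let stream_b = make ~parent:vm "tcp-stream-b/segments-acked" in
  increase stream_a 3;
  increase stream_b 5;
  (* prints: tcp-stream-a/segments-acked=3 vm-1=8 pool-1=8 *)
  Printf.printf "%s=%d %s=%d %s=%d\n"
    stream_a.name stream_a.value vm.name vm.value pool.name pool.value
```

Propagating on every increment keeps reads of the aggregates cheap, at the cost of one extra update per level of the hierarchy; summing the children lazily on read would be the opposite trade-off.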

--
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
