Xen project Mailing List

xen summit 2023 design session: distributed tracing

To: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: Edwin Torok <edwin.torok@xxxxxxxxx>

Date: Sun, 25 Jun 2023 15:25:34 +0200

Cc: Marcus Granado <marcus.granado@xxxxxxxxx>

Delivery-date: Sun, 25 Jun 2023 13:26:04 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

design session: distributed tracing

Marcusg: proof of concept
 * where to start, simplest parts
 * how to attempt to collect the context (unique/random id), begin/end events
 * e.g. domain create
George: pass in id to hypervisor
Andy: maybe userspace first, libraries/xenguest
  more difficult in calls like add_to_physmap
  continuable hypercall indistinguishable from userspace calling it N times
  e.g. new event becomes pending
George: dom0 jumps into Xen, long-running hypercall finishes, if it wasn't done it stores how far it got, and pushes the IP back one instruction so when VM runs again it will re-execute same hypercall
Mark: you called first one with a context, that continuation might've been stashed
Andy: you'll need large trace buffers
George: write to disk might be limit, might be minutes worth of events
  optimization
Edwin: could filter on dump if needed, also hypercall could store a bit saying we started so we don't emit repeated traces
Andy: we guarantee forward progress in hypercalls, but limited room to store continuation info
Edwin: current context?
Andy: hypervisor can't tell which process is current, continuation might be on different cpu, stashing
  domain-related hypercalls we can stash state in struct domain/cpu, except for first hypercall
George: multicall-like wrapper for tracing, Xen is not preemptible, store start/end
  other ops in between may get tagged incorrectly when interrupted
Andy: xentrace doesn't distinguish sync and async events, to ignore interrupts
Andy: how big, what format?
Marcusg/Edwin: unique ids, 128-bit for context, span 64-bit, but has parent/child and if child is unique it has a link to full parent
Edwin: spanid passed in, hypervisor
George: postprocess and generate opentracing format there, hypervisor is not preemptible, can infer parents
Andy/George: how to gen id? random?
  host unique: vcpu index + per-cpu counter.
  rdtsc? not reliable
  pcpu bits + rdtsc bits --> unique pcpu + incrementing counter
Andy: tracing interrupts? not associated with anything
George: sometimes you can tie it back to a high-level task, e.g. global TLB flush
Andy: properties of timestamps?
Marcusg: begin/end event, translatable back
George: xentrace has (optional) time information, correlates across pcpus, uses scheduling as a Lamport clock. TSC drift not really a problem anymore
Andy: invariant TSC, assume
Andy: host mode tsc: no scaling/offset so timestamps in guest can be compared to Dom0
  timestamps: convert to ns
Andy: clocks change frequency with temp, etc.
  Dom0 clocksource: tsc as clocksource doesn't quite work, but kernel could have NTP adjustment info
Andy: what information would we want?
Marcusg: who/what features are being used?
George: add tracing as needed
Marcusg: tracing what happens with nested virt
Andy: that would trap too often
Edwin: pick one op as PoC, but more than one optimization
George: get ready for tracing "in anger", do a proof of concept on a particular op to figure out the mechanism
  domain create, get xenalyze ready to translate
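
To make the ID discussion concrete, here is a minimal sketch of one possible reading of the "pcpu + incrementing counter" idea, with a 128-bit context id and 64-bit span ids as mentioned in the notes. It is written in Python purely for illustration and is not Xen code. The 16/48 bit split, the class and function names (SpanIdAllocator, new_trace_id, SpanEvent), and the choice of a random context id generated in userspace are all assumptions that do not come from the session.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

# Assumed split of the 64-bit span id; the session did not fix any layout.
PCPU_BITS = 16       # high bits: physical CPU index (hypothetical width)
COUNTER_BITS = 48    # low bits: per-pCPU incrementing counter


class SpanIdAllocator:
    """Hands out host-unique 64-bit span ids for one physical CPU.

    Uniqueness comes from the pCPU index in the high bits plus a counter
    that only this pCPU increments, so no cross-CPU locking is needed.
    """

    def __init__(self, pcpu: int) -> None:
        if not 0 <= pcpu < (1 << PCPU_BITS):
            raise ValueError("pcpu index does not fit in PCPU_BITS")
        self.pcpu = pcpu
        self.counter = 0

    def next_id(self) -> int:
        # Counter starts at 1 so that 0 can be reserved for "no span".
        self.counter += 1
        if self.counter >= (1 << COUNTER_BITS):
            raise OverflowError("per-pCPU span counter exhausted")
        return (self.pcpu << COUNTER_BITS) | self.counter


def new_trace_id() -> int:
    """Random 128-bit context id, assumed to be created by the caller."""
    return secrets.randbits(128)


@dataclass(frozen=True)
class SpanEvent:
    """A begin or end record that a postprocessor could pair up later."""
    trace_id: int              # 128-bit context passed in by the caller
    span_id: int               # 64-bit id from SpanIdAllocator
    parent_id: Optional[int]   # span id of the parent, if known
    kind: str                  # "begin" or "end"
    name: str                  # e.g. "domain_create"


if __name__ == "__main__":
    alloc = SpanIdAllocator(pcpu=3)
    ctx = new_trace_id()
    span = alloc.next_id()
    events = [
        SpanEvent(ctx, span, None, "begin", "domain_create"),
        SpanEvent(ctx, span, None, "end", "domain_create"),
    ]
    for ev in events:
        print(f"{ev.kind:5} {ev.name} trace={ev.trace_id:032x} span={ev.span_id:016x}")
```

A counter avoids the reliability concern raised about rdtsc, at the cost of ids that are only unique per host and per boot; a postprocessor would need to combine them with the 128-bit context to get globally unique spans.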
