
xen summit 2023 design session: distributed tracing



design session: distributed tracing

Marcusg: proof of concept
* where to start, simplest parts
* how to collect the context (unique/random id) and begin/end events
* e.g. domain create
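
A minimal sketch of the begin/end idea above, assuming a userspace-first proof of concept. The names `new_context_id`, `EventLog` and `traced_domain_create` are hypothetical and do not correspond to any existing Xen or toolstack API; the body of the operation is a placeholder.

```python
import secrets
import time


def new_context_id() -> str:
    """Return a random 128-bit context id as 32 hex characters."""
    return secrets.token_hex(16)


class EventLog:
    """In-memory stand-in for a trace buffer (assumption, not xentrace)."""

    def __init__(self):
        self.events = []

    def emit(self, context_id, kind, name):
        # Each record: (timestamp in ns, context id, "begin"/"end", op name)
        self.events.append((time.monotonic_ns(), context_id, kind, name))


def traced_domain_create(log, context_id):
    """Wrap a (placeholder) domain create in begin/end events."""
    log.emit(context_id, "begin", "domain_create")
    try:
        pass  # the real toolstack work would go here
    finally:
        # The end event is emitted even if the operation raises.
        log.emit(context_id, "end", "domain_create")
```

Usage: create one context id per high-level operation and pass it down to every layer that emits events, so the events can be grouped afterwards.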

George: pass in id to hypervisor
Andy: maybe userspace first, libraries/xenguest
 more difficult in calls like add_to_physmap
 continuable hypercall indistinguishable from userspace calling it N times
 e.g. new event becomes pending

george: dom0 jumps into Xen; if a long-running hypercall has to return before 
it is done, it stores how far it got and pushes the IP back one instruction, 
so when the VM runs again it will re-execute the same hypercall
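
A toy model of the continuation mechanism described above, to show why the hypervisor sees N separate entries for one logical operation. This is a simulation under stated assumptions, not Xen code: `Vcpu`, `long_hypercall` and `run` are hypothetical names, and the "instruction pointer" is just an integer.

```python
from dataclasses import dataclass

HYPERCALL_LEN = 1  # toy assumption: the hypercall instruction is one slot wide


@dataclass
class Vcpu:
    ip: int = 0        # index of the next "instruction" to execute
    progress: int = 0  # how far the interrupted hypercall got


def long_hypercall(vcpu, total, budget):
    """Do up to `budget` units of work; rewind the IP if unfinished."""
    vcpu.ip += HYPERCALL_LEN  # execution has moved past the hypercall
    vcpu.progress = min(total, vcpu.progress + budget)
    if vcpu.progress < total:
        vcpu.ip -= HYPERCALL_LEN  # re-execute the same hypercall next time
        return False
    vcpu.progress = 0  # finished: clear the stored progress
    return True


def run(vcpu, total, budget):
    """Run the vcpu until it gets past the hypercall; count the entries."""
    start_ip = vcpu.ip
    entries = 0
    while vcpu.ip == start_ip:
        long_hypercall(vcpu, total, budget)
        entries += 1
    return entries
```

With `total=10` and `budget=4` the hypercall is entered three times, which from the inside looks the same as a caller issuing it three times.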

mark: you called the first one with a context; that continuation might've been 
stashed
andy: you'll need large trace buffers
george: writing to disk might be the limit, might be minutes' worth of events
optimization
edwin: could filter on dump if needed; also the hypercall could store a bit 
saying we started, so we don't emit repeated traces
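
A minimal sketch of the "we started" bit, assuming the state can be kept alongside the stored progress of the operation. `ContinuableOp` and `enter` are hypothetical names; the point is only that repeated entries emit one begin and one end event.

```python
class ContinuableOp:
    """State of one long-running operation (toy model, not a Xen struct)."""

    def __init__(self, total):
        self.total = total
        self.progress = 0
        self.started = False  # the "we started" bit


def enter(op, budget, trace):
    """One entry into the operation; returns True once it has completed."""
    if not op.started:
        trace.append("begin")  # only the first entry emits a begin event
        op.started = True
    op.progress = min(op.total, op.progress + budget)
    if op.progress < op.total:
        return False  # continuation needed, no event emitted
    trace.append("end")
    op.started = False
    return True
```
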
andy: we guarantee forward progress in hypercalls, but limited room to store 
continuation info

edwin: current context?
andy: hypervisor can't tell which process is current, and the continuation 
might be on a different cpu;
  for domain-related hypercalls we can stash state in struct domain/cpu, 
except for the first hypercall
george: multicall-like wrapper for tracing, Xen is not preemptible, store 
start/end
other ops in between may get tagged incorrectly when interrupted
andy: xentrace doesn't distinguish sync and async events, to ignore interrupts
andy: how big, what format?
marcusg/edwin: unique ids, 128-bit for context, 64-bit for span, but it has 
parent/child, and if child is unique it has a link to the full parent
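
One possible record layout for the sizes mentioned above, as a sketch only. The field order, the little-endian packing and the use of 0 as "no parent" are assumptions made for illustration; the session did not fix a format. `Span` and `RECORD` are hypothetical names.

```python
import struct
from dataclasses import dataclass

# 16-byte context id, 64-bit span id, 64-bit parent span id, no padding.
RECORD = struct.Struct("<16sQQ")


@dataclass(frozen=True)
class Span:
    context_id: bytes  # 128-bit context, shared by all spans of one operation
    span_id: int       # 64-bit id of this span
    parent_id: int     # 64-bit id of the parent span, 0 meaning "no parent"

    def pack(self) -> bytes:
        """Serialize to a fixed-size 32-byte record."""
        return RECORD.pack(self.context_id, self.span_id, self.parent_id)

    @classmethod
    def unpack(cls, raw: bytes) -> "Span":
        """Inverse of pack()."""
        return cls(*RECORD.unpack(raw))
```

These sizes match the W3C Trace Context `traceparent` header, which also uses a 16-byte trace id and an 8-byte parent id.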

edwin: spanid passed in, hypervisor
george: postprocess and generate opentracing format there, hypervisor is not 
preemptible, can infer parents
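
A sketch of how postprocessing could infer parents, assuming that because the hypervisor is not preemptible, begin/end events recorded on one pcpu nest properly. The event tuple layout and the function name `infer_parents` are hypothetical; this is not how xenalyze works today.

```python
def infer_parents(events):
    """Map each span id to its inferred parent span id (or None).

    `events` is an iterable of (pcpu, kind, span_id) tuples in record
    order, where kind is "begin" or "end". A span's parent is whatever
    span was still open on the same pcpu when it began.
    """
    stacks = {}   # pcpu -> stack of currently open span ids
    parents = {}  # span id -> parent span id or None
    for pcpu, kind, span_id in events:
        stack = stacks.setdefault(pcpu, [])
        if kind == "begin":
            parents[span_id] = stack[-1] if stack else None
            stack.append(span_id)
        elif kind == "end":
            if not stack or stack[-1] != span_id:
                raise ValueError(f"unbalanced end for span {span_id}")
            stack.pop()
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return parents
```

Events from different pcpus may interleave freely in the input; only the order within one pcpu matters. Continuations that resume on a different cpu would break the nesting assumption and need separate handling.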

andy/george: how to gen id? random? host unique: vcpu index + per-cpu counter. 
rdtsc? not reliable

pcpu bits + rdtsc bits --> unique
pcpu + incrementing counter
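
A sketch of the "pcpu + incrementing counter" option, assuming a 64-bit id split into 16 bits of pcpu number and 48 bits of counter. The split and the name `SpanIdAllocator` are illustrative assumptions; the notes do not specify bit widths.

```python
PCPU_BITS = 16      # assumed width of the pcpu field
COUNTER_BITS = 48   # assumed width of the per-pcpu counter
COUNTER_MASK = (1 << COUNTER_BITS) - 1


class SpanIdAllocator:
    """Per-pcpu id allocator: no cross-cpu synchronization is needed."""

    def __init__(self, pcpu):
        if not 0 <= pcpu < (1 << PCPU_BITS):
            raise ValueError("pcpu number does not fit in PCPU_BITS")
        self.pcpu = pcpu
        self.counter = 0

    def next_id(self):
        """Return the next host-unique 64-bit id for this pcpu."""
        # The counter wraps after 2**48 ids, at which point ids repeat.
        self.counter = (self.counter + 1) & COUNTER_MASK
        return (self.pcpu << COUNTER_BITS) | self.counter
```

Ids from different pcpus can never collide because the high bits differ, and ids from one pcpu are unique until the counter wraps.
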
andy: tracing interrupts? not associated with anything
george: sometimes you can tie it back to a high-level task, e.g. global tlb 
flush
andy: properties of timestamps?
marcusg: begin/end event, translatable back
george: xentrace has (optional) time information, correlates across pcpus, uses 
scheduling as a Lamport clock.
TSC drift not really a problem anymore
andy: invariant TSC, assume
andy: host mode tsc: no scaling/offset so timestamps in guest can be compared 
to Dom0
timestamps: convert to ns
andy: clocks change frequency with temp, etc.
Dom0 clocksource: tsc as clocksource doesn't quite work, but kernel could have 
NTP adjustment info
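
The "convert to ns" step can be sketched as below, assuming an invariant TSC with a known, constant frequency and no scaling or offset. `tsc_to_ns` and its parameters are hypothetical names, and the frequency value in the usage note is only an example.

```python
NS_PER_SEC = 1_000_000_000


def tsc_to_ns(tsc, tsc_hz, tsc_base=0):
    """Convert a raw TSC value to nanoseconds since `tsc_base`.

    Assumes the TSC ticks at a constant `tsc_hz`. Integer arithmetic is
    used so that large tick counts do not lose precision.
    """
    if tsc_hz <= 0:
        raise ValueError("tsc_hz must be positive")
    return (tsc - tsc_base) * NS_PER_SEC // tsc_hz
```

For example, with an assumed 3 GHz TSC, `tsc_to_ns(1500, 3_000_000_000)` gives 500 ns. This conversion does not account for the frequency changes or NTP adjustments mentioned above.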

andy: what information would we want?
marcusg: who/what features are being used?
george: add tracing as needed
marcusg: tracing what happens with nested virt
andy: that would trap too often
edwin: pick one op as PoC, but more than one optimization
george: get ready for tracing "in anger", do a proof of concept on a particular 
op to figure out the mechanism
domain create, get xenalyze ready to translate


 

