Xen project Mailing List

Re: Vanilla Xen total CPU %

To: Nick Calvert <nick.calvert@simplyhosting.cloud>

From: George Dunlap <George.Dunlap@xxxxxxxxxx>

Date: Mon, 8 Jun 2020 13:57:22 +0000

Accept-language: en-GB, en-US

Cc: George Dunlap <dunlapg@xxxxxxxxx>, Nick Calvert <nick.calvert@xxxxxxxxx>, xen-users <xen-users@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Mon, 08 Jun 2020 13:58:54 +0000

List-id: Xen user discussion <xen-users.lists.xenproject.org>

Thread-topic: Vanilla Xen total CPU %

> On Jun 8, 2020, at 12:58 PM, Nick Calvert <nick.calvert@simplyhosting.cloud>
> wrote:
>
> Hi George,
>
> Thank you very much for taking the time to respond to me.
>
> What I have been trying to do (and I think it’s the same for the other
> abandoned projects I came across) is come up with an equivalent to something
> like the Hyper-V % Total Runtime counter, which gives a (probably not very
> precise) but useful account of the total CPU ‘load’ of a hypervisor.
>
> This is the sort of metric which can be useful for spotting overall trends,
> or when the sum of all virtual machine CPU usage passes an alerting trigger.
> I appreciate that at this point it would probably be necessary to look at
> other metrics to determine what was actually happening.
>
> What I was trying to do was stream some of the xentop counters into a time
> series database (influxdb) so I could graph this. Other people have
> attempted the same, as an example there are projects doing this with
> graphite, some people were using old xm python bindings to basically do
> exactly as you describe in your mail and return their own usage % for
> graphing.

There are a couple of simple things we could do to make this sort of thing easier. We could:

1. Add a ‘hypervisor utilization’ field which does this addition for you
2. Add an option to xentop to have a ‘json’ output format
3. Modify xentop to allow a “format” string, such that you could request it only output the “hypervisor utilization”.

> Being able to do this with Go in 4.13 is very interesting and I did not know
> such bindings existed. I have some experience with the language so will look
> at this now. I am however straddled with some older versions of Xen and had
> got as far as building a parser and a.) pushing the cpu seconds value for
> each domU directly into a database and playing with them as if they were a
> network interface counters and b.) taking samples in a time interval and
> performing a calculation just as you describe and inserting these directly
> into a database with a timestamp.

We haven’t made a big deal of the golang bindings yet, because they’re still labelled as “experimental”: there’s a lot of missing functionality, and we don’t yet promise not to break backwards compatibility. Xen 4.13 had only some very basic functions and structures defined, but ListDomain was one of them. For our upcoming 4.14 release (should be out in a month or so), the functionality is greatly expanded, but still not yet complete. I think it’s *very* unlikely that the signature of that function is going to change significantly, so I think you’re reasonably safe using it.

One downside of the approach of writing your own binary is that libxl is tied to the particular version of the hypervisor you’re using; so every time you update Xen you have to recompile a new binary. (This is currently true both for C and golang.)

> One thing I was unsure on - and I think this is my ignorance on how such
> things are calculated - is how the total CPU capacity of the hypervisor
> impacted this. For instance, if I have a 20 real core, hyperthreaded
> hypervisor and a 100 second interval, I guess as an oversimplification there
> are 4000 CPU seconds are available for execution in that interval? That is
> when I started to get confused about the how to determine a total % from the
> info in xentop.

I’d have to look into the code to be sure, but yes, that’s normally how things work: 1000 added to the “cpu time” corresponds to 1000ns of execution on a single cpu. So if you have 8 cores all executing in parallel for 1000ns, that would look like 8000ns. xentop would then normally do the calculation I described to you, presenting 8000ns / 1000ns * 100% => 800%.

So to answer your *original original* question, adding up all the percentages of the domains from xentop for a time period *should* give you the total utilization for that time period. I can double-check and get back to you.

 -George
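
[Editorial sketch, not part of the original message.] The calculation discussed above can be sketched roughly as follows: take two samples of each domain's cumulative CPU time, sum the deltas, and divide by the wall-clock interval. This is only an illustration under stated assumptions: the function names, the dict-of-counters input, and the idea of dividing by a logical CPU count to get a share of total capacity are all hypothetical and are not taken from xentop or libxl. How the samples are collected (parsing xentop output, bindings, etc.) is left out.

```python
from typing import Dict


def total_utilization_pct(prev_ns: Dict[str, int],
                          curr_ns: Dict[str, int],
                          interval_ns: int) -> float:
    """Sum per-domain CPU-time deltas over one sampling interval.

    Result is in xentop-style percent, where 100% means one CPU was
    busy for the whole interval (so 8 busy CPUs give 800%).
    Inputs are cumulative CPU-time counters in nanoseconds, keyed by a
    domain name or id (hypothetical layout chosen for this sketch).
    """
    if interval_ns <= 0:
        raise ValueError("interval_ns must be positive")
    busy_ns = 0
    for dom, now in curr_ns.items():
        before = prev_ns.get(dom)
        # Skip domains with no earlier sample, or whose counter went
        # backwards (e.g. the domain was recreated between samples).
        if before is None or now < before:
            continue
        busy_ns += now - before
    return busy_ns / interval_ns * 100.0


def share_of_capacity_pct(total_pct: float, logical_cpus: int) -> float:
    """Scale the summed percentage to a 0-100% share of host capacity.

    Assumes every logical CPU contributes one CPU-second per second,
    which is the same oversimplification made in the question above.
    """
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return total_pct / logical_cpus


if __name__ == "__main__":
    # Eight domains, each running for 1000ns during a 1000ns interval.
    prev = {f"dom{i}": 0 for i in range(8)}
    curr = {f"dom{i}": 1000 for i in range(8)}
    total = total_utilization_pct(prev, curr, 1000)
    print(total, share_of_capacity_pct(total, 8))
```

With the numbers from the question (40 logical CPUs, a 100 second interval), 1000 CPU-seconds of guest execution would come out as 1000% in the first function and 25% of capacity in the second, under the same assumptions.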
