
Re: Vanilla Xen total CPU %


  • To: Nick Calvert <nick.calvert@simplyhosting.cloud>
  • From: George Dunlap <George.Dunlap@xxxxxxxxxx>
  • Date: Mon, 8 Jun 2020 13:57:22 +0000
  • Accept-language: en-GB, en-US
  • Cc: George Dunlap <dunlapg@xxxxxxxxx>, Nick Calvert <nick.calvert@xxxxxxxxx>, xen-users <xen-users@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 08 Jun 2020 13:58:54 +0000
  • List-id: Xen user discussion <xen-users.lists.xenproject.org>
  • Thread-index: AQHWPYPfnKEJxSFR3EaSzb0yne1H1ajOe6uAgAAhPAA=
  • Thread-topic: Vanilla Xen total CPU %


> On Jun 8, 2020, at 12:58 PM, Nick Calvert <nick.calvert@simplyhosting.cloud> 
> wrote:
> 
> Hi George, 
>  
> Thank you very much for taking the time to respond to me. 
>  
> What I have been trying to do (and I think it’s the same for the other 
> abandoned projects I came across) is come up with an equivalent to something 
> like the Hyper-V % Total Runtime counter, which gives a useful, if probably 
> not very precise, account of the total CPU ‘load’ of a hypervisor. 
>  
> This is the sort of metric which can be useful for spotting overall trends, 
> or when the sum of all virtual machine CPU usage passes an alerting trigger. 
> I appreciate that at this point it would probably be necessary to look at 
> other metrics to determine what was actually happening. 
>  
> What I was trying to do was stream some of the xentop counters into a time 
> series database (influxdb) so I could graph this. Other people have 
> attempted the same: for example, there are projects doing this with 
> graphite, and some people were using the old xm python bindings to do 
> basically what you describe in your mail and return their own usage % for 
> graphing. 

There are a couple of simple things we could do to make this sort of thing 
easier.  We could:

1. Add a ‘hypervisor utilization’ field which does this addition for you

2. Add an option to xentop to produce output in a ‘json’ format

3. Modify xentop to accept a “format” string, so that you could ask it to 
output only the “hypervisor utilization”.

> Being able to do this with Go in 4.13 is very interesting and I did not know 
> such bindings existed.  I have some experience with the language so will look 
> at this now.  I am however saddled with some older versions of Xen and had 
> got as far as building a parser and a.) pushing the cpu seconds value for 
> each domU directly into a database and treating them as if they were 
> network interface counters and b.) taking samples in a time interval and 
> performing a calculation just as you describe and inserting the results 
> directly into a database with a timestamp. 

We haven’t made a big deal of the golang bindings yet, because they’re still labelled 
as “experimental”: there’s a lot of missing functionality, and we don’t yet 
promise not to break backwards compatibility. Xen 4.13 had only some very basic 
functions and structures defined, but ListDomain was one of them.  For our 
upcoming 4.14 release (should be out in a month or so), the functionality is 
greatly expanded, but still not yet complete.

I think it’s *very* unlikely that the signature of that function is going to 
change significantly, so I think you’re reasonably safe using it.

One downside of the approach of writing your own binary is that libxl is tied to 
the particular version of the hypervisor you’re using; so every time you update 
Xen you have to recompile a new binary.  (This is currently true both for C and 
golang.)

> One thing I was unsure about - and I think this is my ignorance of how such 
> things are calculated - is how the total CPU capacity of the hypervisor 
> affects this. For instance, if I have a 20 real core, hyperthreaded 
> hypervisor and a 100 second interval, I guess as an oversimplification there 
> are 4000 CPU seconds available for execution in that interval? That is 
> when I started to get confused about how to determine a total % from the 
> info in xentop. 

I’d have to look into the code to be sure, but yes, that’s normally how things 
work:  1000 added to the “cpu time” corresponds to 1000ns of execution on a 
single cpu.  So if you have 8 cores all executing in parallel for 1000ns, that 
would look like 8000ns.  xentop would then normally do the calculation I 
described to you — presenting 8000ns / 1000ns * 100% => 800%
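
As a rough illustration of that calculation (a sketch only, not xentop's 
actual code): take two samples of each domain's cumulative cpu time in 
nanoseconds a known interval apart, divide the difference by the interval, 
and multiply by 100.  Dividing the summed result by the number of logical 
CPUs gives a 0-100% figure for the whole host.  The function names and the 
sample figures are made up for illustration, and the example assumes a 
20-core hyperthreaded host shows up as 40 logical CPUs.

```python
def cpu_percent(prev_ns, curr_ns, interval_ns):
    """Percent of one CPU used between two cumulative cpu-time samples."""
    if interval_ns <= 0:
        raise ValueError("interval must be positive")
    return (curr_ns - prev_ns) / interval_ns * 100.0


def host_utilization(prev, curr, interval_ns, n_cpus):
    """Return (summed per-domain percent, percent of total host capacity).

    prev and curr map a domain name to its cumulative cpu time in ns.
    Domains missing from prev (e.g. newly started) are skipped.
    """
    total = sum(
        cpu_percent(prev[name], curr[name], interval_ns)
        for name in curr
        if name in prev
    )
    return total, total / n_cpus


# The example from the mail: 8000ns of cpu time over a 1000ns interval.
print(cpu_percent(0, 8000, 1000))  # 800.0

# Hypothetical figures: two guests using 600 and 400 CPU-seconds in a
# 100 second interval on a host with 40 logical CPUs (4000 CPU-seconds
# of capacity).
SEC = 10**9
prev = {"guest1": 0, "guest2": 0}
curr = {"guest1": 600 * SEC, "guest2": 400 * SEC}
print(host_utilization(prev, curr, 100 * SEC, 40))  # (1000.0, 25.0)
```

So the summed figure can exceed 100%; it is the division by the CPU count 
that turns it into a fraction of the available CPU seconds.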

So to answer your *original original* question, adding up all the percentages 
of the domains from xentop for a time period *should* give you the total 
utilization for that time period.
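
A minimal sketch of that addition, assuming one iteration of xentop's batch 
output (`xentop -b`) with a header row that contains a `CPU(%)` column and 
one whitespace-separated row per domain.  The sample text below is 
illustrative rather than captured from a real host, its column list is 
abbreviated, and `total_cpu_percent` is a hypothetical helper name.

```python
def total_cpu_percent(xentop_text):
    """Sum the CPU(%) column over all domain rows of one xentop iteration."""
    lines = [ln for ln in xentop_text.splitlines() if ln.strip()]
    header = lines[0].split()
    col = header.index("CPU(%)")  # locate the column by name, not position
    total = 0.0
    for row in lines[1:]:
        fields = row.split()
        total += float(fields[col])
    return total


# Illustrative sample only; real output has more columns to the right.
SAMPLE = """\
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)
  Domain-0 -----r       1200   12.5    4194304    6.3
    guest1 --b---        300   87.5    2097152    3.1
    guest2 --b---         50  100.0    1048576    1.6
"""

print(total_cpu_percent(SAMPLE))  # 200.0
```

Looking the column up by its header name keeps the sketch from depending on 
the exact column order, though it does assume domain names contain no 
whitespace.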

I can double-check and get back to you.

 -George

 

