[Xen-users] Get some useful metrics out of tmem-list

After upgrading both the hypervisor and the domU kernels, we're now actually
using tmem, and we can see all those little numbers going up and down in the
output of xl tmem-list -la | xen-tmem-list-parse.

Now we'd like to extract some useful data from those numbers, to try to answer 
questions like:
- how much tmem are we using?
- would we benefit from having more?
- how much are we gaining from compression, and how much from dedup?
- and so on

I'm mostly interested in global data for the host rather than per-domain tmem
usage, and I want to write a script that extracts that data so it can be fed to
whatever monitoring system you prefer (as an example, I'm going to graph it
with Cacti).

The first line of output is:
        total tmem ops=9057925 (errors=539921) -- tmem pages avail=90958
I'm guessing total ops is the sum of all get/put/etc. operations initiated by
tmem consumers (Xen guests, in practice), so it should give me an idea of the
overall activity on tmem. Does it count all operations (including the ones that
result in errors) or only the successful ones?

I'm also guessing errors counts (mostly?) gets that failed because the page had
been evicted and puts that failed because no space was available. If so, a
continuously increasing error count should tell me that I would benefit from
having more tmem available.
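
To make that idea concrete, here is a rough sketch of what the script could do
with the first line: parse it, then compare two samples taken some time apart.
The function names are mine, and the ratio only makes sense if errors are
included in total ops, which is exactly one of the things I'm asking about.

```python
import re

def parse_summary(line):
    """Pull ops, errors and available pages out of the first tmem-list line."""
    m = re.search(r"ops=(\d+) \(errors=(\d+)\) -- tmem pages avail=(\d+)", line)
    if m is None:
        raise ValueError("unexpected summary line: %r" % line)
    ops, errors, avail = (int(g) for g in m.groups())
    return {"ops": ops, "errors": errors, "avail": avail}

def error_ratio(older, newer):
    """Fraction of the operations between two samples that ended in an error.

    Assumption: errors are counted as part of ops (not confirmed).
    """
    d_ops = newer["ops"] - older["ops"]
    d_err = newer["errors"] - older["errors"]
    return d_err / d_ops if d_ops > 0 else 0.0

sample = "        total tmem ops=9057925 (errors=539921) -- tmem pages avail=90958"
print(parse_summary(sample))
```

Graphing error_ratio over time, rather than the raw counters, should make a
sustained rise easier to spot.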

Is pages avail the "free RAM" in the hypervisor that could be used for tmem
(i.e., as long as I don't start new guests or enlarge running ones)? If so, I
would expect that number to drop after some runtime and then stay low unless I
kill a running domain, because with cleancache as the only tmem consumer,
pages should rarely be freed.

The other two lines of output are:
        datastructs: objs=5775 (max=15260) pgps=85275 (max=186716) nodes=5901 (max=12744) pages=35501 (max=100026) pcds=84621 (max=184873) deduped: avg=1.38% (curr=0.77%) compression savings=30.09%
        misc: failed_copies=0 alloc_failed=15416 alloc_page_failed=0 low_mem=0 evicted=0/0 relinq=0/0, max_evicts_per_relinq=0, flush_pools=0, eph_count=85275, eph_max=186716

I guess those numbers will tell me things like tmem usage (used / free /
available?) and how much I'm benefiting from compression and dedup (the %
values in the datastructs line?).
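
For the datastructs and misc lines, a minimal sketch of the parsing I have in
mind is below. It only splits the name=value pairs and does not interpret them.
The function name and the "<name>_max" naming are my own choices, and values
are kept as strings because some of them look like "0/0".

```python
import re

# name=value pairs; the value may be an integer, a decimal or an "a/b" pair.
PAIR = re.compile(r"(\w+)=([\d./]+)")

def parse_counters(text):
    """Return {name: raw string value} for every name=value pair in text.

    A bare "max" is stored as "<previous name>_max", since it appears
    several times on the same line. Trailing "%" and "," are dropped.
    """
    counters = {}
    previous = None
    for name, value in PAIR.findall(text):
        if name == "max" and previous is not None:
            name = previous + "_max"
        else:
            previous = name
        counters[name] = value
    return counters

DATASTRUCTS = (
    "datastructs: objs=5775 (max=15260) pgps=85275 (max=186716) "
    "nodes=5901 (max=12744) pages=35501 (max=100026) "
    "pcds=84621 (max=184873) deduped: avg=1.38% (curr=0.77%) "
    "compression savings=30.09%"
)
MISC = (
    "misc: failed_copies=0 alloc_failed=15416 alloc_page_failed=0 low_mem=0 "
    "evicted=0/0 relinq=0/0, max_evicts_per_relinq=0, flush_pools=0, "
    "eph_count=85275, eph_max=186716"
)

for key, value in sorted(parse_counters(DATASTRUCTS + " " + MISC).items()):
    print(key, value)
```

The dedup and compression percentages end up under the keys avg, curr and
savings, which a real script would probably want to rename.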

Can someone confirm or correct my assumptions and fill in the gaps?
Are those "pages" fixed-size? If so, what is the page size on x86_64?
Is the "pages" number the current tmem usage of actual RAM? Can I assume that
it holds pages * (1+deduped.curr) * (1+compression_savings) worth of data?

Luca Lesinigo
Xen-users mailing list


