
Re: [Xen-users] Transcendent Memory ("tmem") -capable kernel now publicly released

> From: Phillip Susi [mailto:psusi@xxxxxxxxxx]
> Subject: Re: [Xen-users] Transcendent Memory ("tmem") -capable kernel now 
> publicly released

Hi Phillip!  Thanks for the great questions!

> On 04/16/2012 04:11 PM, Dan Magenheimer wrote:
> > That weird ASCII text is intended to allow the hypervisor to
> > communicate with userland in a fully-backwards-and-forwards-compatible
> > way.  It must be parsed by a userland program, i.e.:
> >
> > # xm tmem-list --long --all | /usr/sbin/xen-tmem-list-parse
> >
> > I usually surround the above with 'watch -d "<above commands>"'
> > in a big window to watch things change.  Dunno if ubuntu
> > has the watch command though.
> >
> > (If the parsing program isn't at that location, it may not be
> > built or maybe is installed elsewhere on a Debian-ish system.
> > See the source for it in xen source in xen/tools/misc.)
> I'm curious as to why the parser is a separate utility instead of just being 
> the normal output of the
> management tool, with an optional --machine-parsable switch to fall back to 
> the current behavior.

First, as you've seen, most of the tmem-list info is really more like
debugfs output from the kernel: very valuable to developers, and
maybe interesting to performance analysts or students studying
tmem, but not useful to normal users.  The info was intended
to grow or shrink as more or less of the data was determined
to be useful to its intended audience.

Second, tmem predates "xl" and all the great tools-layering work
done by Ian and Ian and Stefano and others.  Maybe some of the
key data could easily be added to and reported via xl nowadays
but, given the intended audience, I'm not sure it's worth reinventing
the wheel... though patches welcome!

> > The parsed output has a huge amount of detail so is still mostly
> > undecipherable to mortals.  The key data can also be watched in
> > xentop (aka "xm top")... press "t" for tmem data (which will
> > show up only if any of the key values are non-zero).  Also
> > in xentop, you can watch selfballooning working.  (Note
> > "xm list" has had a bug forever so doesn't show "current memory"
> > just the memory the domain was launched with.)
> Now THAT sounds like what I have been looking for.  I'll have to check it out.

Great! ;-)

> > For ephemeral pools (i.e. via cleancache), "gets" are destructive,
> > so the page of data is moved (removed from tmem) on a successful get.
> > So no duplication.  For persistent pools (i.e. via frontswap),
> > "gets" are non-destructive, so the page of data is copied from tmem
> > on a successful get, which mimics a swap device, thus the name
> > "persistent"; a "flush" is required to destroy persistent pages of data.
> Hrm... I thought one of the advantages of tmem was that when multiple domains 
> happen to be caching the
> same data, the page only needs to be kept in ram once.  If get moves the page 
> out of tmem and into the
> domain, that would seem to preclude that.  Also I understand that tmem can 
> compress the data, in which
> case it doesn't seem possible to do a move as the data must first be 
> decompressed.

I suspect we're quibbling about the precise meaning of "copy" and "move".
The guest only hands over a pageframe.  The hypervisor is responsible
for the data movement and/or removing the hypervisor's copy of
the data.  There are no fancy virtual-address-translation tricks.
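
To make the get semantics concrete, here is a toy Python model of the
two pool types.  It is only a sketch of the behavior described above,
not the Xen implementation: the class name TmemPool, the string
handles, and the 4096-byte PAGE_SIZE are all assumptions made for
illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


class TmemPool:
    """Toy model of one tmem pool (hypothetical; not Xen code)."""

    def __init__(self, persistent):
        self.persistent = persistent
        self.pages = {}  # handle -> page contents held by the "hypervisor"

    def put(self, handle, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("tmem stores whole pages only")
        self.pages[handle] = bytes(data)

    def get(self, handle, pageframe):
        """Fill the caller's pageframe; return True on a hit."""
        data = self.pages.get(handle)
        if data is None:
            return False
        pageframe[:] = data  # the hypervisor does the data movement
        if not self.persistent:
            # Ephemeral (cleancache-style) pool: a get is destructive.
            del self.pages[handle]
        return True

    def flush(self, handle):
        """Explicitly destroy a page; return True if it existed."""
        return self.pages.pop(handle, None) is not None


if __name__ == "__main__":
    frame = bytearray(PAGE_SIZE)
    ephemeral = TmemPool(persistent=False)
    ephemeral.put("inode7:0", b"A" * PAGE_SIZE)
    print(ephemeral.get("inode7:0", frame), ephemeral.get("inode7:0", frame))

    persistent = TmemPool(persistent=True)
    persistent.put("swap:42", b"B" * PAGE_SIZE)
    print(persistent.get("swap:42", frame), persistent.get("swap:42", frame))
```

In this model the second get on the ephemeral pool misses, while the
persistent pool keeps answering until flush is called, which is the
swap-device-like behavior frontswap relies on.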

By default, tmem only "shares" data across domains if all of those
domains are mounting the same clustered filesystem, which is
implemented today only for ocfs2.  With tmem_dedup (as an additional Xen
boot option), pages "put" into tmem are deduplicated so there is
only one copy kept, reference counted, for identical pages
whether from the same domain or different domains.  With
tmem_compress as an additional Xen boot option, the hypervisor
de/compresses the data as part of the get/put operations.
And, yes, tmem_dedup and tmem_compress can be used together
to minimize total memory use.

Both tmem_compress and tmem_dedup cost a lot more cycles but are usually
well worth it.  They aren't turned on by default only because they've
gotten much less testing.
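
As a rough illustration of how deduplication and compression can
combine, the Python sketch below keeps one reference-counted copy per
distinct page and optionally compresses it.  Everything here is an
assumption for illustration: the DedupStore class, the use of SHA-256
to detect identical pages, and zlib as the compressor are stand-ins,
and the hypervisor's actual matching and compression methods may well
differ.

```python
import hashlib
import zlib


class DedupStore:
    """Toy reference-counted page store (hypothetical; not Xen code)."""

    def __init__(self, compress=False):
        self.compress = compress
        self.blobs = {}  # digest -> [stored bytes, reference count]
        self.index = {}  # (domain, handle) -> digest

    def put(self, domain, handle, data):
        self.flush(domain, handle)  # drop any older page under this key
        digest = hashlib.sha256(data).digest()
        entry = self.blobs.get(digest)
        if entry is None:
            stored = zlib.compress(data) if self.compress else bytes(data)
            self.blobs[digest] = [stored, 1]
        else:
            entry[1] += 1  # identical page already held: just add a reference
        self.index[(domain, handle)] = digest

    def get(self, domain, handle):
        digest = self.index.get((domain, handle))
        if digest is None:
            return None
        stored = self.blobs[digest][0]
        return zlib.decompress(stored) if self.compress else stored

    def flush(self, domain, handle):
        digest = self.index.pop((domain, handle), None)
        if digest is None:
            return False
        entry = self.blobs[digest]
        entry[1] -= 1
        if entry[1] == 0:
            del self.blobs[digest]  # last reference gone: free the copy
        return True


if __name__ == "__main__":
    store = DedupStore(compress=True)
    page = b"\0" * 4096
    store.put("dom1", "p0", page)
    store.put("dom2", "p0", page)  # same contents from a different domain
    print(len(store.blobs), "stored copy for", len(store.index), "puts")
```

The extra hashing and de/compression work on every put and get is the
cycle cost mentioned above; the payoff is that identical pages, from
the same domain or different ones, occupy memory once.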

> > P.S. Although Xen tmem will work without frontswap, I don't recommend it
> > (esp selfballooning) without frontswap.  Selfballooning can be aggressive
> > and may sometimes cause swapping, which is absorbed by frontswap
> > instead of going to a swap disk.  That's why the Oracle UEK2 kernel
> > includes frontswap even though it isn't upstream yet, see
> > http://lwn.net/Articles/465317/ (we'll be trying upstream again soon).
> I'll have to try and find the patch series somewhere and merge it into my 
> local Ubuntu git tree.

Try http://oss.oracle.com/git/djm/tmem.git/?p=djm/tmem.git;a=summary 
with the frontswap-v14 or frontswap-v11 branch, depending on
your kernel version.  All the differences between v11 and v14
are either doc-only or irrelevant to Xen.  (v13 is also in

