[Xen-devel] Re: txenmon: Cluster monitoring/management
On Tue, Feb 10, 2004 at 08:06:25AM +0000, Ian Pratt wrote:
> > On Sun, Feb 08, 2004 at 09:19:56AM +0000, Ian Pratt wrote:
> > > Of course, this will all be much neater in rev 3 of the domain
> > > control tools that will use a db backend to maintain state about
> > > currently running domains across a cluster...
> >
> > Ack!  We might be doing duplicate work.  How far have you gotten with
> > this?
>
> We haven't even started, but have been thinking about the design,
> and what the schema for the database should be, etc.

When you say "database", do you mean "an independent sqlite running in
each dom0", or do you mean "a central SQL server running somewhere on a
dedicated machine"?  (See further down for why I ask.)

As far as schema goes, the things I've needed to track so far are these
"control" items, referenced in the guest.ctl() calls in txenmon:

    domid gw host ip kernel mem run swap vbds

...and I'm considering adding a 'reboot' boolean.  I also track several
runtime state items as attributes of the Guest class -- the whole object
is saved as a pickle, so see __init__ for a list of them.

The NFS export directory tree looks something like this:

    /export/xen/fs/stevegt
    /export/xen/fs/stevegt/tcx
    /export/xen/fs/stevegt/tcx/root
    /export/xen/fs/stevegt/tcx/ctl
    /export/xen/fs/stevegt/tcx/log
    /export/xen/fs/stevegt/xentest1
    /export/xen/fs/stevegt/xentest1/root
    /export/xen/fs/stevegt/xentest1/log
    /export/xen/fs/stevegt/xentest1/ctl
    /export/xen/fs/stevegt/xentest2
    /export/xen/fs/stevegt/xentest2/root
    /export/xen/fs/stevegt/xentest2/log
    /export/xen/fs/stevegt/xentest2/ctl
    /export/xen/fs/stevegt/crashme1
    /export/xen/fs/stevegt/crashme1/root
    /export/xen/fs/stevegt/crashme1/ctl
    /export/xen/fs/stevegt/crashme1/log

...where 'stevegt' is a user who owns one or more virtual domains, and
'xentest1' is the hostname of a virtual domain.  The control items I
mentioned above go in individual files (qmail style) under ./ctl, and
the python pickle for each virtual domain is saved as ./log/pickle.
The root partition for each domain is under ./root.

Here's what the contents of ./ctl look like for a given guest:

    nfs1:/export/xen# ls -l /export/xen/fs/stevegt/tcx/ctl
    total 32
    -rw-r--r--  1 root root  3 Feb 8 20:57 domid
    -rw-r--r--  1 root root 12 Feb 5 22:51 gw
    -rw-r--r--  1 root root  6 Feb 9 21:56 host
    -rw-r--r--  1 root root 13 Feb 8 20:57 ip
    -rw-r--r--  1 root root 30 Feb 5 22:52 kernel
    -rw-r--r--  1 root root  4 Feb 9 17:47 mem
    -rw-r--r--  1 root root  2 Feb 9 21:56 run
    -rw-r--r--  1 root root 14 Feb 5 22:53 swap
    -rw-r--r--  1 root root  0 Feb 5 22:52 vbds

Because these are individual files, it's easy to say, for instance,
'echo 0 > run' from a shell prompt to cause a domain to shut down, or
'echo node43 > host' to cause it to move to a different node.  (There's
a rough Python sketch of reading these files below.)

I considered using the sqlite db for these things, but didn't, (1)
because flat files were faster to implement and easier to access from
the command line, and (2) because I didn't want to cause future schema
conflicts with whatever you were going to do.

* * *

Having said all this, I'm less worried about schema and more worried
about single points of failure.  Right now txenmon runs in domain 0 on
each node, and the data store is distributed as above.  This gives me a
dependence on the central NFS server staying up, but an NFS server is a
relatively simple thing: it can be HA'd, backed up easily, and will
tend to have uptimes in the hundreds of days anyway as long as you
leave it alone.
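Here's that sketch of reading the per-file control items, by the way.
This is hypothetical code, not verbatim txenmon -- the function name
and defaults are mine -- but the paths match the tree above:

    import os

    CTL_ITEMS = ['domid', 'gw', 'host', 'ip', 'kernel',
                 'mem', 'run', 'swap', 'vbds']

    def read_ctl(base, user, guest):
        """Read one guest's control files into a dict.

        base is the NFS export root, e.g. /export/xen/fs; each item
        is one file under ./ctl, qmail style, and a missing file
        just comes back as None.
        """
        ctldir = os.path.join(base, user, guest, 'ctl')
        ctl = {}
        for item in CTL_ITEMS:
            try:
                ctl[item] = open(os.path.join(ctldir, item)).read().strip()
            except IOError:
                ctl[item] = None
        return ctl

    # e.g. if read_ctl('/export/xen/fs', 'stevegt', 'tcx')['run'] == '0',
    # the txenmon instance hosting 'tcx' would shut the domain down.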
If these data items were to move onto a "real" database server instead,
say a central mysql or postgresql server, then I'd worry more; database
servers aren't as easy to keep available for hundreds of days without
interruption.  (See http://Infrastructures.Org for more of my
perspective on this.)

I'm moving in the direction of keeping some sort of distributed data
store, like those flat files and python pickles (or use the sqlite on
each dom0?), which can be cached on local disk in each dom0, and then
using something like UDP broadcast (simple) or XMPP/jabber (less
simple) as a peer-to-peer communications mechanism to keep the caches
synced.  (There's a rough sketch of the UDP broadcast idea in the P.S.
below.)  My goal here is to be able to walk into a Xen data center and
destroy any random machine without impacting any user for more than a
few minutes.  (See http://www.infrastructures.org/bootstrap/recovery.shtml.)

To this end, I'm curious what people's thoughts are on backups and
real-time replication of virtual disks -- I'm only using them for swap
right now, because of these issues.

* * *

> Cool!  It's always a nice surprise to find out what work is
> going on by people on the list.

As I said last night, you have me full time right now.  ;-)  My wife
and I are launching a commercial service based on Xen (we were
evaluating UML).  I have until the end of March.  If enough revenue is
flowing by then, you get to keep me.  If not, "the boss" will tell me
to put myself back on the consulting market.  Nothing like a little
pressure.  ;-)

> You might want to try repulling 1.2 and trying the newer versions
> of the tools which are a bit more user friendly.

My most recent pull was a week ago; this got me xc_dom_control and
xc_vd_tool.  I'll likely do another pull this week.  We already have
one production customer (woo hoo!), so I'm trying to limit
upgrades/reboots for them.

> Great, we'd love to see stuff like this in the tree.

Would it help if I exposed a bk repository you could pull from, or how
do you want to do this?

Steve

--
Stephen G. Traugott  (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@xxxxxxxxxxxxx
http://www.stevegt.com -- http://Infrastructures.Org
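P.S.  Here's the kind of thing I have in mind for the UDP broadcast
cache sync.  Strictly a back-of-the-envelope sketch -- the port number
and message format are made up, and there's no authentication or loss
handling yet:

    import socket

    PORT = 9037  # made-up port for txenmon peer traffic

    def announce(guest, item, value):
        """Broadcast a changed control item so peer dom0s see it."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto('%s %s %s' % (guest, item, value), ('<broadcast>', PORT))
        s.close()

    def listen(cache):
        """Run in each dom0: fold peer announcements into the local
        cache, which mirrors the flat files on the node's own disk."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(('', PORT))
        while 1:
            msg, peer = s.recvfrom(1024)
            guest, item, value = msg.split(' ', 2)
            cache.setdefault(guest, {})[item] = value

A node that misses a broadcast (or a freshly rebuilt node) would just
fall back to re-reading the NFS copy, so the broadcasts only have to be
an optimization, not the source of truth.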