
Re: [Xen-API] Comments on Xen API document

On Thu, Aug 17, 2006 at 12:13:24PM +0100, Ewan Mellor wrote:

> > ErrorDescription isn't specified well enough to program against.
> What do you mean by this?  We still haven't specified the actual error codes,
> their meaning, and the parameters that the code implies, but other than that I
> thought it was well enough specified.  Could you be more specific?

I think I was actually just repeating myself, sorry about that.

> > What happens to the session if a connection is terminated or idles out?
> This isn't in the document, but I think the behaviour we'll go for is that
> sessions can time out, and any call could potentially return a "NoSuchSession"
> error, and so the client must be prepared to authenticate again.

Sounds reasonable.
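
To make sure I understand what this asks of a client, here is a minimal sketch of "be prepared to authenticate again". Everything in it is an assumption for illustration: the NoSuchSession exception class, the login callable and the wrapper itself are hypothetical names, not anything from the API document.

```python
class NoSuchSession(Exception):
    """Hypothetical error: the server no longer recognises the session."""


class ReauthenticatingClient:
    """Retries a call after logging in again when the session has expired."""

    def __init__(self, login, max_retries=1):
        self._login = login            # callable returning a fresh session id
        self._max_retries = max_retries
        self._session = login()

    def call(self, func, *args):
        """Invoke func(session, *args), logging in again if the session is gone."""
        for attempt in range(self._max_retries + 1):
            try:
                return func(self._session, *args)
            except NoSuchSession:
                if attempt == self._max_retries:
                    raise              # give up and let the caller decide
                self._session = self._login()
```

The retry count is bounded so that a server which rejects every new session doesn't leave the client looping forever.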

> Note that session objects are independent of the underlying transport, so
> there's no reason why a TCP timeout should necessarily cause a session
> timeout.

Though there's an obvious "accidental DoS" opportunity if session objects
persist forever.

> > The specification implies that a failed task must have its Progress set to
> > "100". This seems odd.
> Odd maybe, but not harmful, I think.  The intention is that you continue to
> poll until you see 100, and then parse the result for success or failure codes
> etc -- it's not intended that 100 indicates success.

I suppose that's OK, but the spec should call this out explicitly I think.
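
Something like the following is how I read the intended client loop. It is only a sketch: the dictionary keys ("progress", "status", "result", "error"), the TaskFailed exception and the get_task callable are all made up here, since the document doesn't yet define the result encoding.

```python
import time


class TaskFailed(Exception):
    """Hypothetical error carrying whatever failure detail the task reported."""


def wait_for_task(get_task, poll_interval=1.0, sleep=time.sleep):
    """Poll until progress reaches 100, then decide success or failure.

    get_task is an assumed callable returning a dict that describes the task.
    """
    while True:
        task = get_task()
        if task["progress"] >= 100:
            break                      # finished, but not necessarily successful
        sleep(poll_interval)
    # Only now is the outcome meaningful: 100 means "done", not "succeeded".
    if task["status"] == "failure":
        raise TaskFailed(task.get("error"))
    return task.get("result")
```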

> > More widely, has there been consideration of indicating /phases/ of
> > operation? Often this is more useful than a generic slider value.
> No, no-one's considered that, as far as I know.  What would you expect here?
> Would we end up with a whole new set of codes to indicate task progress, on
> top of the error codes?

It might be useful to label the stages of a live migration for example. "Hmm,
it hung during the suspend call in the domU" rather than "it stopped at x

> > I don't understand the expanded XML for the do_login_with_password() - the
> > parameters are not name/value pairs. If this is a general design, how are
> > optional parameters to be handled?
> Optional parameters aren't handled at all, deliberately, as they add
> complexity that I don't think we need, and would make life difficult for
> bindings to languages that don't have them (like C).
> Any call that takes a struct could in the future have optional entries, since
> those are defined using name-value pairs, but again, we've not allowed that in
> the specification yet.

This might become a problem in the future for "unstable" data items.

> > Probably an XML RPC thing, but there's no discussion of transactions in the
> > document. Can I change a batch of things and assure it's done atomically?
> No -- we want the client to make rollback / error handling decisions -- Xend
> simply isn't the right place for that kind of policy.  Atomicity is impossible
> to guarantee for a lot of the parameters in any case, because Xen doesn't
> offer a transactional interface either.

Hmm. This leaves clients with an unfortunate problem. Consider a client that
wants to upgrade a domU's kernel with an external method. It can't change both
kernel and initrd without a window of opportunity for another client to attempt
booting the domU with an invalid configuration. Perhaps some thought should be
given to making xend itself transactional somehow? It could be as simple as
letting clients "lock" domUs...
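
To be concrete about the "lock" idea, here is a rough sketch of the bookkeeping I have in mind, as a per-domU advisory lock keyed on the owning session. None of these names exist in xend or the API document; DomainLocks, DomainLocked and the method names are purely hypothetical.

```python
class DomainLocked(Exception):
    """Hypothetical error: another session holds the lock on this domU."""


class DomainLocks:
    """Advisory per-domU locks, each owned by a single session."""

    def __init__(self):
        self._owners = {}              # domU identifier -> owning session

    def acquire(self, domu, session):
        """Take the lock, or raise if a different session already holds it."""
        owner = self._owners.setdefault(domu, session)
        if owner != session:
            raise DomainLocked(domu)

    def release(self, domu, session):
        """Drop the lock; a no-op unless this session is the owner."""
        if self._owners.get(domu) == session:
            del self._owners[domu]

    def check(self, domu, session):
        """Run before any operation on domu, such as a boot request."""
        owner = self._owners.get(domu)
        if owner is not None and owner != session:
            raise DomainLocked(domu)
```

A client would acquire the lock, change kernel and initrd, then release it, while a boot attempt from any other session fails the check in between. Locks would presumably have to expire along with their session, or a vanished client could leave a domU locked indefinitely.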

> > Do zombie domains need to be explicitly called out in the lifecycle?
> Hmmm, maybe.  I'll have a think.  Zombies are more related to the _domain_
> rather than the _VM_ -- you could restart the VM even with a zombie lying
> around, there are just some additional resources being consumed -- so I don't
> think that they really belong in that lifecycle.

Fair point.

> > enum vm_power_state - "power" is a very odd notion for a VM. Why not just
> > "vm_state" ?
> I'm not wedded to the name -- I just wanted to avoid having a completely
> generic name like "state".  Perhaps another term would be better?

I understand the concern, but in this case I think 'state' is pretty
descriptive when attached to a domU. Or, perhaps, "runstate" ?

> > enum on_crash_behaviour - it's likely useful to have a host-wide setting for
> > whether domains should core dump. How would that interact?
> We could core dump if either setting said so, I guess, but there's no per-host
> setting for this in the API at the moment.  Isn't this something that we could
> punt to the higher-level tools?

Yes, probably.

> > VCPUs/params needs a detailed description of what it can/does contain
> It's meant to be a free-form string, interpreted by whatever scheduler is
> running.

So it's explicit that an XML RPC client can't configure the scheduler
parameters, right?

> > kernel/kernel et al - depending on bootloader, is this the copied kernel
> > path on dom0 filesystem, or the path inside the domU filesystem? e.g.
> > pygrub, domuloader
> Yes, that's right.

I think the changing meaning of the parameters depending upon the boot method
should be described explicitly in the document.

> > tools_version - needs defining explicitly?
> We'll certainly need to define some permanent entries, but we wanted this to
> be open for expansion later too.  Obviously we'll guarantee that there will be
> a "Xend" entry, but we wanted for other version numbers to be able to slot in
> here too, such as the version of PV drivers inside guests.


> > There's no mapping from VM vcpus to the host_cpus they're pinned on.
> Pinning is something that would get encoded into VCPUs/params, when the
> scheduler supports such functionality.

More generally, though, there's no way to examine load balancing across the
physical CPUs that I can see? This seems like pretty critical information.

> > suspend/resume - I'm a little confused about where the domain save file
> > actually goes, and how 'xm' will specify it
> It doesn't "go" anywhere defined -- that's an internal issue for Xend to deal
> with.  We're moving away from the model where you can save and restore to
> files and move them around willy-nilly -- people have a tendency to corrupt
> their filesystems by starting the same save file twice, for example.
> If you actually want the save file, you need some kind of "export" call, which
> we haven't yet defined.


> > 'disable' and 'enable' seem perhaps a bit too generic name-wise.
> What would you prefer?

Good question. I'll think...

> > I'm not clear on what an SR actually is and its relationship to its
> > contained VDIs. Could the document clarify this somewhat?
> Sure, we could clarify this.  An SR is a storage segment, like a partition, or
> an LVM volume, or a LUN on a SAN.  I'll get our storage guys to specify this a
> little better than I can manage!  With blocktap or loopback files, you can
> have multiple VDIs within an SR, one backing file per VDI, hence the
> containment relationship.

I need to think about this with respect to ZFS somewhat...

> Thanks for your feedback, it's been very useful,

Thanks for the replies!

