
Re: [Xen-users] linux stubdom



Hello,

Am 30.01.2013 um 10:36 Uhr schrieb Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
> On Tue, 2013-01-29 at 19:32 +0000, Markus Hochholdinger wrote:
> > Am 29.01.2013 um 17:36 Uhr schrieb Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
> > > On Tue, 2013-01-29 at 15:46 +0000, Markus Hochholdinger wrote:
[..]
> The change of $domid doesn't matter since a migration involves
> reconnecting the devices anyway, which means they will reconnect to the
> "new" driver domain.

OK, I understand: it is not the numeric id that has to stay the same, but the 
name of the driver domU.


> The normal way would be to have a driver domain per host but there's no
> reason you couldn't make it such that the driver domain was migrated too
> (you'd have to do some hacking to make this work).

If my driver domU is on the same hardware host as the normal domU, I don't have 
to worry about split-brain situations for the storage. So my idea is to have 
one driver domU for each normal domU.
Live migration is possibly difficult, though. As I understand it, I somehow 
have to create the driver domU on the destination first, so that the block 
device my normal domU connects to is available there. I'll look into this if I 
find no easier solution.


> AIUI you currently have a RAID1 device in the guest, presumably
> constructed from 2 xvd* devices? What are those two xvda devices
> backed by? I presume it must be some sort of network storage (NFS,
> iSCSI, NBD, DRBD) or else you just couldn't migrate.

Well, perhaps some device paths say more than my bad English:

node1:/dev/xbd/mydomu.node1 -> /dev/vg0/mydomu (also exported over iscsi)
node1:/dev/xbd/mydomu.node2 -> /dev/sdx (imported over iscsi)

node2:/dev/xbd/mydomu.node1 -> /dev/sdy (imported over iscsi)
node2:/dev/xbd/mydomu.node2 -> /dev/vg0/mydomu (also exported over iscsi)

node3:/dev/xbd/mydomu.node1 -> /dev/sdy (imported over iscsi)
node3:/dev/xbd/mydomu.node2 -> /dev/sdz (imported over iscsi)

In /dev/xbd/* there are only symlinks to the corresponding devices, so I have 
consistent paths on all nodes.

On all hardware nodes (node1, node2 and node3) I can access the logical volume 
/dev/vg0/mydomu on node1 via the path /dev/xbd/mydomu.node1, which I use in my 
domU configurations. If I'm not on node1, the block device is transported over 
iSCSI. Because /dev/xbd/mydomu.node1 points to the same block device on all 
nodes, I'm able to live migrate the domUs independently of where the logical 
volume physically lives.
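
For reference, on node2 the consistent paths are set up roughly like this (a 
simplified sketch; the iSCSI IQN and portal are placeholders, the imported 
device name depends on the session, and in practice a udev rule is more robust 
than a plain symlink):

    #!/bin/sh
    # import node1's logical volume over iSCSI (open-iscsi);
    # IQN and portal below are placeholders
    iscsiadm -m node -T iqn.2013-01.example:node1.mydomu -p 192.168.0.1 --login
    # whatever device the session produced (here /dev/sdy) gets the stable name
    mkdir -p /dev/xbd
    ln -sf /dev/sdy /dev/xbd/mydomu.node1
    # the locally held leg is just a symlink to the LV itself
    ln -sf /dev/vg0/mydomu /dev/xbd/mydomu.node2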

So my xvda and xvdb inside the domU are backed by /dev/xbd/mydomu.node1 and 
/dev/xbd/mydomu.node2, and if one or both of these logical volumes are not 
local, iSCSI (in dom0) is used for transport.
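
In the domU configuration this then simply looks like two phy-backed disks, 
roughly (a sketch using the standard xm/xl phy syntax and the paths above):

    disk = [ 'phy:/dev/xbd/mydomu.node1,xvda,w',
             'phy:/dev/xbd/mydomu.node2,xvdb,w' ]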


> Are you intending to instead run the RAID1 device in a "driver domain",
> constructed from 2 xvd* devices exported from dom0 and exporting that as
> a single xvd* device to the guest?

Yes, more or less. But the devices exported from dom0 don't have to be local 
logical volumes of the dom0; they can also be remote iSCSI block devices.

For me it is very important to be able to live migrate domUs while also keeping 
the storage redundant across at least two nodes.


> Or are you intending to surface the
> network storage directly into the driver domain, construct the RAID
> device from those and export that as an xvd* to the guest?

No.


> > Do you know how I can live migrate a domU which depends on a driver domU?
> > How can I migrate the driver domU?
> > For my understanding the block device has to be there on the destination
> > dom0 before live migration begins but is also used on the source dom0
> > from the migrating, but still running, domU.
> Not quite, when you migrate there is a pause period while the final copy
> over occurs and at this point you can safely remove the device from the
> source host and make it available on the target host. The toolstack will

Isn't the domU on the destination created with all its virtual devices before 
the migration starts? What if the blkback is not ready on the destination 
host? Am I missing something?


> ensure that the block device is only ever active on one end or the other
> and never on both -- otherwise you would get potential corruption.

Yeah, this is the problem! If the active RAID1 logic lives within the domU 
(i.e. Linux software RAID1) and migrates with it, I don't have to care about 
that. I'll try to accomplish the same with a "helper" domU that sits very close 
to the normal domU and is live migrated whenever the normal domU is migrated.


> While you could migrate the driver domain during the main domU's pause
> period it is much more normal to simply have a driver domain on each
> host and dynamically configure the storage as you migrate.

If I dynamically create the software RAID1, I have to add a lot of checks which 
I don't need now.
I've already thought about a software RAID1 in dom0, with the resulting md 
device used as xvda for a domU. But then I would have to assemble the md device 
on the destination host before I can deactivate it on the source host. One race 
condition is deactivating the md device on the source host while data has been 
written to only one of the two legs: on the destination host the RAID1 looks 
clean, but the two devices actually differ. The other race condition is 
assembling the RAID1 on the destination host while it is still inconsistent.
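
To make the ordering concrete, the naive dom0-level handoff would be something 
like this (purely illustrative; /dev/md0 and the leg paths are placeholders 
taken from above, and the comments mark where the two races bite):

    # destination dom0: assemble before the source has released the array
    mdadm --assemble /dev/md0 /dev/xbd/mydomu.node1 /dev/xbd/mydomu.node2
    #  -> second race: the array may still be inconsistent while assembling here

    # source dom0: stop the array once the domU is paused
    mdadm --stop /dev/md0
    #  -> first race: a write that reached only one leg just before the stop
    #     leaves the legs different even though the RAID1 looks clean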


> > Can I combine a driver domU to a normal domU like I can combine a stubdom
> > with a normal domU?
> If you want, it would be more typical to have a single driver domain
> providing block services to all domains (or one per underlying physical
> block device).

I want :-) A single driver domain would require more logic (on my side) when 
doing live migrations.


> > I thought a stubdom live migrates with its domU so you don't have to
> > worry that the driver domU live migrates while the according normal domU
> > migrates.
> A stubdom is a bit of an overloaded term. If you mean an ioemu stub
> domain (i.e. the qemu associated with an HVM guest) then a new one of
> those is started on the target host each time you migrate.

OK, this isn't what I want. For me a (re)start on the destination host is no 
better than doing the software RAID1 in dom0.


> If you mean a xenstored stubdom then those are per host and are not
> migrated.

Hm, if they are not migrated, they don't behave the way I expect.


> And if you mean a driver domain then as I say those are usually per host
> and the domain will be connected to the appropriate local driver domain
> on the target host.

OK, it seems I want a driver domain which I would migrate together with the 
corresponding normal domU.


[..]
> > OK, I see. But still, would it be possible to run linux in a stub-domain?
> > I've read e.g.
> > http://blog.xen.org/index.php/2012/12/12/linux-stub-domain/ which
> > describes this, but I'm unsure if this will be (or is already) supported
> > by current xen?
> This work is about a Linux ioemu stub domain. That is a stubdomain with
> the sole purpose of running the qemu emulation process for an HVM
> domain. I think the intention is for this to land in Xen 4.3 but it does
> not have anything to do with your usecase AFAICT.

OK, I see. If the Linux ioemu stub domain is (re)started on the destination 
host during a live migration, it doesn't solve my problem.


> Everything you want to do is already possible with what is in Xen and
> Linux today, in that the mechanisms all exist. However what you are
> doing is not something which others have done and so there will
> necessarily need to be a certain amount of putting the pieces together
> on your part.

Yeah, this gives me hope :-)


[..]
> > Yes, I would provide two block devices (logical volumes) into the driver
> > domU,
> How are you doing this? Where do those logical device come from and how
> are they getting into the driver domU?

See the explanation above for details: the logical volumes come from the local 
host and/or from remote hosts over iSCSI, with consistent paths on all hosts.


> > create there a software raid1 device and make the md device available
> > with blkback. I would do this for each domU so I can live migrate domUs
> > independent. The driver domU only needs a kernel and a initrd with a
> > rootfs filled with enough to build the md device and export it with
> > blkback.
> > But how can I address this exported block device? As far as I've seen I
> > need the $domid of the driver domain in my config file of the domU, or
> > am I missing something?
> $domid can also be a domain name, and you can also change this over
> migration by providing an updated configuration file (at least with xl).

If $domid can be a name, this really would work for me. Great!
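
Then the disk line in the domU config could name the driver domain directly, 
something like this (just a sketch, assuming xl's named-key disk syntax; 
'drv-mydomu' is a hypothetical driver domain name and /dev/md0 the device it 
exports via blkback):

    # 'drv-mydomu' is a placeholder for the driver domain's name
    disk = [ 'backend=drv-mydomu, vdev=xvda, access=rw, format=raw, target=/dev/md0' ]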


> > So this would be a dependency that the driver domain is started before
> > the stubdom with qemu.
> Yes.
> > In some stubdom-startup script I saw the parameter "target" while creating
> > the stubdom. Is this a possibility to combine two domUs?
> I think "target" in this context refers to the HVM guest for which the
> ioemu-stubdom is providing services.

Yes, that's what I thought as well. So my idea was to create a driver domain 
with "target" pointing at the corresponding domU, so that I only "see" one 
domU. I thought it could ease the management.


> > As far as I understand with a driver domU I need the $domid of the driver
> > domU in my config, so this is the connection between the two domUs.
> > But with stub-domains I haven't understood how data flows between stubdom
> > and domU and because I've seen a lot of nice little pictures describing
> > that I/O flows between domU over stubdom to dom0 and backwards I thought
> > stub-domains would be the way to go.
> Only if you are using emulated I/O. I assumed you were using PV I/O, is
> that not the case?

OK, my bad. I have two use cases:
1. Provide a redundant block device for a PV domU where I'm not able to manage
   the software RAID1 inside the domU.
2. Provide a redundant block device for an HVM domU running an operating system
   which has no (good) software RAID1 implementation.
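
In both cases the driver domU itself would only have to assemble the md device 
at boot, roughly like this (a sketch; the two legs arrive in the driver domU as 
xvda/xvdb, backed in dom0 by the /dev/xbd/ paths from above):

    #!/bin/sh
    # inside the driver domU at boot: build the RAID1 from the two legs
    mdadm --assemble /dev/md0 /dev/xvda /dev/xvdb
    # /dev/md0 is then the device that blkback exports to the normal domU
    # (the "target" in the disk line sketched above)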

Many thanks so far. I'll try the driver domain approach and test whether it 
solves my problem without losing too much performance.


-- 
greetings

eMHa



 

