
Re: [Xen-users] Remus crashes only with Windows Server 2003 - tap2 issue



It's patch -p1, not -p0.
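
The diff headers in timeouts.patch carry a/ and b/ path prefixes (--- a/tools/..., +++ b/tools/...), so one leading path component has to be stripped when patch is run from the top of the source tree. A minimal sketch of that behaviour in a scratch directory follows; the file tools/demo.c and the patch demo.patch are hypothetical stand-ins, not the real Xen sources:

```shell
#!/bin/sh
# Demo of "patch -p1" on a diff with a/ and b/ prefixes.
# Everything here lives in a throwaway directory; tools/demo.c and
# demo.patch are hypothetical stand-ins for the Xen tree and timeouts.patch.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p tools
printf '#define HEARTBEAT_MS 1000\n' > tools/demo.c

# A one-hunk unified diff whose paths start with a/ and b/.
cat > demo.patch <<'EOF'
--- a/tools/demo.c
+++ b/tools/demo.c
@@ -1 +1 @@
-#define HEARTBEAT_MS 1000
+#define HEARTBEAT_MS 5000
EOF

# -p1 strips the leading "a/" or "b/", so the path resolves to tools/demo.c.
patch -p1 --dry-run < demo.patch   # check only, nothing is modified
patch -p1 < demo.patch             # apply for real

result=$(cat tools/demo.c)
echo "$result"
```

Against the real tree, the equivalent would be to run patch -p1 --dry-run < timeouts.patch from ~/xen-4.1.1 before applying. If hunks still fail with -p1, it may be that the saved copy of the patch was altered in transit (wrapped header lines or changed whitespace); that is only a guess, not something the output quoted below confirms.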

On Mon, Feb 27, 2012 at 12:38 PM, Antonio Colin <dftonywhite@xxxxxxxxxxx> wrote:
Hello Shriram,

Thanks so much for your patch. I have been trying to apply it, but it fails;
the errors are shown below.

Any advice on how to apply it properly?

Thanks a lot!

Tony.
----

root@neutrino:~/xen-4.1.1# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  2649     1     r-----     88.2
root@neutrino:~/xen-4.1.1# patch -p0 < timeouts.patch
(Stripping trailing CRs from patch.)
patching file b/tools/blktap2/drivers/block-remus.c
Hunk #1 FAILED at 59.
1 out of 1 hunk FAILED -- saving rejects to file b/tools/blktap2/drivers/block-remus.c.rej
(Stripping trailing CRs from patch.)
patching file b/tools/libxc/xc_domain_restore.c
Hunk #1 FAILED at 47.
1 out of 1 hunk FAILED -- saving rejects to file b/tools/libxc/xc_domain_restore.c.rej
(Stripping trailing CRs from patch.)
patching file b/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c
Hunk #1 FAILED at 504.
1 out of 1 hunk FAILED -- saving rejects to file b/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c.rej
root@neutrino:~/xen-4.1.1#

__________________________________________________________________________________________________________
> From: rshriram@xxxxxxxxx
> Date: Thu, 16 Feb 2012 10:06:56 -0800
> Subject: Re: [Xen-users] Remus crashes only with Windows Server 2003 - tap2 issue
> To: dftonywhite@xxxxxxxxxxx; dimitrios.melissovas@xxxxxxx
> CC: xen-users@xxxxxxxxxxxxxxxxxxx

>
> On Sat, Feb 11, 2012 at 5:17 PM, Antonio Colin <dftonywhite@xxxxxxxxxxx> wrote:
> >
> > PS: If you need further information or want me to test something please let me know.
> >
> > Tony.
> >
> > ________________________________
> > From: rshriram@xxxxxxxxx
> >
> > Date: Fri, 10 Feb 2012 11:52:04 -0800
> > To: dftonywhite@xxxxxxxxxxx
> > CC: xen-users@xxxxxxxxxxxxxxxxxxx
> > Subject: Re: [Xen-users] Remus crashes only with Windows Server 2003
> >
> > On Thu, Feb 9, 2012 at 10:29 AM, Antonio Colin <dftonywhite@xxxxxxxxxxx> wrote:
> >
> > Hi again Shriram,
> >
> > Thank you for your reply and explanation. You are right, I need a different port, maybe 9001 in that case, but see...
> > That was the full test but in fact I tested everything with one disk "(Unit C:)" and the same thing happens... if you think
> > that doing it that way would save more useful information in the logs I can save them again :).
> >
> > The NFS mount is in /mnt/domus only to begin testing remus. I put one VM image there... start remus with --no-net and everything is fine.
> > The directory /home/remus is just for working with remus and disk replication and is not an NFS mount.
> >
> > It is so strange that it works only for Linux!! (both are HVM)
> >
> > And yes, if that directory was shared that might corrupt my disk and I also need DRBD to replicate the image... is that possible for img files?
> > And just one last question: after failover, how can I move execution of the VM back from the backup to the primary host once it is ready?
> >
> >
> > Let me investigate the blktap2 issue first.
> > DRBD does not replicate img files. You would have to put them in a partition or lvm volume and
> > replicate that volume to the backup host. Whether you want to write the image directly to the volume or
> > create a file system in that volume and drop the image file there, is up to you.
> >
> > shriram
> >
> > Thank you so much!!!
> >
> > Tony.
> >
> >
> > ________________________________
> > From: rshriram@xxxxxxxxx
> > Date: Thu, 9 Feb 2012 00:35:15 -0800
> >
> > Subject: Re: [Xen-users] Remus crashes only with Windows Server 2003
> > To: dftonywhite@xxxxxxxxxxx
> > CC: xen-users@xxxxxxxxxxxxxxxxxxx
> >
> >
> > On Wed, Feb 8, 2012 at 1:56 AM, Antonio Colin <dftonywhite@xxxxxxxxxxx> wrote:
> >
> > Hello Shriram,
> >
> > Just coming back to Remus HA: three weeks ago I sent this thread and the situation hasn't changed. You are right,
> > remus works properly with --no-net option.
> >
> > There is actually this tapdisk related error in the syslog file in the primary host:
> > Jan 17 17:28:58 xen-backup tapdisk2[5795]: remus: could not bind server socket 11 to 192.168.2.4:9000: 98 Address already in use
> >
> >
> > Thanks for the logs.
> >  The first thing that pops out is:
> > ['tap2', ['uname', 'tap2:remus:192.168.2.4:9000|aio:/home/remus/win2k3-exchange.img'], ['dev', 'ioemu:hda'], ['mode', 'w']],
> > ['tap2', ['uname', 'tap2:remus:192.168.2.4:9000|aio:/home/remus/win2k3-exchange-d.img'], ['dev', 'ioemu:hdb'], ['mode', 'w']],
> >
> > You have two tapdisk devices, but on the same port? Each disk needs a different port, as a TCP connection is
> > established between primary and backup for each replicated disk.
> >
> >
> >
> > Also when I boot up the VM (Windows Server 2003) from NFS
> >
> >
> > From NFS? Just to make sure that we are on the same page, is the above directory /home/remus an NFS mount?
> > i.e. is that win2k3-exchange.img "shared" between the primary and backup host?
> >  If so, then remus disk replication will not work, as it's based on a shared-nothing model.
> >  In fact, it could corrupt your disk badly. If disk consistency is not an issue, then you are better off
> >  running remus without disk replication (though there is no guarantee that the domain will failover properly).
> >
> >
> >
> > and without remus or disk replication, in both the primary and the backup
> > there is in fact a vif attached to it which is bound to the bridge in both cases.
> > I have the sch_plug module installed correctly in both hosts and everything works perfect for Linux systems.
> >
> >
> > Oh great. So network buffering is out of the picture. If it works for linux, it should work for windows too.
> >
> >
> > But it just does not work
> > for Windows.
> >
> > I attach xend.log and syslog from primary and backup if you'd like to see further information in order to help me.
> >
> > Thank you a lot!!
> >
> > Tony.
> >
> > > From: rshriram@xxxxxxxxx
> > > Date: Fri, 13 Jan 2012 09:54:35 -0800
> > > To: xen-users@xxxxxxxxxxxxxxxxxxx
> > > CC: dftonywhite@xxxxxxxxxxx
> > > Subject: Re: [Xen-users] Remus crashes only with Windows Server 2003
> >
> > >
> > > On Fri, Jan 13, 2012 at 9:05 AM, <xen-users-request@xxxxxxxxxxxxxxxxxxx> wrote:
> > > > I have set up Remus on Debian Squeeze and kernel 3.1.5. Remus and disk replication work perfectly for Ubuntu systems,
> > > > but when I start Remus for Windows Server 2003 (running Microsoft Exchange Enterprise 2003) it crashes giving the
> > > > following error:
> > > >
> > >
> > > Is that Ubuntu VM a PV or HVM ?
> > > I presume that remus with --no-net works properly ?
> > >
> > > > root@neutrino:~/working-remus# xm create exchange-hvm.cfg
> > > > root@neutrino:~/working-remus# remus exchange-hvm 192.168.2.4
> > > > qemu logdirty mode: enable
> > > > xc: error: Error when writing to state file (4a) (errno 104) (104 = Connection reset by peer): Internal error
> > > > qemu logdirty mode: disable
> > > > PROF: resumed at 1326315866.106150
> > > > resuming QEMU
> > > > tc filter del dev vif3.0 parent ffff: proto ip pref 10 u32
> > > > RTNETLINK answers: Invalid argument
> > > > We have an error talking to the kernel
> > > > Exception xen.remus.util.PipeException: PipeException('tc failed: 2, No such file or directory',) in <bound method BufferedNIC.__del__ of <xen.remus.device.BufferedNIC object at 0x24b7510>> ignored
> > >
> > > This error tells me nothing. "Connection reset by peer" could result
> > > from a lot of issues.
> > > A. check the syslog in primary and backup, for errors related to tapdisk
> > > B. Check the xend.log file in backup
> > > C. If your system works with --no-net, then try to boot up the VM
> > > without remus, and make sure that
> > > there is a vif interface for the VM. And make sure that interface is
> > > on the bridge (if you have bridging enabled).
> > > Remus tries to install a network buffer (sch_plug) to the vif interface.
> > >
> > >
> > >
> > > > root@neutrino:~/working-remus#
> > > >
> > > > It seems that on the backup remus or Xen cannot assign a vif1.0 to the DomU since #ifconfig -a doesn't show a new vif there
> > > > when starting remus.
> > > >
> > > > Any help would be highly appreciated!
> > > >
> > > > Tony.
> > >
> > > _______________________________________________
> > > Xen-users mailing list
> > > Xen-users@xxxxxxxxxxxxxxxxxxx
> > > http://lists.xensource.com/xen-users
> >
> >
> >
> >
>
> Tony & Dimitrios,
> Both of you seem to have faced issues with blktap2 based
> disk replication, while running remus. If you are interested in
> getting blktap2 based replication
> running, can you guys try the patch below and let me know if it
> resolves the issue ?
>
> The patch basically increases the timeouts on both the disk and
> memory checkpoint receivers
> (block-remus.c & xc_domain_restore.c respectively)
> I have tested Remus on a Windows 7 HVM with blktap2 based replication
> (tap2:remus:<host>:<port>|aio:... format)
> Things seemed to run fine.
>
> shriram
> ---
> diff -r 34dec1562a45 tools/blktap2/drivers/block-remus.c
> --- a/tools/blktap2/drivers/block-remus.c Sat Jun 18 20:52:33 2011 -0700
> +++ b/tools/blktap2/drivers/block-remus.c Sat Jun 18 20:52:43 2011 -0700
> @@ -59,7 +59,7 @@
> #include <sys/stat.h>
>
> /* timeout for reads and writes in ms */
> -#define HEARTBEAT_MS 1000
> +#define HEARTBEAT_MS 5000
> #define RAMDISK_HASHSIZE 128
>
> /* connect retry timeout (seconds) */
> diff -r 34dec1562a45 tools/libxc/xc_domain_restore.c
> --- a/tools/libxc/xc_domain_restore.c Sat Jun 18 20:52:33 2011 -0700
> +++ b/tools/libxc/xc_domain_restore.c Sat Jun 18 20:52:43 2011 -0700
> @@ -47,7 +47,7 @@
> struct domain_info_context dinfo;
> };
>
> -#define HEARTBEAT_MS 1000
> +#define HEARTBEAT_MS 5000
>
> #define SUPERPAGE_PFN_SHIFT 9
> #define SUPERPAGE_NR_PFNS (1UL << SUPERPAGE_PFN_SHIFT)
> diff -r 34dec1562a45 tools/python/xen/lowlevel/checkpoint/libcheckpoint.c
> --- a/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c Sat Jun 18 20:52:33 2011 -0700
> +++ b/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c Sat Jun 18 20:52:43 2011 -0700
> @@ -504,7 +504,7 @@
> FD_ZERO(&rfds);
> FD_SET(fd, &rfds);
>
> - tv.tv_sec = 0;
> + tv.tv_sec = 5;
> tv.tv_usec = 500000;
>
> rc = select(fd + 1, &rfds, NULL, NULL, &tv);

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users

 

