
Re: [Xen-users] Re: Xen, LVM, DRBD, Linux-HA



I'm replying to this thread, but starting with something new, since it doesn't seem to have been covered so far.

I have been testing drbd under Xen and found some very disturbing things.

I'd like to implement this in a production system but this scares the hell out of me...

I have two Dom0 servers connected with a crossover cable between two gigabit e1000 NICs. No switch involved.

One DomU on each server with a 20G drbd device shared between them.


The drbd config contains:

  syncer {
    rate 10M;         # cap resync bandwidth at 10 MByte/s
    group 1;          # resync group number
    al-extents 257;   # number of activity-log extents (hot area)
  }

  net {
    on-disconnect reconnect;   # keep retrying the connection after a failure
  }


so the net section is otherwise running at the defaults. At first I thought the problems I was seeing were due to timeout values and the like, so I tried various parameters in the net section, but nothing made any difference (a sketch of the sort of tuning I mean is below).
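
For reference, this is roughly the kind of net-section tuning I experimented with. The values are purely illustrative, not a recommendation, and should be checked against your drbd version:

  net {
    timeout 120;          # deci-seconds before the peer is considered dead (default 60 = 6s)
    connect-int 10;       # seconds between connection attempts
    ping-int 10;          # seconds between keep-alive pings
    max-buffers 2048;     # buffers allocated on the receiving side
    max-epoch-size 2048;
    on-disconnect reconnect;
  }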

When, on the current secondary node, I execute

drbdadm invalidate all

which forces a full resync of the 20G device, and during that resync I frequently see errors such as:

drbd0: PingAck did not arrive in time.
drbd0: drbd0_asender [1572]: cstate SyncSource --> NetworkFailure
drbd0: asender terminated
drbd0: drbd_send_block() failed
drbd0: drbd0_receiver [1562]: cstate NetworkFailure --> BrokenPipe
drbd0: short read expecting header on sock: r=-512
drbd0: worker terminated
drbd0: ASSERT( mdev->ee_in_use == 0 ) in /usr/src/modules/drbd/drbd/drbd_receiver.c:1880
drbd0: drbd0_receiver [1562]: cstate BrokenPipe --> Unconnected
drbd0: Connection lost.


Watching xm top on both Dom0s, I see a huge number of dropped RX packets reported on both DomUs' vif interfaces. The RX drops are continuous throughout the drbd resync, and the counters grow extremely large.

ifconfig output inside the DomUs does not show any dropped packets.
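
In case it helps anyone reproduce this, the drop counters can also be read directly in Dom0 rather than via xm top. The vif name here is just an example; it depends on the DomU's domain id:

  # in Dom0, assuming the DomU's first interface is vif5.0
  ifconfig vif5.0
  # or look at the whole interface table
  netstat -i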

I have used iperf to test the crossover link, and throughput is fine when there is no drbd syncing going on (commands below).
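
The iperf test was nothing fancy, roughly the following (the IP is a placeholder for the peer's crossover address):

  # on one end
  iperf -s
  # on the other end, run for 60 seconds
  iperf -c 10.0.0.1 -t 60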

I have tried various things, such as setting these sysctl.conf options:

net.core.rmem_default=65536
net.core.wmem_default=65536
net.core.rmem_max=16777216
net.core.wmem_max=16777216

net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
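
(For completeness: these are loaded with "sysctl -p" after editing /etc/sysctl.conf; individual values can also be set on the fly, e.g. "sysctl -w net.core.rmem_max=16777216".)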

but so far the only thing that prevents the "PingAck did not arrive in time" errors is taking the sync rate down to 1M.
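
To throttle the resync I either lower "rate" in the syncer section and re-apply the config, or set it directly on a device. Syntax from memory, so double-check it against your drbd version:

  # after lowering "rate" in drbd.conf
  drbdadm adjust all
  # or on the fly for a single device
  drbdsetup /dev/drbd0 syncer -r 1M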



My Xen version info is:

Xen version 3.0.3-1 (Debian 3.0.3-0-4)


Please advise...

Thanks!



 

