
RE: [Xen-users] RE: Concurrent local write error in DRBD with GPL PVdrivers


  • To: "Philippe Lang" <philippe.lang@xxxxxxxxxxxxxx>, <xen-users@xxxxxxxxxxxxxxxxxxx>
  • From: "James Harper" <james.harper@xxxxxxxxxxxxxxxx>
  • Date: Wed, 12 May 2010 16:37:29 +1000
  • Cc:
  • Delivery-date: Tue, 11 May 2010 23:39:21 -0700
  • List-id: Xen user discussion <xen-users.lists.xensource.com>
  • Thread-index: Acrtv9wwoN00PHR+Sui7TUIHmxXzzAD1sV4gAAFVkEA=
  • Thread-topic: [Xen-users] RE: Concurrent local write error in DRBD with GPL PVdrivers

> Hi,
> 
> There was no answer to my post of last week. Does anyone know how to contact
> the maintainers of the GPL PV driver directly?

That would be me. Sorry I didn't see your original email.

> 
> Before we start using our new servers in production, we'd like to know whether
> the DRBD error mentioned is not a problem at all, or whether we had better wait
> until this case has been investigated.
> 
> One more detail: we use Debian Lenny for the dom0 OS.
> 

I have a similar configuration (dual-primary drbd) and have seen similar drbd 
messages but have never investigated. I have never had any filesystem 
corruption even under high usage but that doesn't mean it isn't possible. After 
reading the stuff you posted and linked to it looks like something that is 
worth investigating.

Can you tell me more about your non-PV setup? Are you still using phy:?

One difference between PV and non-PV drivers is that GPLPV uses the scsiport
interface, which does allow multiple outstanding requests. When not using PV
drivers, Windows uses the QEMU-emulated IDE drivers, which probably don't allow
more than one outstanding request, so this situation could never arise.
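
As a rough illustration of why queue depth matters here, the following Python sketch models writes being submitted in order with a limited number allowed in flight, and counts how often a new write overlaps one that is still pending. It is a toy model, not the actual GPLPV, blkback, or DRBD logic. The names (Write, overlaps, count_conflicts), the 512-byte sector size, and the reading of "+4096" as a length in bytes are all assumptions made for the example.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # assumed sector size for this illustration


@dataclass(frozen=True)
class Write:
    sector: int  # start sector
    size: int    # length in bytes (assumed meaning of "+4096" in the log)


def overlaps(a, b):
    """Return True if the two writes touch at least one common sector."""
    a_end = a.sector + a.size // SECTOR_BYTES
    b_end = b.sector + b.size // SECTOR_BYTES
    return a.sector < b_end and b.sector < a_end


def count_conflicts(writes, queue_depth):
    """Submit writes in order with at most queue_depth in flight.

    When the queue is full, the oldest write is treated as completed
    before the next one is submitted. A conflict is a new write that
    overlaps a write still in flight.
    """
    in_flight = []
    conflicts = 0
    for w in writes:
        if len(in_flight) == queue_depth:
            in_flight.pop(0)  # oldest request completes
        if any(overlaps(w, p) for p in in_flight):
            conflicts += 1
        in_flight.append(w)
    return conflicts


# Two writes to the same location, then an unrelated one.
writes = [Write(26745079, 4096), Write(26745079, 4096), Write(100, 4096)]
print(count_conflicts(writes, queue_depth=1))  # one outstanding request
print(count_conflicts(writes, queue_depth=4))  # several outstanding requests
```

With a queue depth of 1 the earlier write has always completed before the next one is issued, so no overlap can be observed. With a deeper queue the second write to the same sector arrives while the first is still pending.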

What I need to know is if Windows is giving me requests in a broken way, or if 
GPLPV is handling them in a broken way... I'll post on the ntdev mailing list 
and see if someone there knows.

After accumulating a bunch of these errors in your test environment, can you do 
a chkdsk and see what comes up?
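
To get a feel for how many of these errors have accumulated before running chkdsk, something like the following Python sketch could tally them from the dom0 kernel log. The pattern is modelled only on the single log line quoted in the original post further down in this thread. The function name tally and the sample text are made up for the example, and other DRBD versions may format the message differently.

```python
import re
from collections import Counter

# Pattern based on the one log line quoted in this thread; treat it as
# an assumption about the message format, not a DRBD specification.
PATTERN = re.compile(
    r"Concurrent local write detected!.*?"
    r"new:\s*(\d+)s\s*\+(\d+);\s*pending:\s*(\d+)s\s*\+(\d+)"
)


def tally(log_text):
    """Count 'Concurrent local write' messages per (sector, size) of the new write."""
    # Collapse line breaks so entries wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    counts = Counter()
    for m in PATTERN.finditer(flat):
        new_sector, new_size, _pend_sector, _pend_size = map(int, m.groups())
        counts[(new_sector, new_size)] += 1
    return counts


# Sample input: the quoted log line, repeated twice for illustration.
sample = (
    "May  6 17:45:48 s3 kernel: [198887.626841] drbd0: blkback.35.hda[3549] "
    "Concurrent local write detected! [DISCARD L] new: 26745079s +4096; "
    "pending: 26745079s +4096\n"
) * 2

for (sector, size), n in tally(sample).items():
    print(f"sector {sector} (+{size}): {n} occurrence(s)")
```

Sectors that show up repeatedly would be the ones to look at first if chkdsk does report damage.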

Thanks

James


> Best regards,
> 
> -------------------------------------------------------------
> Attik System              web  : http://www.attiksystem.ch
> Philippe Lang             phone: +41 26 422 13 75
> rte de la Fonderie 2      gsm  : +41 79 351 49 94
> 1700 Fribourg             pgp  : http://keyserver.pgp.com
> 
> 
> > -----Original Message-----
> > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-users-
> > bounces@xxxxxxxxxxxxxxxxxxx] On behalf of Philippe Lang
> > Sent: Friday, 7 May 2010 10:33
> > To: xen-users@xxxxxxxxxxxxxxxxxxx
> > Subject: [Xen-users] Concurrent local write error in DRBD with GPL PV
> > drivers
> >
> > Hi,
> >
> > We are about to start using Windows 2003 64-bit Xen VMs in production,
> > running on top of a DRBD cluster. We have installed the latest GPL PV
> > optimized drivers (gplpv_2003x64_0.11.0.213.msi).
> >
> > Sometimes, when the VM writes to disk, we see the following error
> > in the DRBD log:
> >
> > May  6 17:45:48 s3 kernel: [198887.626841] drbd0: blkback.35.hda[3549]
> > Concurrent local write detected! [DISCARD L] new: 26745079s +4096;
> > pending: 26745079s +4096
> >
> > This error is discussed in the following thread of the DRBD mailing
> > list:
> >
> > http://lists.linbit.com/pipermail/drbd-user/2009-April/011873.html
> >
> > Basically, it is due to the fact that a write occurs at a specific
> > location, while another "in-flight" write is taking place at the same
> > location. In order to avoid a cluster desynchronization, DRBD drops the
> > second write.
> >
> > We were able to reproduce this problem with the help of a Windows
> > program called "PerformanceTest" from Passmark Software. When running a
> > "Disk Random Seek +RW" test, the log fills up with the error
> > mentioned at the top of this message.
> >
> > We have tested a VM *without* the gplpv drivers, and *no error*
> > appears. We have tested previous driver versions (0.11.0.188,
> > 0.10.0.142), and the same error appears.
> >
> > So, we have the feeling there is some kind of error in the driver,
> > although we have never experienced a single VM crash. Can we safely
> > ignore the "Concurrent local write error" mentioned in the log, or is
> > that really a bug that should be corrected before using the driver in
> > production?
> >
> > Best regards,
> >
> > -------------------------------------------------------------
> > Attik System              web  : http://www.attiksystem.ch
> > Philippe Lang             phone: +41 26 422 13 75
> > rte de la Fonderie 2      gsm  : +41 79 351 49 94
> > 1700 Fribourg             pgp  : http://keyserver.pgp.com

 

