
Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p



> >
> > Hi,
> >
> > > I've had a few occasions where tapdisk has segfaulted:
> > >
> > > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
> > > tapdisk:9180 blocked for more than 120 seconds.
> > > tapdisk         D ffff88043fc13540     0  9180      1 0x00000000
> > >
> > > and then like:
> > >
> > > end_request: I/O error, dev tdc, sector 472008
> > >
> > > I can't be sure but I suspect that when this happened either one OSD was
> > > offline, or the cluster lost quorum briefly.
> >
> > Interesting. There might be an issue if a request ends in error, I'll
> > have to check that.
> > I'll have a look on Monday.
> >
> 
> You say in tdrbd_finish_aiocb:
> 
>         while (1) {
>                 /* POSIX says write will be atomic or blocking */
>                 rv = write(prv->pipe_fds[1], (void*)&req, sizeof(req));
> 
> but from what I've read in "man 7 pipe", the statement about being atomic
> only applies if the pipe is open in non-blocking mode, and you open it with a
> call to pipe() (same as pipe2(,0)) and you never call fcntl to change it. This
> would be consistent with the random crashes I'm seeing - I thought they
> were related to transient errors but my ceph cluster has been perfectly
> stable for a few days now and it's still happening.
> 
> What do you think?
> 

Actually, maybe not. What I was reading only applies when a large number of
bytes is written to the pipe, and even then I got confused by the double
negatives. Sorry for the noise.

James

