
Re: [Xen-devel] [linux-linus test] 25478: regressions - FAIL



On Fri, 2014-03-14 at 14:23 -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 14, 2014 at 05:07:35PM +0000, Ian Campbell wrote:
> > On Fri, 2014-03-14 at 16:42 +0000, xen.org wrote:
> > > flight 25478 linux-linus real [real]
> > > http://www.chiark.greenend.org.uk/~xensrcts/logs/25478/
> > > 
> > > Regressions :-(
> > > 
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >  test-amd64-i386-pair   17 guest-migrate/src_host/dst_host fail REGR. vs. 12557
> > 
> > Is anyone looking at these? Apparently this hasn't passed for 23 months:
> 
> I believe I asked to tweak some things 23 months ago to troubleshoot this
> but never heard back from you.

Was that me? I didn't think I had touched osstest at all that long ago
apart from occasionally pinging folks when things looked to be failing.
In any case sorry for letting it all fall through the cracks. Can you
remember what the tweaks were? (I'm a bit reluctant to play "tweak the
test case until it passes", but let's see what they are first).

What's weird is that the linux-3.4 and linux-3.10 stable branch flights
seem to pass at least some of the time, although their history shows
that they do sometimes hit a failure in the same test cases.

> > http://xenbits.xen.org/gitweb/?p=linux-pvops.git;a=shortlog;h=refs/heads/tested/linux-linus
> > 
> > Looking through the recent failures this migration one seems quite
> > common but there seem to be a few others, search for "[linux-linus
> > test]" in http://lists.xen.org/archives/html/xen-devel/2014-03/ for some
> > examples.
> > 
> > The particular failure here is
> > http://www.chiark.greenend.org.uk/~xensrcts/logs/25478/test-amd64-i386-pair/info.html
> > and the console logs
> > http://www.chiark.greenend.org.uk/~xensrcts/logs/25478/test-amd64-i386-pair/serial-gall-mite.log
> > are full of
> 
> The 'info.html' you alluded to - I just saw that happening with Xen 4.5, but
> I don't see the SWIOTLB issue in my dom0.
> 
> >         Mar 14 13:24:27.592641 [ 1010.742462] mptsas 0000:03:00.0: swiotlb buffer is full
> > 
> > The migration itself times out after 5 minutes or so (for a 512M guest)
> 
> The more recent Linux kernel

This should be the most recent kernel, since it's testing Linus' master. Here
it is v3.14-rc6+ at changeset c60f7d5a8e7c639de5d9dfe07e1e91d302d506e4.

FWIW this happened again in 25558 over the weekend and
http://www.chiark.greenend.org.uk/~xensrcts/logs/25558/test-amd64-i386-pair/serial-gall-mite.log
shows some different messages along with the ones quoted above:
        Mar 16 12:51:30.074927 [  845.982443] swiotlb_tbl_map_single: 269 callbacks suppressed
        [...]
        Mar 16 12:51:30.099339 [  845.982464] mptsas 0000:03:00.0: swiotlb buffer is full (sz: 4096 bytes)
AFAICS all of the latter are 4k sized.

I had to go back to
http://www.chiark.greenend.org.uk/~xensrcts/logs/25558/test-amd64-i386-pair/serial-gall-mite.log.0
to find the first such messages, there I see some bnx2 related ones as well e.g.
        Mar 16 12:46:48.315302 [  579.735844] bnx2 0000:02:00.0: swiotlb buffer is full (sz: 21024 bytes)
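
For anyone who wants to tally these messages per device and size rather
than eyeballing the serial logs, here is a rough Python sketch. It is not
part of osstest; the function name and the regular expression are my own
assumptions, based only on the message shapes quoted above (the
"(sz: N bytes)" suffix is treated as optional because the 25478 log lines
lack it).

```python
import re
from collections import Counter

# Matches e.g. "mptsas 0000:03:00.0: swiotlb buffer is full (sz: 4096 bytes)".
# The size suffix is optional: some of the quoted lines do not carry it.
SWIOTLB_FULL = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"swiotlb buffer is full(?: \(sz: (?P<size>\d+) bytes\))?"
)


def tally_swiotlb_full(lines):
    """Count 'swiotlb buffer is full' messages by (driver, PCI address, size).

    Hypothetical helper: size is None when the line has no "(sz: ...)" part.
    """
    counts = Counter()
    for line in lines:
        match = SWIOTLB_FULL.search(line)
        if match is None:
            continue
        size = int(match.group("size")) if match.group("size") else None
        counts[(match.group("driver"), match.group("bdf"), size)] += 1
    return counts
```

Feeding it the lines of serial-gall-mite.log (one message per line, as in
the raw log rather than as wrapped in this mail) would give a count per
device and request size.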

These all appear to start only after the guest is launched, but there is
no big smoking splat around that time like I was hoping for (i.e. around
"Mar 16 12:46:20.837409 (d1)").

The actual dom0 boot looks ok to me. FWIW:
        Mar 16 12:37:31.756322 [    0.000000] software IO TLB [mem 0x1ba93000-0x1fa93000] (64MB) mapped at [dba93000-dfa92fff]
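
As a sanity check on that line, the [mem ...] range can be turned back
into a size. A small Python sketch (the helper name is made up, and it
assumes the end address in the message is exclusive, which is what the
numbers here suggest):

```python
import re


def swiotlb_range_mib(boot_line):
    """Return the size in MiB of the [mem START-END] range, or None.

    Hypothetical helper; treats END as exclusive.
    """
    match = re.search(r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]", boot_line)
    if match is None:
        return None
    start, end = (int(value, 16) for value in match.groups())
    return (end - start) / (1024 * 1024)


line = ("software IO TLB [mem 0x1ba93000-0x1fa93000] (64MB) "
        "mapped at [dba93000-dfa92fff]")
print(swiotlb_range_mib(line))  # 64.0, matching the (64MB) in the message
```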

Interesting that the issues seem to happen on the 64-bit side (this test
is a migration from 64- to 32-bit host). It's also strange that
test-amd64-amd64-xl and other amd64 based tests don't seem to be
affected.

>  will also tell you what type of requests it was.

Do you mean the size? It seems to print that only for certain requests.

> You might also want to try a larger SWIOTLB buffer, swiotlb=26422 for fun.

Any reason for that particular number?
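
For reference, a quick conversion between a swiotlb= value and a buffer
size, as a Python sketch. It assumes that swiotlb= takes a number of
slabs and that a slab is 2 KiB (IO_TLB_SHIFT = 11); both are my reading
of the kernel rather than something taken from the logs, and the
function names are made up.

```python
# Assumption: one swiotlb slab is 1 << 11 = 2048 bytes.
IO_TLB_SHIFT = 11


def slabs_to_mib(nslabs):
    """Size in MiB of a buffer of nslabs slabs, under the assumption above."""
    return (nslabs << IO_TLB_SHIFT) / (1024 * 1024)


def mib_to_slabs(mib):
    """Number of slabs needed for a buffer of mib MiB."""
    return (mib * 1024 * 1024) >> IO_TLB_SHIFT


print(slabs_to_mib(26422))  # 51.60546875
print(mib_to_slabs(64))     # 32768
```

If those assumptions hold, swiotlb=26422 would come to roughly 51.6MB,
i.e. less than the 64MB reported at boot rather than more.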

> I think you are looking at two different issues.

You mean you think the swiotlb issue is unrelated to the slow migration
timeouts? I can believe that.

Ian.

