
Re: [Xen-devel] [xen-unstable test] 142973: regressions - FAIL



On 21.10.19 13:06, Ian Jackson wrote:
Jürgen Groß writes ("Re: [Xen-devel] [xen-unstable test] 142973: regressions - FAIL"):
On 21.10.19 10:23, osstest service owner wrote:
flight 142973 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/142973/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
   test-amd64-amd64-xl-pvshim   18 guest-localmigrate/x10   fail REGR. vs. 142750

Roger, I believe you have looked into that one?

I guess the conversation via IRC with Ian regarding the race between
blkback and OSStest was related to the issue?

I think this failure is something else.

What happens here is this:

2019-10-21 02:58:32 Z executing ssh ... -v root@172.16.145.205 date
[bunch of output from ssh]
status (timed out) at Osstest/TestSupport.pm line 550.
2019-10-21 02:58:42 Z exit status 4

172.16.145.205 is the guest here.  Ie, `ssh date guest' took longer
than 10s.
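
To make the failure mode concrete, here is a minimal sketch of a "run a command, give up after N seconds" check. It is not the osstest implementation (osstest is Perl, see Osstest/TestSupport.pm); the helper name `run_with_timeout` and its return shape are made up for illustration.

```python
import subprocess


def run_with_timeout(argv, timeout_s):
    """Run argv and report whether it finished within timeout_s seconds.

    Hypothetical helper, not part of osstest.
    Returns (timed_out, returncode); returncode is None on timeout.
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this exception.
        return True, None
    return False, proc.returncode
```

With this sketch, something like `run_with_timeout(["ssh", "root@172.16.145.205", "date"], 10)` would return `(True, None)` in the situation described above, where the command did not complete within 10s.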

We can see that the guest networking is working soon after the
migration because we got most of the way through the ssh protocol
exchange.  On the previous repetition the next message from ssh was
    debug1: SSH2_MSG_SERVICE_ACCEPT received

Looking at
   
http://logs.test-lab.xenproject.org/osstest/logs/142973/test-amd64-amd64-xl-pvshim/rimava1---var-log-xen-console-guest-debian.guest.osstest--incoming.log
which is, I think, the log of the "new" instance of guest, after
migration, there are messages about killing various services.  Eg
   [1918064738.820550] systemd[1]: systemd-udevd.service: Main process
   exited, code=killed, status=6/ABRT
They don't seem to be normal.  For example:
   
http://logs.test-lab.xenproject.org/osstest/logs/142865/test-amd64-amd64-xl-pvshim/rimava1---var-log-xen-console-guest-debian.guest.osstest--incoming.log
is the previous xen-unstable flight and it doesn't have them.  I
looked in
   
http://logs.test-lab.xenproject.org/osstest/logs/142865/test-amd64-amd64-xl-pvshim/rimava1---var-log-xen-console-guest-debian.guest.osstest.log.gz
too and that has some alarming messages from the kernel like
  [  686.692660] rcu_sched kthread starved for 1918092123128 jiffies!
  g18446744073709551359 c18446744073709551358 f0x0 RCU_GP_WAIT_FQS(3)
  ->state=0x0 ->cpu=0
and accompanying stack traces.  But the test passed there.  I think
that is probably something else?

This seems to be the issue Sergey is seeing, too.


ABRT suggests guest memory corruption.

Sure? I'd think of an abort() call.
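
For reference, signal 6 on Linux is SIGABRT, which is what abort() raises, so "code=killed, status=6/ABRT" is what a plain abort() in the service would look like. A small sketch showing this on a POSIX system (the function name is made up, and this says nothing about why udevd aborted here):

```python
import signal
import subprocess
import sys


def abort_child_returncode():
    """Start a child that calls abort() and return its exit status.

    Hypothetical helper for illustration; assumes a POSIX system.
    """
    proc = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, subprocess reports "killed by signal N" as returncode -N.
    return proc.returncode
```

The child is reported as killed by SIGABRT, the same information systemd prints as status=6/ABRT.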


If this is the case, could you, Ian, please add the workaround you were
thinking of to OSStest (unconditional for now, maybe make it conditional
later)?

I can add the block race workaround but I don't think it will help
with migration anyway.  The case where things go wrong is destroy.

Okay, no hurry then.


Juergen
