
[Xen-devel] some thoughts about merlot{0|1} issues [was: Re: [xen-unstable test] 102522: tolerable FAIL - PUSHED]



On Wed, 2016-11-23 at 15:54 +0000, osstest service owner wrote:
> flight 102522 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/102522/
> 
> Regressions which are regarded as allowable (not blocking):
>  test-amd64-amd64-xl-rtds      9 debian-
> install               fail  like 102465
> 
This is on merlot1 and, as far as I can tell, the test has been failing
there for quite a while (is that the correct interpretation of this
table?):
http://logs.test-lab.xenproject.org/osstest/results/history/test-amd64-amd64-xl-rtds/xen-unstable

This is using RTDS as the scheduler, but that should not be the
problem. In fact, what's failing is xen-create-image timing out.
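
FWIW, this is how I double check, from the host itself, that RTDS is
in use and what it is actually giving to dom0 (just a sketch, and the
output format varies a bit between Xen versions; IIRC the defaults are
period=10000us, budget=4000us, i.e. each dom0 vCPU only gets ~40% of a
pCPU, which makes this step slower in general, but should not cause a
hang):

# confirm the active scheduler:
xl info | grep xen_scheduler
# dom0's RTDS parameters (period and budget, in microseconds):
xl sched-rtds -d Domain-0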

Basically, it starts creating the VM filesystem via debootstrap, but
does not manage to finish doing that within the 2530-second timeout:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/9.ts-debian-install.log
2016-11-23 13:12:35 Z command timed out [2500]: timeout 2530 ssh -o 
StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=100 -o 
ServerAliveInterval=100 -o PasswordAuthentication=no -o 
ChallengeResponseAuthentication=no -o 
UserKnownHostsFile=tmp/t.known_hosts_102522.test-amd64-amd64-xl-rtds 
root@172.16.144.21         http_proxy=http://cache:3143/ \
        xen-create-image \

In other runs on different hosts, still under RTDS, that takes about
650 seconds:
http://logs.test-lab.xenproject.org/osstest/logs/102532/test-amd64-amd64-xl-rtds/9.ts-debian-install.log

And I've tried it myself on my test box, and it took 10m20s.
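
(For reference, this is roughly what I time when trying it by hand; the
option list here is only my approximation, the exact one osstest uses
is in the log linked above, and the dist is an assumption:)

# time the guest filesystem creation, roughly as ts-debian-install does:
time xen-create-image --hostname=debian.guest.osstest \
    --dist=jessie --dir=/root --dhcp --size=8Gb --memory=512Mb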

We know from here that, this time, it got stuck rather early:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/merlot1---var-log-xen-tools-debian.guest.osstest.log

I've looked at a handful of other instances, and it seems to be
_always_ like that.

The system appears to be alive, though; at least, right after the
timeout, the ts-log-capture phase (which includes issuing commands on
the host and copying files from there) succeeds.

Also, I'm not sure it means much, but xen-create-image starts at
12:30:55 and times out at 13:12:35 (that is exactly 2500 seconds; see
the quick check below). Looking in:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/merlot1---var-log-daemon.log
we see:
Nov 23 12:04:55 merlot1 ntpd[3003]: Listening on routing socket on fd #22 for interface updates
[..]
Nov 23 12:27:22 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 12:34:04 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 12:40:47 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 12:47:31 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 12:54:14 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 13:00:57 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 13:07:40 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes

So, again, at least something is alive on the host and writing to the
logs, even during the time that debootstrap seems stuck.
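
(Quick sanity check of the arithmetic above, with GNU date; the two
timestamps are exactly the 2500 seconds after which osstest gives up:)

# 12:30:55 -> 13:12:35 is 41m40s, i.e. 2500 seconds:
t0=$(date -ud '2016-11-23 12:30:55' +%s)
t1=$(date -ud '2016-11-23 13:12:35' +%s)
echo $(( t1 - t0 ))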

There are 2 running vCPUs:
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     1    2   r--      35.7  all / all
Domain-0                             0    17    9   r--      38.3  all / all

vCPU 17 is running on pCPU 9, which is on node 1, which has _0_ memory:
node:    memsize    memfree    distances
   0:      9216       7856      10,16,16,16
   1:         0          0      16,10,16,16
   2:      8175       7779      16,16,10,16
   3:         0          0      16,16,16,10

(see here: 
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/merlot1-output-xl_info_-n
 )

but that should not mean much: in other runs, which also fail, this
does not happen (and, anyway, this is just a snapshot of what is going
on while we collect the logs).
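
(For completeness, this is all just plain xl on the host; a sketch, and
the pCPU range in the pinning example is an assumption, to be checked
against the cpu_topology section of "xl info -n" first:)

# which pCPU each dom0 vCPU is on, plus hard/soft affinity:
xl vcpu-list Domain-0
# per-node memory and the CPU-to-node mapping:
xl info -n
# purely as an experiment, keep dom0's vCPUs on a node that actually
# has memory (assuming node 0 is pCPUs 0-5):
xl vcpu-pin Domain-0 all 0-5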

Now, about serial output:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/serial-merlot1.log

When dumping ACPI C-states, here's how things look for _all_ CPUs:
Nov 23 13:13:00.382134 (XEN) ==cpu3==
Nov 23 13:13:00.382157 (XEN) active state:              C-1
Nov 23 13:13:00.390096 (XEN) max_cstate:                C7
Nov 23 13:13:00.390125 (XEN) states:
Nov 23 13:13:00.390148 (XEN)     C1:    type[C1] latency[000] usage[00000000] method[ HALT] duration[0]
Nov 23 13:13:00.398055 (XEN)     C0:    usage[00000000] duration[4229118701384]
Nov 23 13:13:00.398090 (XEN) PC2[0] PC3[0] PC6[0] PC7[0]
Nov 23 13:13:00.406088 (XEN) CC3[0] CC6[0] CC7[0]

And I checked other runs, and it's the same everywhere.
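
(In case anyone wants to poke at this without serial access, the same
information should be reachable from dom0 too; a sketch, assuming the
xenpm tool is available on the host:)

# trigger the same dump (it ends up in the hypervisor console ring):
xl debug-keys c
xl dmesg | tail -n 60
# per-pCPU idle state usage, as seen from dom0:
xenpm get-cpuidle-states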

I remember that Jan suggested trying to pass max_cstate=1 to Xen at
boot. I was about to ask Ian to do that for this host, but it looks
like we're using only C0 and C1 already anyway.

The boot command line looks like this:
xen_commandline        : placeholder conswitch=x watchdog com1=115200,8n1 console=com1,vga gdb=com1 dom0_mem=512M,max:512M ucode=scan sched=rtds

which makes the above look a bit weird to me... But I've played much
more with Intel boxes than with AMD ones, I admit.
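
(If we do end up wanting to try Jan's max_cstate=1 suggestion anyway:
osstest manages the hypervisor command line itself, but on a
hand-installed Debian dom0 it would be something like this sketch:)

# line to append at the end of /etc/default/grub, adding max_cstate=1
# to whatever is already configured for Xen:
GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT max_cstate=1"
# then regenerate the boot config and reboot:
update-grub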


For now, I'm done. At some point, I'll recall either merlot0 or merlot1
out of OSSTest, take it back for myself, and try to investigate more.
If, in the meantime, any of this rings a bell for anyone, feel free to
speak up.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



 

