[Xen-devel] some thoughts about merlot{0|1} issues [was: Re: [xen-unstable test] 102522: tolerable FAIL - PUSHED]
On Wed, 2016-11-23 at 15:54 +0000, osstest service owner wrote:
> flight 102522 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/102522/
>
> Regressions which are regarded as allowable (not blocking):
>  test-amd64-amd64-xl-rtds     9 debian-install        fail like 102465
>
This is on merlot1 and, as far as I can tell, this test has been failing there for quite a while (is that the correct interpretation of this table?):
http://logs.test-lab.xenproject.org/osstest/results/history/test-amd64-amd64-xl-rtds/xen-unstable

This is using RTDS as the scheduler, but that should not be the problem. In fact, what's failing is xen-create-image, which times out. Basically, it starts creating the VM filesystem via debootstrap, but does not manage to finish doing that within the 2530s timeout:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/9.ts-debian-install.log

  2016-11-23 13:12:35 Z command timed out [2500]: timeout 2530 ssh
    -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=100
    -o ServerAliveInterval=100 -o PasswordAuthentication=no
    -o ChallengeResponseAuthentication=no
    -o UserKnownHostsFile=tmp/t.known_hosts_102522.test-amd64-amd64-xl-rtds
    root@172.16.144.21 http_proxy=http://cache:3143/ \
      xen-create-image \

In other runs on different hosts, still under RTDS, that takes about 650 seconds:
http://logs.test-lab.xenproject.org/osstest/logs/102532/test-amd64-amd64-xl-rtds/9.ts-debian-install.log
And I've tried it myself on my test box (roughly along the lines of the sketch further down), and it took 10m20s.

We know from here that, this time, it got stuck rather early:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/merlot1---var-log-xen-tools-debian.guest.osstest.log
I've looked at a handful of other instances, and it seems to be _always_ like that.

The system appears alive, though; at least, right after the timeout, the ts-log-capture phase --which includes issuing commands on the host and copying files from there-- succeeds. Also, I'm not sure it means much, but xen-create-image starts at 12:30:55 and times out at 13:12:35.

Looking in:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/merlot1---var-log-daemon.log
we see:

  Nov 23 12:04:55 merlot1 ntpd[3003]: Listening on routing socket on fd #22 for interface updates
  [..]
  Nov 23 12:27:22 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
  Nov 23 12:34:04 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
  Nov 23 12:40:47 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
  Nov 23 12:47:31 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
  Nov 23 12:54:14 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
  Nov 23 13:00:57 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
  Nov 23 13:07:40 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes

So, again, at least something is alive on the host, and writing to the logs, even during the time that debootstrap seems stuck.
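(For anyone who wants to repeat that manual timing: what xen-create-image spends its time on here is essentially a plain debootstrap of the guest filesystem, so something along the lines of the sketch below should give a comparable number. This is only an illustration under assumptions -- the suite, target directory and mirror are placeholders, not what osstest or xen-tools actually pass.)

  # run as root; this times only the bootstrap step that xen-create-image
  # drives under the hood (suite/target/mirror below are placeholder values)
  time debootstrap --arch amd64 jessie /tmp/guest-rootfs http://deb.debian.org/debian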
There are 2 running vcpus:

  Name        ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
  Domain-0     0     1    2  r--       35.7  all / all
  Domain-0     0    17    9  r--       38.3  all / all

vcpu 17 is running on CPU 9, which is on node 1, which has _0_ memory:

  node:  memsize  memfree  distances
     0:     9216     7856  10,16,16,16
     1:        0        0  16,10,16,16
     2:     8175     7779  16,16,10,16
     3:        0        0  16,16,16,10

(see here: http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/merlot1-output-xl_info_-n )
but that should not mean much: in other, still failing, runs this does not happen (and, in any case, this is just what is going on while we collect the logs).

Now, about the serial output:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/serial-merlot1.log
When dumping the ACPI C-states, here's how things look for _all_ CPUs:

  Nov 23 13:13:00.382134 (XEN) ==cpu3==
  Nov 23 13:13:00.382157 (XEN)     active state:        C-1
  Nov 23 13:13:00.390096 (XEN)     max_cstate:          C7
  Nov 23 13:13:00.390125 (XEN)     states:
  Nov 23 13:13:00.390148 (XEN)     C1: type[C1] latency[000] usage[00000000] method[ HALT] duration[0]
  Nov 23 13:13:00.398055 (XEN)     C0: usage[00000000] duration[4229118701384]
  Nov 23 13:13:00.398090 (XEN)     PC2[0] PC3[0] PC6[0] PC7[0]
  Nov 23 13:13:00.406088 (XEN)     CC3[0] CC6[0] CC7[0]

And I checked other runs, and it's the same everywhere. I remember that Jan suggested trying to pass max_cstate=1 to Xen at boot. I was about to ask Ian to do that for this host, but it looks like we're using only C0 and C1 already anyway. The boot command line looks like this:

  xen_commandline : placeholder conswitch=x watchdog com1=115200,8n1 console=com1,vga gdb=com1 dom0_mem=512M,max:512M ucode=scan sched=rtds

which makes the above look a bit weird to me... But I've played much more with Intel boxes than with AMD ones, I admit.
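(If we do end up wanting to try Jan's max_cstate=1 suggestion on one of these hosts anyway, this is roughly what it would take on a stock Debian dom0 booting Xen via grub2 -- only a sketch, and osstest's own boot plumbing may well differ; max_cstate itself is the Xen command line option Jan referred to.)

  # append max_cstate=1 to the hypervisor options (standard Debian grub2 Xen hooks)
  echo 'GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT max_cstate=1"' >> /etc/default/grub
  update-grub    # regenerate grub.cfg, then reboot into the new Xen command line
  # once back up, the ACPI Cx dump shown above can be re-obtained from dom0 with:
  xl debug-keys c && xl dmesg | tail -n 80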
For now, I'm done. At some point, I'll recall either merlot0 or merlot1 out of OSSTest, take it back for myself, and try to investigate more. If, in the meantime, any of this rings a bell for anyone, feel free to speak up.

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)