
[PATCH v3 3/7] CI: switch qubes runners to use console.exp



It appears that Xen sometimes takes longer to even start booting,
mostly due to firmware and grub fetching large boot files. In some jobs
the current timeout is very close to the time actually needed, and
occasionally (rarely for now) a test fails because the timeout expires
in the middle of dom0 booting. This will happen more often as the
initramfs grows (and as tests get more complex).
This has been observed on some dom0pvh-hvm jobs, at least on runners hw3
and hw11.

Switch to using expect (console.exp) for more robust test output
handling. This allows waiting separately for Xen to start booting and
then for the test to complete. For now, set both of those timeouts to
120s, which pessimistically bumps the timeout for the whole test to 240s
(from 120s).

Add S3 handling to console.exp via SUSPEND_MSG + WAKEUP_CMD.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
---
Changes in v3:
- split into two patches - generic change is in the previous one
Changes in v2:
- replace previous "ci: increase timeout for hw tests" with changing how
  the console is interacted with

This needs a containers rebuild.
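
Side note on the exit status handling in qubes-x86-64.sh: with
"set -e -o pipefail", a failing pipeline normally ends the script at that
line, so the TEST_RESULT=$? assignment and the junit retrieval after it
would likely only be reached when console.exp succeeds. A minimal sketch
of one way to capture the status without tripping "set -e" follows. It is
not part of this patch; fake_console is a hypothetical stand-in for
console.exp, and bash is used only so the sketch runs standalone in a
shell that supports pipefail.

```sh
#!/bin/bash
set -e -o pipefail

# Hypothetical stand-in for console.exp: prints one console line with a
# trailing carriage return, then fails with status 3.
fake_console() {
    printf 'Latest ChangeSet: example\r\n'
    return 3
}

# A command on the left of "||" does not trigger "set -e", so the script
# keeps running and the pipeline status can be recorded.
TEST_RESULT=0
fake_console | sed 's/\r\+$//' || TEST_RESULT=$?

# Later steps (e.g. fetching the junit XML) would go here.
echo "TEST_RESULT=$TEST_RESULT"
```

With pipefail, the pipeline reports the status of fake_console (3) rather
than that of sed, and the "||" list keeps "set -e" from aborting before
the status is stored.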
---
 automation/build/alpine/3.18-arm64v8.dockerfile |  1 +-
 automation/scripts/console.exp                  | 13 +++++-
 automation/scripts/qubes-x86-64.sh              | 52 ++++--------------
 3 files changed, 27 insertions(+), 39 deletions(-)

diff --git a/automation/build/alpine/3.18-arm64v8.dockerfile b/automation/build/alpine/3.18-arm64v8.dockerfile
index 19fe46f8418f..b8482d5bf43f 100644
--- a/automation/build/alpine/3.18-arm64v8.dockerfile
+++ b/automation/build/alpine/3.18-arm64v8.dockerfile
@@ -48,3 +48,4 @@ RUN apk --no-cache add \
   # qubes test deps
   openssh-client \
   fakeroot \
+  expect \
diff --git a/automation/scripts/console.exp b/automation/scripts/console.exp
index 834a08db1b95..bdb1dd982003 100755
--- a/automation/scripts/console.exp
+++ b/automation/scripts/console.exp
@@ -9,6 +9,10 @@
 #   tests that's a command to read serial console
 # - UBOOT_CMD (optional): command to enter at u-boot prompt
 # - BOOT_MSG (optional): initial Xen message to wait for (aka sign-of-life)
+# - SUSPEND_MSG (optional): message signaling system is going to sleep, it's
+#   the trigger for WAKEUP_CMD (see below)
+# - WAKEUP_CMD (optional): command to execute to wake up the system 30s after
+#   seeing SUSPEND_MSG
 # - LOG_MSG (optional): final console message to wait for
 # - PASSED: message to look for to consider test a success; if LOG_MSG is set,
 #   both LOG_MSG and PASSED must appear (in any order) for test to succeed
@@ -45,6 +49,15 @@ if {[info exists env(BOOT_MSG)]} {
     expect -re "$env(BOOT_MSG)"
 }
 
+if {[info exists env(WAKEUP_CMD)]} {
+    expect -re "$env(SUSPEND_MSG)"
+
+    # keep it suspended a bit, then wakeup
+    sleep 30
+
+    system "$env(WAKEUP_CMD)"
+}
+
 if {[info exists env(LOG_MSG)]} {
     expect {
         -re "$env(PASSED)" {
diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index a964ac4b7a4e..861e302d845b 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 
-set -ex
+set -ex -o pipefail
 
 # One of:
 #  - ""             PV dom0,  PVH domU
@@ -267,52 +267,26 @@ cp -f binaries/xen $TFTP/xen
 cp -f binaries/bzImage $TFTP/vmlinuz
 cp -f binaries/dom0-rootfs.cpio.gz $TFTP/initrd-dom0
 
-# start logging the serial; this gives interactive console, don't close its
-# stdin to not close it; the 'cat' is important, plain redirection would hang
-# until somebody opens the pipe; opening and closing the pipe is used to close
-# the console
-mkfifo /tmp/console-stdin
-cat /tmp/console-stdin |\
-ssh $CONTROLLER console | tee smoke.serial | sed 's/\r//' &
-
 # start the system pointing at gitlab-ci predefined config
 ssh $CONTROLLER gitlabci poweron
-trap "ssh $CONTROLLER poweroff; : > /tmp/console-stdin" EXIT
+trap "ssh $CONTROLLER poweroff" EXIT
 
 if [ -n "$wait_and_wakeup" ]; then
-    # wait for suspend or a timeout
-    until grep "$wait_and_wakeup" smoke.serial || [ $timeout -le 0 ]; do
-        sleep 1;
-        : $((--timeout))
-    done
-    if [ $timeout -le 0 ]; then
-        echo "ERROR: suspend timeout, aborting"
-        exit 1
-    fi
-    # keep it suspended a bit, then wakeup
-    sleep 30
-    ssh $CONTROLLER wake
+    export SUSPEND_MSG="$wait_and_wakeup"
+    export WAKEUP_CMD="ssh $CONTROLLER wake"
 fi
 
-set +x
-until grep "^Welcome to Alpine Linux" smoke.serial || [ $timeout -le 0 ]; do
-    sleep 1;
-    : $((--timeout))
-done
-set -x
-
-tail -n 100 smoke.serial
-
-if [ $timeout -le 0 ]; then
-    echo "ERROR: test timeout, aborting"
-    exit 1
-fi
+export PASSED="${passed}"
+export BOOT_MSG="Latest ChangeSet: "
+export LOG_MSG="\nWelcome to Alpine Linux"
+export TEST_CMD="ssh $CONTROLLER console"
+export TEST_LOG="smoke.serial"
+export TEST_TIMEOUT="$timeout"
+./automation/scripts/console.exp | sed 's/\r\+$//'
+TEST_RESULT=$?
 
 if [ -n "$retrieve_xml" ]; then
     nc -w 10 "$SUT_ADDR" 8080 > tests-junit.xml </dev/null
 fi
 
-sleep 1
-
-(grep -q "^Welcome to Alpine Linux" smoke.serial && grep -q "${passed}" smoke.serial) || exit 1
-exit 0
+exit "$TEST_RESULT"
-- 
git-series 0.9.1



 

