[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/2] automation: add a suspend test on an Alder Lake system



On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> This is a first test using Qubes OS CI infra. The gitlab-runner has
> access to ssh-based control interface (control@thor.testnet, ssh key
> exposed to the test via ssh-agent) and pre-configured HTTP dir for boot
> files (mapped under /scratch/gitlab-runner/tftp inside the container).
> Details about the setup are described on
> https://www.qubes-os.org/news/2022/05/05/automated-os-testing-on-physical-laptops/
> 
> This test boots Xen, and try if S3 works. It runs on a ADL-based desktop
> system. The test script is based on the Xilinx one.
> 
> The machine needs newer kernel than other x86 tests run, so use 6.1.x
> kernel added in previous commit.
> 
> When building rootfs, use fakeroot to preserve device files when
> repacking rootfs archives in a non-privileged container.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

This is awesome!! Thanks Marek!


> ---
> I'm bad at naming things. Things that I could use naming suggestions:
>  - test script (qubes-x86-64-suspend.sh) - this might be okay?
>  - test template job name (.adl-x86-64)
>  - test job name (adl-suspend-x86-64-gcc)
>  - tag (qubes-hw2)

I think these names are OK. I would maybe rename the tag "qubes-hw2" to
"qubes" because so far there is only one but I am fine with qubes-hw2
also.


> For context, our CI has several machines, named test-X or hwX, each
> controlled with matching hal900X RPi (this is where gitlab-runner is).
> This test uses only one specific hw, but I plan adding few others too.
> ---
>  automation/gitlab-ci/test.yaml             |  31 ++++-
>  automation/scripts/qubes-x86-64-suspend.sh | 155 ++++++++++++++++++++++-
>  2 files changed, 186 insertions(+)
>  create mode 100755 automation/scripts/qubes-x86-64-suspend.sh
> 
> diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> index 2e1a6886df7f..f5511dd6da70 100644
> --- a/automation/gitlab-ci/test.yaml
> +++ b/automation/gitlab-ci/test.yaml
> @@ -15,6 +15,10 @@
>  .arm32-test-needs: &arm32-test-needs
>    - qemu-system-aarch64-6.0.0-arm32-export
>  
> +.x86-64-test-needs: &x86-64-test-needs
> +  - alpine-3.12-rootfs-export
> +  - kernel-6.1.19-export

As you know, the jobs starting with a "." are template jobs to avoid
repeating the same things over and over. .x86-64-test-needs could be
used in qemu-alpine-x86_64-gcc also.


>  .qemu-arm64:
>    extends: .test-jobs-common
>    variables:
> @@ -84,6 +88,25 @@
>    tags:
>      - xilinx
>  
> +.adl-x86-64:
> +  extends: .test-jobs-common
> +  variables:
> +    # the test controller runs on RPi4
> +    CONTAINER: alpine:3.12-arm64v8
> +    LOGFILE: smoke-test.log
> +  artifacts:
> +    paths:
> +      - smoke.serial
> +      - '*.log'
> +    when: always
> +  only:
> +    variables:
> +      - $QUBES_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true"

Let me know which trees should have QUBES_JOBS set to true (thus able to
start Qubes jobs.) At a minimum, I think we would want
https://gitlab.com/xen-project/xen to test "staging" and "master". I can
set QUBES_JOBS to true to https://gitlab.com/xen-project/xen if you are
OK with it.


> +  before_script:
> +    - apk --no-cache add openssh-client fakeroot

This would work but we typically try to avoid installing more packages
in the test jobs for speed and efficiency. Instead, add those packages
directly to the build container dockerfile
(automation/build/alpine/3.12-arm64v8.dockerfile), e.g.:

diff --git a/automation/build/alpine/3.12-arm64v8.dockerfile 
b/automation/build/alpine/3.12-arm64v8.dockerfile
index 180c978964..3f1e6a3fc6 100644
--- a/automation/build/alpine/3.12-arm64v8.dockerfile
+++ b/automation/build/alpine/3.12-arm64v8.dockerfile
@@ -41,3 +41,6 @@ RUN apk --no-cache add \
   libattr \
   libcap-ng-dev \
   pixman-dev \
+  # qubes test deps
+  openssh-client \
+  fakeroot \

But I am not sure why you need fakeroot, more questions below about it


> +  tags:
> +    - qubes-hw2
> +
>  # Test jobs
>  build-each-commit-gcc:
>    extends: .test-jobs-common
> @@ -110,6 +133,14 @@ xilinx-smoke-dom0less-arm64-gcc:
>      - *arm64-test-needs
>      - alpine-3.12-gcc-arm64
>  
> +adl-suspend-x86-64-gcc:
> +  extends: .adl-x86-64
> +  script:
> +    - ./automation/scripts/qubes-x86-64-suspend.sh s3 2>&1 | tee ${LOGFILE}
> +  needs:
> +    - *x86-64-test-needs
> +    - alpine-3.12-gcc

This up to you, but qubes-x86-64-suspend.sh is very cool and
sophisticated and I would use it to create at least 2 jobs:
- a plain dom0 + domU boot (no suspend)
- a suspend/resume job

I am happy either way


>  qemu-smoke-dom0-arm64-gcc:
>    extends: .qemu-arm64
>    script:
> diff --git a/automation/scripts/qubes-x86-64-suspend.sh 
> b/automation/scripts/qubes-x86-64-suspend.sh
> new file mode 100755
> index 000000000000..fc479c9faaec
> --- /dev/null
> +++ b/automation/scripts/qubes-x86-64-suspend.sh
> @@ -0,0 +1,155 @@
> +#!/bin/sh
> +
> +set -ex
> +
> +test_variant=$1
> +
> +wait_and_wakeup=
> +if [ -z "${test_variant}" ]; then
> +    passed="ping test passed"
> +    domU_check="
> +until ifconfig eth0 192.168.0.2 &> /dev/null && ping -c 10 192.168.0.1; do
> +    sleep 30
> +done
> +echo \"${passed}\"
> +"
> +elif [ "${test_variant}" = "s3" ]; then
> +    passed="suspend test passed"
> +    wait_and_wakeup="started, suspending"
> +    domU_check="
> +ifconfig eth0 192.168.0.2
> +echo domU started
> +"
> +    dom0_check="
> +until grep 'domU started' /var/log/xen/console/guest-domU.log; do
> +    sleep 1
> +done
> +echo \"${wait_and_wakeup}\"
> +set -x
> +echo deep > /sys/power/mem_sleep
> +echo mem > /sys/power/state
> +# now wait for resume
> +sleep 5
> +# get domU console content into test log
> +tail -n 100 /var/log/xen/console/guest-domU.log
> +xl list
> +xl dmesg | grep 'Finishing wakeup from ACPI S3 state' || exit 1
> +# check if domU is still alive
> +ping -c 10 192.168.0.2 || exit 1
> +echo \"${passed}\"
> +"
> +fi

Very nice!


> +# DomU
> +mkdir -p rootfs
> +cd rootfs
> +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz

I am a bit confused about it: are you sure you need fakeroot for this?
This script is running inside a container as root already? Are you using
Docker on the RPi4 to run this job?


> +mkdir proc
> +mkdir run
> +mkdir srv
> +mkdir sys
> +rm var/run
> +echo "#!/bin/sh
> +
> +${domU_check}
> +/bin/sh" > etc/local.d/xen.start
> +chmod +x etc/local.d/xen.start
> +echo "rc_verbose=yes" >> etc/rc.conf
> +find . | fakeroot -i ../fakeroot-save cpio -H newc -o | gzip > 
> ../binaries/domU-rootfs.cpio.gz

same here


> +cd ..
> +rm -rf rootfs
> +
> +# DOM0 rootfs
> +mkdir -p rootfs
> +cd rootfs
> +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> +mkdir boot
> +mkdir proc
> +mkdir run
> +mkdir srv
> +mkdir sys
> +rm var/run
> +cp -ar ../binaries/dist/install/* .
> +
> +echo "#!/bin/bash
> +
> +export LD_LIBRARY_PATH=/usr/local/lib
> +bash /etc/init.d/xencommons start
> +
> +brctl addbr xenbr0
> +brctl addif xenbr0 eth0
> +ifconfig eth0 up
> +ifconfig xenbr0 up
> +ifconfig xenbr0 192.168.0.1
> +
> +xl create /etc/xen/domU.cfg
> +${dom0_check}
> +" > etc/local.d/xen.start
> +chmod +x etc/local.d/xen.start
> +# just PVH for now
> +echo '
> +type = "pvh"
> +name = "domU"
> +kernel = "/boot/vmlinuz"
> +ramdisk = "/boot/initrd-domU"
> +extra = "root=/dev/ram0 console=hvc0"
> +memory = 512
> +vif = [ "bridge=xenbr0", ]
> +disk = [ ]
> +' > etc/xen/domU.cfg
> +
> +echo "rc_verbose=yes" >> etc/rc.conf
> +echo "XENCONSOLED_TRACE=all" >> etc/default/xencommons
> +echo "QEMU_XEN=/bin/false" >> etc/default/xencommons
> +mkdir -p var/log/xen/console
> +cp ../binaries/bzImage boot/vmlinuz
> +cp ../binaries/domU-rootfs.cpio.gz boot/initrd-domU
> +find . | fakeroot -i ../fakeroot-save cpio -H newc -o | gzip > 
> ../binaries/dom0-rootfs.cpio.gz
> +cd ..
> +
> +
> +TFTP=/scratch/gitlab-runner/tftp
> +
> +echo '
> +multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all 
> guest_loglvl=all
> +module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
> +module2 (http)/gitlab-ci/initrd-dom0
> +' > $TFTP/grub.cfg
> +
> +cp -f binaries/xen $TFTP/xen
> +cp -f binaries/bzImage $TFTP/vmlinuz
> +cp -f binaries/dom0-rootfs.cpio.gz $TFTP/initrd-dom0
> +
> +# start logging the serial; this gives interactive console, don't close its
> +# stdin to not close it; the 'cat' is important, plain redirection would hang
> +# until somebody opens the pipe; opening and closing the pipe is used to 
> close
> +# the console
> +mkfifo /tmp/console-stdin
> +cat /tmp/console-stdin |\
> +ssh control@thor.testnet console | tee smoke.serial &

For clarity I would add a variable just below TFTP above to store the
controller node. This will make it easier to change in the future and
also makes it easier to see that it is runner-specific detail, e.g.:

TFTP=/scratch/gitlab-runner/tftp
CONTROLLER=control@thor.testnet

then here:

ssh $CONTROLLER console | tee smoke.serial &


> +# start the system pointing at gitlab-ci predefined config
> +ssh control@thor.testnet gitlabci poweron
> +trap "ssh control@thor.testnet poweroff; : > /tmp/console-stdin" EXIT
> +
> +if [ -n "$wait_and_wakeup" ]; then
> +    # wait for suspend or a timeout
> +    timeout=120
> +    until grep "$wait_and_wakeup" smoke.serial || [ $timeout -le 0 ]; do
> +        sleep 1;
> +        : $((--timeout))
> +    done
> +    if [ $timeout -le 0 ]; then
> +        echo "ERROR: suspend timeout, aborting"
> +        exit 1
> +    fi
> +    # keep it suspended a bit, then wakeup
> +    sleep 30
> +    ssh control@thor.testnet wake
> +fi

I like this. It will shorten the duration of the test job. I am not
asking you to do anything, but I think it would be a good idea to have
all test scripts loop around like this if possible to poll for success
every second rather than wait a pre-determined amount of time, which is
usually several minutes longer than necessary.


> +# give the test time to run a bit, and then check
> +sleep 30
> +
> +(grep -q "^Welcome to Alpine Linux" smoke.serial && grep -q "${passed}" 
> smoke.serial) || exit 1
> +exit 0
> -- 
> git-series 0.9.1
> 

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.