
Re: [PATCH 2/2] automation: add a suspend test on an Alder Lake system



On Sat, 18 Mar 2023, Marek Marczykowski-Górecki wrote:
> On Fri, Mar 17, 2023 at 04:10:22PM -0700, Stefano Stabellini wrote:
> > On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> > > This is the first test using the Qubes OS CI infra. The gitlab-runner has
> > > access to an ssh-based control interface (control@thor.testnet, ssh key
> > > exposed to the test via ssh-agent) and a pre-configured HTTP dir for boot
> > > files (mapped under /scratch/gitlab-runner/tftp inside the container).
> > > Details about the setup are described at
> > > https://www.qubes-os.org/news/2022/05/05/automated-os-testing-on-physical-laptops/
> > > 
> > > This test boots Xen and checks whether S3 suspend works. It runs on an
> > > ADL-based desktop system. The test script is based on the Xilinx one.
> > > 
> > > The machine needs a newer kernel than the other x86 tests use, so use the
> > > 6.1.x kernel added in the previous commit.
> > > 
> > > When building the rootfs, use fakeroot to preserve device files when
> > > repacking the rootfs archives in a non-privileged container.
> > > 
> > > Signed-off-by: Marek Marczykowski-Górecki 
> > > <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > 
> > This is awesome!! Thanks Marek!
> > 
> > 
> > > ---
> > > I'm bad at naming things. Things for which I could use naming suggestions:
> > >  - test script (qubes-x86-64-suspend.sh) - this might be okay?
> > >  - test template job name (.adl-x86-64)
> > >  - test job name (adl-suspend-x86-64-gcc)
> > >  - tag (qubes-hw2)
> > 
> > I think these names are OK. I would maybe rename the tag "qubes-hw2" to
> > "qubes" because so far there is only one, but I am fine with qubes-hw2
> > too.
> 
> I have 10 of them (and growing), so I'd like the tag name to at least
> somehow reference which runner it uses. For example, this one is
> a desktop with Alder Lake, but for some other tests I may want to use a
> laptop with Tiger Lake. The mapping name -> hw spec isn't publicly
> written down (although our openQA publishes all kinds of logs from them,
> so it's possible to infer this info).

That's fine by me, use whatever naming scheme works for you :-)


> > > For context, our CI has several machines, named test-X or hwX, each
> > > controlled by a matching hal900X RPi (this is where gitlab-runner runs).
> > > This test uses only one specific machine, but I plan to add a few others too.
> > > ---
> > >  automation/gitlab-ci/test.yaml             |  31 ++++-
> > >  automation/scripts/qubes-x86-64-suspend.sh | 155 ++++++++++++++++++++++-
> > >  2 files changed, 186 insertions(+)
> > >  create mode 100755 automation/scripts/qubes-x86-64-suspend.sh
> > > 
> > > diff --git a/automation/gitlab-ci/test.yaml 
> > > b/automation/gitlab-ci/test.yaml
> > > index 2e1a6886df7f..f5511dd6da70 100644
> > > --- a/automation/gitlab-ci/test.yaml
> > > +++ b/automation/gitlab-ci/test.yaml
> > > @@ -15,6 +15,10 @@
> > >  .arm32-test-needs: &arm32-test-needs
> > >    - qemu-system-aarch64-6.0.0-arm32-export
> > >  
> > > +.x86-64-test-needs: &x86-64-test-needs
> > > +  - alpine-3.12-rootfs-export
> > > +  - kernel-6.1.19-export
> > 
> > As you know, the jobs starting with a "." are template jobs to avoid
> > repeating the same things over and over. .x86-64-test-needs could be
> > used in qemu-alpine-x86_64-gcc also.
> 
> Right, when switching all of them to the 6.1 kernel, that makes sense.
> 
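For reference, reusing the new anchor in qemu-alpine-x86_64-gcc would look
roughly like this (untested sketch; the rest of that job stays as it is
today):

  qemu-alpine-x86_64-gcc:
    # ... rest of the job unchanged ...
    needs:
      - *x86-64-test-needs
      - alpine-3.12-gcc
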
> > >  .qemu-arm64:
> > >    extends: .test-jobs-common
> > >    variables:
> > > @@ -84,6 +88,25 @@
> > >    tags:
> > >      - xilinx
> > >  
> > > +.adl-x86-64:
> > > +  extends: .test-jobs-common
> > > +  variables:
> > > +    # the test controller runs on RPi4
> > > +    CONTAINER: alpine:3.12-arm64v8
> > > +    LOGFILE: smoke-test.log
> > > +  artifacts:
> > > +    paths:
> > > +      - smoke.serial
> > > +      - '*.log'
> > > +    when: always
> > > +  only:
> > > +    variables:
> > > +      - $QUBES_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true"
> > 
> > Let me know which trees should have QUBES_JOBS set to true (and thus be
> > able to start Qubes jobs). At a minimum, I think we would want
> > https://gitlab.com/xen-project/xen to test "staging" and "master". I can
> > set QUBES_JOBS to true on https://gitlab.com/xen-project/xen if you are
> > OK with it.
> 
> Yes, that's perfectly okay. I'd also like at least staging-4.17, but
> depending on push frequency the other staging-* branches are probably
> fine too (I very much doubt a long queue would be an issue). Of course
> that assumes those tests would be backported, which I'm not sure is
> planned. I'm also okay with allowing committers and/or other maintainers
> to use it on demand, but preferably not all patchew branches.

Yes, that makes sense. Let's start with
https://gitlab.com/xen-project/xen for now, then we'll figure out a way
to implement more precisely what you described above, which also matches
the requirements for the xilinx runner.
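
Just to sketch the kind of rule I have in mind (untested; we can refine
it when we get there), the existing only:variables condition could be
combined with an only:refs list to limit the jobs to those branches:

  .adl-x86-64:
    only:
      refs:
        - staging
        - master
        - /^staging-.*/
      variables:
        - $QUBES_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true"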


> BTW, if you trigger a job manually via the web interface or the API, you
> can specify variables for just a single pipeline. And if you use that
> feature, you can also make GitLab present a slightly more convenient
> interface with the variables already listed. See the example here:
> https://gitlab.com/QubesOS/qubes-continuous-integration/-/blob/main/.gitlab-ci.yml#L15-26
> This could be useful to start only a subset of tests.

Good to know!
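
For reference, the bit that makes GitLab pre-fill the "Run pipeline" form
is the value/description form of a pipeline variable; a minimal sketch of
what that could look like for us (untested):

  variables:
    QUBES_JOBS:
      value: "false"
      description: "Set to true to run the Qubes hardware test jobs"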


> > > +  before_script:
> > > +    - apk --no-cache add openssh-client fakeroot
> > 
> > This would work, but we typically try to avoid installing additional
> > packages in the test jobs, for speed and efficiency.
> 
> Yes, I was being lazy and wanted to avoid rebuilding the container (and
> changing where it's pulled from) while developing the test. Sure, I will
> fold that into the dockerfile.

Thanks!


> > Instead, add those packages
> > directly to the build container dockerfile
> > (automation/build/alpine/3.12-arm64v8.dockerfile), e.g.:
> > 
> > diff --git a/automation/build/alpine/3.12-arm64v8.dockerfile 
> > b/automation/build/alpine/3.12-arm64v8.dockerfile
> > index 180c978964..3f1e6a3fc6 100644
> > --- a/automation/build/alpine/3.12-arm64v8.dockerfile
> > +++ b/automation/build/alpine/3.12-arm64v8.dockerfile
> > @@ -41,3 +41,6 @@ RUN apk --no-cache add \
> >    libattr \
> >    libcap-ng-dev \
> >    pixman-dev \
> > +  # qubes test deps
> > +  openssh-client \
> > +  fakeroot \
> > 
> > But I am not sure why you need fakeroot; more questions about it below.
> > 
> > 
> > > +  tags:
> > > +    - qubes-hw2
> > > +
> > >  # Test jobs
> > >  build-each-commit-gcc:
> > >    extends: .test-jobs-common
> > > @@ -110,6 +133,14 @@ xilinx-smoke-dom0less-arm64-gcc:
> > >      - *arm64-test-needs
> > >      - alpine-3.12-gcc-arm64
> > >  
> > > +adl-suspend-x86-64-gcc:
> > > +  extends: .adl-x86-64
> > > +  script:
> > > +    - ./automation/scripts/qubes-x86-64-suspend.sh s3 2>&1 | tee 
> > > ${LOGFILE}
> > > +  needs:
> > > +    - *x86-64-test-needs
> > > +    - alpine-3.12-gcc
> > 
> > This is up to you, but qubes-x86-64-suspend.sh is very cool and
> > sophisticated, and I would use it to create at least 2 jobs:
> > - a plain dom0 + domU boot (no suspend)
> > - a suspend/resume job
> 
> Indeed, both tests are implemented there already; it's just a matter of
> creating another job with a different parameter.
> 
> But then, the script name isn't really accurate. qubes-x86-64.sh then?

Yes, good idea to rename the script.
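
Then the second job could be something along these lines (job name is
just a suggestion, untested):

  adl-smoke-x86-64-gcc:
    extends: .adl-x86-64
    script:
      - ./automation/scripts/qubes-x86-64.sh 2>&1 | tee ${LOGFILE}
    needs:
      - *x86-64-test-needs
      - alpine-3.12-gcc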


> FWIW, the next tests I'm planning to contribute:
>  - PCI passthrough to HVM with qemu in dom0
>  - linux stubdomain
>  - PCI passthrough with linux stubdomain
> 
> And possibly some of the above exercising other systems, maybe something
> on AMD (currently only one runner is on AMD, a Ryzen 5 4500U specifically,
> and it is rather busy because of that, but we can try).

That's very cool! I am looking forward to it.


> I'm not sure yet if putting them all in a single script is a smart idea.
> On the other hand, quite a bit of boilerplate would be duplicated
> otherwise.

There is no hard rule on this. I'd say try to put them in a single
script because it tends to help with maintainability. But if one of the
tests is sufficiently different and would make things complicated, it is
also OK to move it to a new script.
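
If they do end up as separate scripts, one option is to move the shared
boilerplate into a small helper that each script sources. A rough sketch
(file and function names are just placeholders):

  # automation/scripts/qubes-lib.sh (hypothetical)
  start_serial_log() {
      # log the serial console; keep the fifo open so the interactive
      # console isn't closed underneath us
      mkfifo /tmp/console-stdin
      cat /tmp/console-stdin | ssh "$CONTROLLER" console | tee smoke.serial &
  }

and then ". automation/scripts/qubes-lib.sh" at the top of each test.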


> > I am happy either way
> > 
> > 
> > >  qemu-smoke-dom0-arm64-gcc:
> > >    extends: .qemu-arm64
> > >    script:
> > > diff --git a/automation/scripts/qubes-x86-64-suspend.sh 
> > > b/automation/scripts/qubes-x86-64-suspend.sh
> > > new file mode 100755
> > > index 000000000000..fc479c9faaec
> > > --- /dev/null
> > > +++ b/automation/scripts/qubes-x86-64-suspend.sh
> > > @@ -0,0 +1,155 @@
> > > +#!/bin/sh
> > > +
> > > +set -ex
> > > +
> > > +test_variant=$1
> > > +
> > > +wait_and_wakeup=
> > > +if [ -z "${test_variant}" ]; then
> > > +    passed="ping test passed"
> > > +    domU_check="
> > > +until ifconfig eth0 192.168.0.2 &> /dev/null && ping -c 10 192.168.0.1; 
> > > do
> > > +    sleep 30
> > > +done
> > > +echo \"${passed}\"
> > > +"
> > > +elif [ "${test_variant}" = "s3" ]; then
> > > +    passed="suspend test passed"
> > > +    wait_and_wakeup="started, suspending"
> > > +    domU_check="
> > > +ifconfig eth0 192.168.0.2
> > > +echo domU started
> > > +"
> > > +    dom0_check="
> > > +until grep 'domU started' /var/log/xen/console/guest-domU.log; do
> > > +    sleep 1
> > > +done
> > > +echo \"${wait_and_wakeup}\"
> > > +set -x
> > > +echo deep > /sys/power/mem_sleep
> > > +echo mem > /sys/power/state
> > > +# now wait for resume
> > > +sleep 5
> > > +# get domU console content into test log
> > > +tail -n 100 /var/log/xen/console/guest-domU.log
> > > +xl list
> > > +xl dmesg | grep 'Finishing wakeup from ACPI S3 state' || exit 1
> > > +# check if domU is still alive
> > > +ping -c 10 192.168.0.2 || exit 1
> > > +echo \"${passed}\"
> > > +"
> > > +fi
> > 
> > Very nice!
> > 
> > 
> > > +# DomU
> > > +mkdir -p rootfs
> > > +cd rootfs
> > > +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> > 
> > I am a bit confused about this: are you sure you need fakeroot here?
> > Isn't this script already running inside a container as root? Are you
> > using Docker on the RPi4 to run this job?
> 
> This is running in a rootless podman container. But even with docker,
> for device files to work (see the commit message) it would need to run
> in a privileged container, which I'd like to avoid.

Are you sure? I can run a non-privileged container with a device assigned
just fine with Docker and
 
  devices = ["/dev/ttyUSB0:/dev/ttyUSB0"]

in the gitlab-runner config.toml.

Maybe it is just a matter of access permissions on your devices?


In any case, if your podman environment requires fakeroot, that is a
good enough reason for me. Just please add a note about it in the
commit message or as an in-code comment in the script.
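
E.g. something as simple as this above the first fakeroot invocation
would do (wording entirely up to you):

  # fakeroot is used because this runs in a rootless podman container;
  # without it, device nodes (dev/null, dev/console, ...) would be lost
  # when unpacking and repacking the rootfs cpio archives
  fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz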


> > > +mkdir proc
> > > +mkdir run
> > > +mkdir srv
> > > +mkdir sys
> > > +rm var/run
> > > +echo "#!/bin/sh
> > > +
> > > +${domU_check}
> > > +/bin/sh" > etc/local.d/xen.start
> > > +chmod +x etc/local.d/xen.start
> > > +echo "rc_verbose=yes" >> etc/rc.conf
> > > +find . | fakeroot -i ../fakeroot-save cpio -H newc -o | gzip > 
> > > ../binaries/domU-rootfs.cpio.gz
> > 
> > same here
> 
> Same answer :)
> 
> Specifically, it's required to preserve dev/null, dev/console, etc.
> 
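Ah, I see, thanks. For anyone else reading: fakeroot -s saves the faked
ownership/device metadata to a file on exit and -i loads it back, which
is why the device nodes survive across the separate unpack and repack
steps. A tiny illustration (not part of the patch):

  mkdir -p demo/dev && cd demo
  fakeroot -s ../state sh -c 'mknod dev/console c 5 1'  # mknod is faked
  fakeroot -i ../state sh -c 'ls -l dev/console'        # still a char device
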
> > > +cd ..
> > > +rm -rf rootfs
> > > +
> > > +# DOM0 rootfs
> > > +mkdir -p rootfs
> > > +cd rootfs
> > > +fakeroot -s ../fakeroot-save tar xzf ../binaries/initrd.tar.gz
> > > +mkdir boot
> > > +mkdir proc
> > > +mkdir run
> > > +mkdir srv
> > > +mkdir sys
> > > +rm var/run
> > > +cp -ar ../binaries/dist/install/* .
> > > +
> > > +echo "#!/bin/bash
> > > +
> > > +export LD_LIBRARY_PATH=/usr/local/lib
> > > +bash /etc/init.d/xencommons start
> > > +
> > > +brctl addbr xenbr0
> > > +brctl addif xenbr0 eth0
> > > +ifconfig eth0 up
> > > +ifconfig xenbr0 up
> > > +ifconfig xenbr0 192.168.0.1
> > > +
> > > +xl create /etc/xen/domU.cfg
> > > +${dom0_check}
> > > +" > etc/local.d/xen.start
> > > +chmod +x etc/local.d/xen.start
> > > +# just PVH for now
> > > +echo '
> > > +type = "pvh"
> > > +name = "domU"
> > > +kernel = "/boot/vmlinuz"
> > > +ramdisk = "/boot/initrd-domU"
> > > +extra = "root=/dev/ram0 console=hvc0"
> > > +memory = 512
> > > +vif = [ "bridge=xenbr0", ]
> > > +disk = [ ]
> > > +' > etc/xen/domU.cfg
> > > +
> > > +echo "rc_verbose=yes" >> etc/rc.conf
> > > +echo "XENCONSOLED_TRACE=all" >> etc/default/xencommons
> > > +echo "QEMU_XEN=/bin/false" >> etc/default/xencommons
> > > +mkdir -p var/log/xen/console
> > > +cp ../binaries/bzImage boot/vmlinuz
> > > +cp ../binaries/domU-rootfs.cpio.gz boot/initrd-domU
> > > +find . | fakeroot -i ../fakeroot-save cpio -H newc -o | gzip > 
> > > ../binaries/dom0-rootfs.cpio.gz
> > > +cd ..
> > > +
> > > +
> > > +TFTP=/scratch/gitlab-runner/tftp
> > > +
> > > +echo '
> > > +multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all 
> > > guest_loglvl=all
> > > +module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
> > > +module2 (http)/gitlab-ci/initrd-dom0
> > > +' > $TFTP/grub.cfg
> > > +
> > > +cp -f binaries/xen $TFTP/xen
> > > +cp -f binaries/bzImage $TFTP/vmlinuz
> > > +cp -f binaries/dom0-rootfs.cpio.gz $TFTP/initrd-dom0
> > > +
> > > +# start logging the serial; this gives interactive console, don't close 
> > > its
> > > +# stdin to not close it; the 'cat' is important, plain redirection would 
> > > hang
> > > +# until somebody opens the pipe; opening and closing the pipe is used to 
> > > close
> > > +# the console
> > > +mkfifo /tmp/console-stdin
> > > +cat /tmp/console-stdin |\
> > > +ssh control@thor.testnet console | tee smoke.serial &
> > 
> > For clarity, I would add a variable just below TFTP above to store the
> > controller node. This will make it easier to change in the future and
> > also makes it clear that it is a runner-specific detail, e.g.:
> > 
> > TFTP=/scratch/gitlab-runner/tftp
> > CONTROLLER=control@thor.testnet
> > 
> > then here:
> > 
> > ssh $CONTROLLER console | tee smoke.serial &
> 
> Ok.
> 
> > > +# start the system pointing at gitlab-ci predefined config
> > > +ssh control@thor.testnet gitlabci poweron
> > > +trap "ssh control@thor.testnet poweroff; : > /tmp/console-stdin" EXIT
> > > +
> > > +if [ -n "$wait_and_wakeup" ]; then
> > > +    # wait for suspend or a timeout
> > > +    timeout=120
> > > +    until grep "$wait_and_wakeup" smoke.serial || [ $timeout -le 0 ]; do
> > > +        sleep 1;
> > > +        : $((--timeout))
> > > +    done
> > > +    if [ $timeout -le 0 ]; then
> > > +        echo "ERROR: suspend timeout, aborting"
> > > +        exit 1
> > > +    fi
> > > +    # keep it suspended a bit, then wakeup
> > > +    sleep 30
> > > +    ssh control@thor.testnet wake
> > > +fi
> > 
> > I like this. It will shorten the duration of the test job. I am not
> > asking you to do anything, but I think it would be a good idea to have
> > all test scripts loop like this where possible, polling for success
> > every second rather than waiting a pre-determined amount of time, which
> > is usually several minutes longer than necessary.
> > 
> > 
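To illustrate, the final check further down could do the same instead of
the fixed "sleep 30", something along these lines (sketch only):

  # poll for the pass message instead of sleeping a fixed time
  timeout=120
  until grep -q "${passed}" smoke.serial || [ $timeout -le 0 ]; do
      sleep 1
      : $((--timeout))
  done
  grep -q "^Welcome to Alpine Linux" smoke.serial || exit 1
  grep -q "${passed}" smoke.serial || exit 1
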
> > > +# give the test time to run a bit, and then check
> > > +sleep 30
> > > +
> > > +(grep -q "^Welcome to Alpine Linux" smoke.serial && grep -q "${passed}" 
> > > smoke.serial) || exit 1
> > > +exit 0
> > > -- 
> > > git-series 0.9.1
> > > 
> 
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> 

 

