
Re: [PATCH 2/2] automation: add a smoke test for xen.efi on X86



On Wed, Oct 02, 2024 at 03:22:59PM -0700, Stefano Stabellini wrote:
> I forgot to reply to one important part below
> 
> 
> On Wed, 2 Oct 2024, Stefano Stabellini wrote:
> > On Wed, 2 Oct 2024, Marek Marczykowski-Górecki wrote:
> > > Check if xen.efi is bootable with an XTF dom0.
> > > 
> > > The TEST_TIMEOUT is set in the script to override project-global value.
> > > Setting it in the gitlab yaml file doesn't work, as it's too low
> > > priority
> > > (https://docs.gitlab.com/ee/ci/variables/#cicd-variable-precedence).
> > > 
> > > The multiboot2+EFI path is tested on hardware tests already.
> > > 
> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > > ---
> > > This requires rebuilding debian:bookworm container.
> > > 
> > > The TEST_TIMEOUT issue mentioned above applies to xilinx-* jobs too. It's
> > > not clear to me why the default TEST_TIMEOUT is set at the group level
> > > instead of in the yaml file, so I'm not adjusting the other places.
> > 
> > Let me start by noting that, now that we use "expect", all successful tests will
> > terminate as soon as the success condition is met, without waiting for
> > the test timeout to expire.
> > 
> > There is a CI/CD variable called TEST_TIMEOUT set at the
> > gitlab.com/xen-project level. (There is also a check in console.exp in
> > case TEST_TIMEOUT is not set so that we don't run into problems in case
> > the CI/CD variable is removed accidentally.) The global TEST_TIMEOUT is
> > meant to be a high value to account for slow QEMU tests running
> > potentially on our slowest cloud runners.
> > 
> > However, for hardware-based tests such as the xilinx-* jobs, we know
> > that the timeout is supposed to be less than that. The test is running
> > on real hardware which is considerably faster than QEMU running on our
> > slowest runners. Basically, the timeout depends on the runner more than
> > the test. So we override the TEST_TIMEOUT variable for the xilinx-* jobs
> > providing a lower timeout value.
> > 
> > The global TEST_TIMEOUT is set to 1500.
> > The xilinx-* timeout is set to 120 for ARM and 1000 for x86.
> > 
> > You are welcome to override the TEST_TIMEOUT value for the
> > hardware-based QubesOS tests. At the same time, given that on success
> > the timeout is not really used, it is also OK to leave it like this.
>  
>  
> > > ---
> > >  automation/build/debian/bookworm.dockerfile |  1 +
> > >  automation/gitlab-ci/test.yaml              |  7 ++++
> > >  automation/scripts/qemu-smoke-x86-64-efi.sh | 44 +++++++++++++++++++++
> > >  3 files changed, 52 insertions(+)
> > >  create mode 100755 automation/scripts/qemu-smoke-x86-64-efi.sh
> > > 
> > > diff --git a/automation/build/debian/bookworm.dockerfile b/automation/build/debian/bookworm.dockerfile
> > > index 3dd70cb6b2e3..061114ba522d 100644
> > > --- a/automation/build/debian/bookworm.dockerfile
> > > +++ b/automation/build/debian/bookworm.dockerfile
> > > @@ -46,6 +46,7 @@ RUN apt-get update && \
> > >          # for test phase, qemu-smoke-* jobs
> > >          qemu-system-x86 \
> > >          expect \
> > > +        ovmf \
> > >          # for test phase, qemu-alpine-* jobs
> > >          cpio \
> > >          busybox-static \
> > > diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> > > index 8675016b6a37..74fd3f3109ae 100644
> > > --- a/automation/gitlab-ci/test.yaml
> > > +++ b/automation/gitlab-ci/test.yaml
> > > @@ -463,6 +463,13 @@ qemu-smoke-x86-64-clang-pvh:
> > >    needs:
> > >      - debian-bookworm-clang-debug
> > >  
> > > +qemu-smoke-x86-64-gcc-efi:
> > > +  extends: .qemu-x86-64
> > > +  script:
> > > +    - ./automation/scripts/qemu-smoke-x86-64-efi.sh pv 2>&1 | tee ${LOGFILE}
> > > +  needs:
> > > +    - debian-bookworm-gcc-debug
> > 
> > Given that the script you wrote (thank you!) can also handle pvh, can we
> > directly add a pvh job to test.yaml too?

I guess we can, but is xen.efi + PVH dom0 actually different enough to be
worth testing, given we already test MB2+EFI + PVH dom0?

> > >  qemu-smoke-riscv64-gcc:
> > >    extends: .qemu-riscv64
> > >    script:
> > > diff --git a/automation/scripts/qemu-smoke-x86-64-efi.sh b/automation/scripts/qemu-smoke-x86-64-efi.sh
> > > new file mode 100755
> > > index 000000000000..e053cfa995ba
> > > --- /dev/null
> > > +++ b/automation/scripts/qemu-smoke-x86-64-efi.sh
> > > @@ -0,0 +1,44 @@
> > > +#!/bin/bash
> > > +
> > > +set -ex -o pipefail
> > > +
> > > +# variant should be either pv or pvh
> > > +variant=$1
> > > +
> > > +# Clone and build XTF
> > > +git clone https://xenbits.xen.org/git-http/xtf.git
> > > +cd xtf && make -j$(nproc) && cd -
> > > +
> > > +case $variant in
> > > +    pvh) k=test-hvm64-example    extra="dom0-iommu=none dom0=pvh" ;;
> > > +    *)   k=test-pv64-example     extra= ;;
> > > +esac
> > > +
> > > +mkdir -p boot-esp/EFI/BOOT
> > > +cp binaries/xen.efi boot-esp/EFI/BOOT/BOOTX64.EFI
> > > +cp xtf/tests/example/$k boot-esp/EFI/BOOT/kernel
> > > +
> > > +cat > boot-esp/EFI/BOOT/BOOTX64.cfg <<EOF
> > > +[global]
> > > +default=test
> > > +
> > > +[test]
> > > +options=loglvl=all console=com1 noreboot console_timestamps=boot $extra
> > > +kernel=kernel
> > > +EOF
> > > +
> > > +cp /usr/share/OVMF/OVMF_CODE.fd OVMF_CODE.fd
> > > +cp /usr/share/OVMF/OVMF_VARS.fd OVMF_VARS.fd
> > > +
> > > +rm -f smoke.serial
> > > +export TEST_CMD="qemu-system-x86_64 -nographic -M q35,kernel-irqchip=split \
> > > +        -drive if=pflash,format=raw,readonly=on,file=OVMF_CODE.fd \
> > > +        -drive if=pflash,format=raw,file=OVMF_VARS.fd \
> > > +        -drive file=fat:rw:boot-esp,media=disk,index=0,format=raw \
> > > +        -m 512 -monitor none -serial stdio"
> > > +
> > > +export TEST_LOG="smoke.serial"
> > > +export PASSED="Test result: SUCCESS"
> > > +export TEST_TIMEOUT=120
> 
> Although this works, I would prefer keeping the TEST_TIMEOUT overrides
> in test.yaml for consistency. 

The problem is that this doesn't work: the group-level variable overrides
the one in the yaml file. See the commit message and the link there.
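
One possible way to keep the per-job value in test.yaml anyway would be a
separately named variable that the script maps onto TEST_TIMEOUT, since a
group-level TEST_TIMEOUT cannot shadow a variable with a different name.
A minimal sketch, assuming a hypothetical TEST_TIMEOUT_OVERRIDE variable
and a hypothetical resolve_timeout helper (neither is part of this patch);
1500 is the global value quoted earlier in the thread:

```shell
#!/bin/bash
# Hypothetical helper, not part of the patch: pick the effective timeout.
# $1: job-level override from test.yaml (may be empty)
# $2: inherited TEST_TIMEOUT, e.g. the group-level value (may be empty)
resolve_timeout() {
    local override=$1 inherited=$2
    # Prefer the job-level override, then the inherited value, then 1500.
    echo "${override:-${inherited:-1500}}"
}

# TEST_TIMEOUT_OVERRIDE is an assumed variable name, set per job in test.yaml.
export TEST_TIMEOUT=$(resolve_timeout "${TEST_TIMEOUT_OVERRIDE:-}" "${TEST_TIMEOUT:-}")
```

With this, a job that sets TEST_TIMEOUT_OVERRIDE=120 in test.yaml would get
120, and every other job would keep whatever TEST_TIMEOUT it inherits.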

> However, it might be better not to
> override it (or to override to a higher timeout value), as successful
> tests will terminate immediately anyway. We need to be cautious about
> setting TEST_TIMEOUT values too low, as using a slow runner (like a
> small, busy cloud instance) can lead to false positive failures. This
> issue occurred frequently with ARM tests when we temporarily moved from
> a fast ARM server to slower ARM cloud instances a couple of months ago.
> 
> On the other hand, adjusting TEST_TIMEOUT for non-QEMU hardware-based
> tests is acceptable since those tests rely on real hardware
> availability, which is unlikely to become suddenly slower.


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



 

