
Re: xen | Failed pipeline for staging | 6a47ba2f



On Sat, Apr 29, 2023 at 12:41:26PM +0100, andrew.cooper3@xxxxxxxxxx wrote:
> On 29/04/2023 4:05 am, Stefano Stabellini wrote:
> > On Fri, 28 Apr 2023, GitLab wrote:
> >> Pipeline #852233694 triggered by Ganis
> >> had 3 failed jobs
> >> Failed jobs:
> >> ✖ test: qemu-smoke-dom0less-arm64-gcc
> > This is a real failure on staging. Unfortunately it is intermittent. It
> > usually happens once every 3-8 tests for me.
> >
> > The test script is:
> > automation/scripts/qemu-smoke-dom0less-arm64.sh
> >
> > and for this test it is invoked without arguments. It starts 2
> > dom0less VMs in parallel, then dom0 does an xl network-attach and the
> > domU is supposed to set up eth0 and ping.
> >
> > The failure is that nothing happens after "xl network-attach". The domU
> > never hotplugs any interfaces. I have logs that show that eth0 never
> > shows up and the only interface is lo no matter how long we wait.
> >
> >
> > On a hunch, I removed Alejandro's patches. Without them, I ran 20 tests
> > without any failures. I have not investigated further but it looks like
> > one of these 4 commits is the problem:
> >
> > 2023-04-28 11:41 Alejandro Vallejo    tools: Make init-xenstore-domain use xc_domain_getinfolist()
> > 2023-04-28 11:41 Alejandro Vallejo    tools: Refactor console/io.c to avoid using xc_domain_getinfo()
> > 2023-04-28 11:41 Alejandro Vallejo    tools: Create xc_domain_getinfo_single()
> > 2023-04-28 11:41 Alejandro Vallejo    tools: Make some callers of xc_domain_getinfo() use xc_domain_getinfol
> 
> In commit order (reverse of above), these patches are:
> 
> 1) Modify the python bindings and xenbaked
> 2) Introduce a new library function with a better API/ABI
> 3) Modify xenconsoled
> 4) Modify init-xenstore-domain
> 
> The test isn't using anything from 4 or 1, and 2 definitely isn't
> breaking anything on its own.
> 
> That just leaves 3.  This test does activate xenconsoled by virtue
> of invoking xencommons, but that doesn't help explain why a change in
> xenconsoled interferes (and only intermittently on this one single test)
> with `xl network-attach`.
> 
> The xenconsoled change does have a correctness fix in it, requiring
> xenconsoled to ask for all domains' info in one go.  This does mean it's
> hypercall-buffering (i.e. bouncing) a 4M array now, where previously it
> was racily figuring out which VMs had come and gone.

Can it be that xl network-attach fails and that failure is silently
ignored by the test?
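
To illustrate what I mean, here is a minimal sketch of a wrapper that
turns a non-zero exit status into a hard failure instead of letting it
pass unnoticed. It is in Python purely for illustration (the real test
is a shell script), and the helper name run_checked and the example
arguments in the comment are hypothetical, not taken from the test.

```python
import subprocess


def run_checked(cmd):
    """Run cmd and return its stdout; raise if it exits non-zero.

    Hypothetical helper: the point is only that the exit status is
    inspected rather than discarded.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} failed with exit status {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


# Assumed usage (the arguments are placeholders, not the test's real ones):
#   run_checked(["xl", "network-attach", "<domain>", "<vif-spec>"])
```

If the attach itself is failing, a check like this would make the job
fail at that step with the tool's error output in the log, rather than
later when eth0 never appears.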

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
