Re: xen | Failed pipeline for staging | 6a47ba2f
- To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
- From: andrew.cooper3@xxxxxxxxxx
- Date: Sat, 29 Apr 2023 20:48:25 +0100
- Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, alejandro.vallejo@xxxxxxxxx, committers@xxxxxxxxxxxxxx, michal.orzel@xxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx
- Delivery-date: Sat, 29 Apr 2023 19:49:03 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
On 29/04/2023 2:34 pm, Marek Marczykowski-Górecki wrote:
> On Sat, Apr 29, 2023 at 12:41:26PM +0100, andrew.cooper3@xxxxxxxxxx wrote:
>> On 29/04/2023 4:05 am, Stefano Stabellini wrote:
>>> On Fri, 28 Apr 2023, GitLab wrote:
>>>> Pipeline #852233694 triggered by
>>>> Ganis
>>>> had 3 failed jobs
>>>> Failed jobs
>>>> ✖
>>>> test
>>>> qemu-smoke-dom0less-arm64-gcc
>>> This is a real failure on staging. Unfortunately it is intermittent. It
>>> usually happens once every 3-8 tests for me.
>>>
>>> The test script is:
>>> automation/scripts/qemu-smoke-dom0less-arm64.sh
>>>
>>> and for this test it is invoked without arguments. It is starting 2
>>> dom0less VMs in parallel, then dom0 does an xl network-attach and the
>>> domU is supposed to set up eth0 and ping.
>>>
>>> The failure is that nothing happens after "xl network-attach". The domU
>>> never hotplugs any interfaces. I have logs that show that eth0 never
>>> shows up and the only interface is lo no matter how long we wait.
>>>
>>>
>>> On a hunch, I removed Alejandro's patches. Without them, I ran 20 tests
>>> without any failures. I have not investigated further but it looks like
>>> one of these 4 commits is the problem:
>>>
>>> 2023-04-28 11:41 Alejandro Vallejo tools: Make init-xenstore-domain use
>>> xc_domain_getinfolist()
>>> 2023-04-28 11:41 Alejandro Vallejo tools: Refactor console/io.c to avoid
>>> using xc_domain_getinfo()
>>> 2023-04-28 11:41 Alejandro Vallejo tools: Create
>>> xc_domain_getinfo_single()
>>> 2023-04-28 11:41 Alejandro Vallejo tools: Make some callers of
>>> xc_domain_getinfo() use xc_domain_getinfol
>> In commit order (reverse of above), these patches are:
>>
>> 1) Modify the python bindings and xenbaked
>> 2) Introduce a new library function with a better API/ABI
>> 3) Modify xenconsoled
>> 4) Modify init-xenstore-domain
>>
>> The test isn't using anything from 4 or 1, and 2 definitely isn't
>> breaking anything on its own.
>>
>> That just leaves 3. This test does activate xenconsoled by virtue
>> of invoking xencommons, but that doesn't help explain why a change in
>> xenconsoled interferes (and only intermittently on this one single test)
>> with `xl network-attach`.
>>
>> The xenconsoled change does have a correctness fix in it, requiring
>> xenconsoled to ask for all domains' info in one go. This does mean it's
>> hypercall-buffering (i.e. bouncing) a 4M array now where previously it
>> was racily figuring out which VMs had come and gone.
> Can it be that xl network-attach fails and that failure is silently
> ignored by the test?
Well, it's ultimately doing a ping test between the two VMs, so the
network-attach is rather important. I don't see an obvious way for us
to get false negatives like this.
~Andrew
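
One way to rule out the silent-failure possibility raised above would be
for the test to check the exit status of the attach step explicitly, and
to bound the wait for eth0. What follows is only a minimal sketch in POSIX
shell: the helper names run_checked and wait_for_iface, and the NET_SYSFS
override, are hypothetical and are not taken from
automation/scripts/qemu-smoke-dom0less-arm64.sh.

```shell
# Hypothetical helper: run a command and report loudly if it exits non-zero,
# so a failing step cannot be silently ignored by the caller.
run_checked() {
    if ! "$@"; then
        echo "FAILED: $*" >&2
        return 1
    fi
}

# Hypothetical helper: poll once per second until the named interface shows
# up under sysfs, giving up after the given number of tries.
# NET_SYSFS is an assumed override that defaults to the usual Linux location.
wait_for_iface() {
    iface=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        [ -e "${NET_SYSFS:-/sys/class/net}/$iface" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

In dom0 the existing xl network-attach invocation could be wrapped in
run_checked, and the domU side could call wait_for_iface eth0 with a
timeout before attempting the ping, so the two failure modes (attach
failed versus interface never hotplugged) show up separately in the logs.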