dom0less vs xenstored setup race Was: xen | Failed pipeline for staging | 6a47ba2f
- To: Stefano Stabellini <sstabellini@xxxxxxxxxx>, alejandro.vallejo@xxxxxxxxx
- From: andrew.cooper3@xxxxxxxxxx
- Date: Wed, 3 May 2023 15:38:51 +0100
- Cc: committers@xxxxxxxxxxxxxx, michal.orzel@xxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, Julien Grall <jgrall@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Edwin Török <edwin.torok@xxxxxxxxx>
- Delivery-date: Wed, 03 May 2023 14:39:23 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
Hello,
After what seems like an unreasonable amount of debugging, we've tracked
down exactly what is going wrong here.
https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/4219721944
Of note is the smoke.serial log around:
io: IN 0xffff90fec250 d0 20230503 14:20:42 INTRODUCE (1 233473 1 )
obj: CREATE connection 0xffff90fff1f0
*** d1 CONN RESET req_cons 00000000, req_prod 0000003a rsp_cons 00000000, rsp_prod 00000000
io: OUT 0xffff9105cef0 d0 20230503 14:20:42 WATCH_EVENT (@introduceDomain domlist )
XS_INTRODUCE (in C xenstored at least; I have not checked the OCaml one
yet) always clobbers the ring pointers. The added pressure on dom0 that
xenconsoled adds with its 4M hypercall bounce buffer occasionally
defers xenstored long enough that the XS_INTRODUCE clobbers the first
message that dom1 wrote into the ring.
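
To illustrate the lost-message case, here is a toy model of the request
half of the ring. It is a sketch only, not xenstored's code: the ToyRing
class and its method names are made up for illustration, and the only
details taken from the real interface are the 1024-byte ring size and the
req_cons/req_prod indices. dom1 queues 0x3a bytes (matching the req_prod
value in the log above), the indices are then reset as XS_INTRODUCE does,
and the server finds nothing to read.

```python
RING_SIZE = 1024  # ring size assumed to match XENSTORE_RING_SIZE


class ToyRing:
    """Hypothetical model of the request direction of a xenstore ring."""

    def __init__(self):
        self.buf = bytearray(RING_SIZE)
        self.req_cons = 0  # advanced by the server as it reads
        self.req_prod = 0  # advanced by the guest as it writes

    def guest_write(self, data):
        # The guest copies bytes in at req_prod and then publishes the index.
        for b in data:
            self.buf[self.req_prod % RING_SIZE] = b
            self.req_prod += 1

    def server_read_all(self):
        # The server drains everything between req_cons and req_prod.
        out = bytearray()
        while self.req_cons != self.req_prod:
            out.append(self.buf[self.req_cons % RING_SIZE])
            self.req_cons += 1
        return bytes(out)

    def introduce_reset(self):
        # Models XS_INTRODUCE zeroing the indices on a connection
        # that the guest is already using.
        self.req_cons = 0
        self.req_prod = 0


ring = ToyRing()
ring.guest_write(b"x" * 0x3A)       # dom1's first request, 0x3a bytes
prod_before_reset = ring.req_prod   # what the CONN RESET line reports
ring.introduce_reset()              # delayed XS_INTRODUCE arrives
seen_by_server = ring.server_read_all()
print(hex(prod_before_reset), seen_by_server)
```

The bytes are still sitting in the buffer, but with both indices back at
zero the server has no way of knowing they were ever written.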
The other behaviour seen was xenstored observing a header looking like this:
*** d1 HDR { ty 0x746e6f63, rqid 0x2f6c6f72, txid 0x74616c70, len 0x6d726f66 }
which was rejected as being too long. That's "control/platform" in
ASCII, so the XS_INTRODUCE intersected dom1 between writing the header
and writing the payload.
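
The decoding can be checked with a few lines of Python. This is a sketch
which assumes the four header fields are 32-bit little-endian words, as
they are for struct xsd_sockmsg on these guests. The 4096-byte limit used
below is my understanding of XENSTORE_PAYLOAD_MAX and should be treated as
an assumption.

```python
import struct

# Header fields exactly as printed in the log: ty, rqid, txid, len.
words = [0x746E6F63, 0x2F6C6F72, 0x74616C70, 0x6D726F66]

# Repack the four 32-bit little-endian words into the 16 bytes that
# were actually sitting in the ring, then read them as ASCII.
raw = struct.pack("<4I", *words)
decoded = raw.decode("ascii")
print(decoded)  # control/platform

# The bogus "len" field is far larger than any valid payload.
PAYLOAD_MAX = 4096  # assumed value of XENSTORE_PAYLOAD_MAX
bogus_len = words[3]
print(bogus_len > PAYLOAD_MAX)
```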
Anyway, it is buggy for XS_INTRODUCE to be called on a live and
unsuspecting connection. It is ultimately init-dom0less's fault for
telling dom1 it's good to go before having waited for XS_INTRODUCE to
complete.
I am going to start by correcting the documentation to make these
details clear, and then figure out what is the best set of steps to
unbreak this.
~Andrew