
Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 28 Mar 2023 15:27:59 +0200
  • Cc: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Jason Andryuk <jandryuk@xxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 28 Mar 2023 13:28:35 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> > On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> >> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> >>> Some firmware/devices are found to not reset MSI-X properly, leaving
> >>> MASKALL set. Xen relies on initial state being both disabled.
> >>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> >>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> >>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >>
> >> But pci_reset_msix_state() comes into play only when assigning a device
> >> to a DomU. If the tool stack doing a reset doesn't properly clear the
> >> bit, how would it be cleared the next time round (i.e. after the guest
> >> stopped and then possibly was started again)? It feels like the issue
> >> wants dealing with elsewhere, possibly in the tool stack.
> > 
> > I may be misremembering some details, but AFAIR Xen intercepts
> > toolstack's (or more generally: accesses from dom0) attempt to clean
> > this up and once it enters an inconsistent state (or rather: starts with
> > such at the start of the day), there was no way to clean it up.
> 
> Iirc Roger and you already discussed that there needs to be an
> indication of device reset having happened, so that Xen can resync
> from this "behind its back" operation. That would look to be the
> point/place where such inconsistencies should be eliminated.

I think that was a different conversation with Huang Rui related to
the AMD GPU work, see:

https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/

I understood the problem Marek was trying to solve was that some
devices were initialized with the MASKALL bit set (likely by the
firmware?) and that prevented Xen from using them.  But now, seeing the
further replies on this patch, I'm unsure whether that's the case.
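
For reference, the state under discussion can be sketched as below: a
minimal illustration of detecting and clearing leftover MASKALL/ENABLE
bits in the MSI-X Message Control word. The bit values follow the PCI
MSI-X capability layout; the macro and helper names are hypothetical
and are not taken from the patch or from Xen's sources, and the actual
config space read/write is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSI-X Message Control bits, per the PCI MSI-X capability layout. */
#define MSIX_FLAGS_ENABLE   0x8000u  /* MSI-X Enable (bit 15) */
#define MSIX_FLAGS_MASKALL  0x4000u  /* Function Mask (bit 14) */

/*
 * Hypothetical helper: true if firmware (or anything before the
 * hypervisor) left either bit set in the control word.
 */
static bool msix_state_is_stale(uint16_t control)
{
    return (control & (MSIX_FLAGS_ENABLE | MSIX_FLAGS_MASKALL)) != 0;
}

/*
 * Hypothetical helper: the value to write back so that both bits are
 * clear. Both are cleared together, since clearing only MASKALL while
 * ENABLE is still set might be unsafe.
 */
static uint16_t msix_clear_initial_state(uint16_t control)
{
    return (uint16_t)(control & ~(MSIX_FLAGS_ENABLE | MSIX_FLAGS_MASKALL));
}
```

A caller would read the control word from the device's MSI-X
capability, check it with msix_state_is_stale(), and write back the
result of msix_clear_initial_state() if needed.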

Thanks, Roger.