
Re: [BUG] x2apic broken with current AMD hardware


  • To: Elliott Mitchell <ehem+xen@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 9 Mar 2023 10:03:23 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 09 Mar 2023 09:03:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 09.03.2023 00:08, Elliott Mitchell wrote:
> On Wed, Mar 08, 2023 at 04:50:51PM +0100, Jan Beulich wrote:
>> On 08.03.2023 16:23, Elliott Mitchell wrote:
>>> Mostly SSIA.  As originally identified by "Neowutran", appears Xen's
>>> x2apic wrapper implementation fails with current generation AMD hardware
>>> (Ryzen 7xxx/Zen 4).  This can be worked around by passing "x2apic=false"
>>> on Xen's command-line, though I'm wondering about the performance impact.
>>>
>>> There hasn't been much activity on xen-devel WRT x2apic, so a patch which
>>> fixed this may have flown under the radar.  Most testing has also been
>>> somewhat removed from HEAD.
>>>
>>> Thanks to "Neowutran" for falling on this grenade and making it easier
>>> for the followers.  Pointer to first report:
>>> https://forum.qubes-os.org/t/ryzen-7000-serie/14538
>>
>> I'm sorry, but when you point at this long a report, would you please be a
>> little more specific as to where the problem in question is actually
>> mentioned? Searching the page for "x2apic" didn't give any hits at all
>> until I first scrolled to the bottom of the (at present) 95 comments. And
>> then while there are five mentions, there's nothing I could spot that
>> would actually help understanding what is actually wrong. A statement like
>> "... is because the implementation of x2apic is incorrect" isn't helpful
>> on its own. And a later statement by another person puts under question
>> whether "x2apic=false" actually helps in all cases.
>>
>> Please can we get a proper bug report here with suitable technical detail?
>> Or alternatively a patch to discuss?
> 
> Mostly I was pointing to the thread to credit Neowutran and company with
> originally finding the workaround.  I'm concerned about how
> representative my reproduction is since the computer in question is
> presently using Debian's build of Xen, 4.14.
> 
> As such I'm less than certain the problem is still in HEAD, though
> Neowutran and Co working with 4.16 and the commit log being quiet
> suggests there is a good chance.
> 
> More detail: pretty much everything is broken for Domain 0 without
> "x2apic=false".  When trying to boot a 6.1.12 kernel, a USB keyboard was
> completely unresponsive, and on screen the initial ramdisk script output
> indicated problems interacting with storage devices.  Those two together
> suggested an interrupt issue, and adding "x2apic=false" caused domain 0 to
> boot successfully.
> A 5.10 kernel similarly requires "x2apic=false" to boot successfully.
> 
> So a commit after 4.16 could have fixed x2apic for current AMD hardware,
> but it may still be broken.

If Dom0 boot is affected, trying a newer hypervisor shouldn't be a problem.
You won't need any of the toolstack to match just to see whether Dom0 boots.

In any event you will want to collect a serial log at maximum verbosity.
It would also be of interest to know whether turning off the IOMMU avoids
the issue as well (on the assumption that your system has fewer than 255
CPUs).

Jan
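
For anyone following up on the two requests above, the test boots could be
set up roughly as follows. This is only a sketch, not something taken from the
thread: it assumes a Debian-style /etc/default/grub (the reporter mentions
Debian's Xen build), a serial port on com1 at 115200 baud, and that
"loglvl=all guest_loglvl=all" is verbose enough for this purpose. The
XEN_DEBUG variable is a hypothetical helper name; adjust the serial settings
to the actual hardware.

```shell
# /etc/default/grub fragment (Debian-style); run update-grub afterwards.
# ASSUMPTION: serial console on com1 at 115200 8n1; change to match the board.
# XEN_DEBUG is a hypothetical helper variable, not a GRUB or Xen name.

# Log all hypervisor and guest messages, and mirror the console to serial.
XEN_DEBUG="loglvl=all guest_loglvl=all console=com1,vga com1=115200,8n1"

# Run A: x2APIC left at its default (expected to show the failure).
# GRUB_CMDLINE_XEN_DEFAULT="$XEN_DEBUG"

# Run B: same, but with the IOMMU turned off, as asked in the reply above.
GRUB_CMDLINE_XEN_DEFAULT="$XEN_DEBUG iommu=no"

# Run C: the known workaround, for a log to compare against.
# GRUB_CMDLINE_XEN_DEFAULT="$XEN_DEBUG x2apic=false"
```

Enable one GRUB_CMDLINE_XEN_DEFAULT line per boot and capture the serial
output of each run separately, so the logs can be compared side by side.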



 

