
Re: [BUG] x2apic broken with current AMD hardware



On Wed, Mar 08, 2023 at 04:50:51PM +0100, Jan Beulich wrote:
> On 08.03.2023 16:23, Elliott Mitchell wrote:
> > Mostly SSIA.  As originally identified by "Neowutran", appears Xen's
> > x2apic wrapper implementation fails with current generation AMD hardware
> > (Ryzen 7xxx/Zen 4).  This can be worked around by passing "x2apic=false"
> > on Xen's command-line, though I'm wondering about the performance impact.
> > 
> > There hasn't been much activity on xen-devel WRT x2apic, so a patch which
> > fixed this may have flown under the radar.  Most testing has also been
> > somewhat removed from HEAD.
> > 
> > Thanks to "Neowutran" for falling on this grenade and making it easier
> > for the followers.  Pointer to first report:
> > https://forum.qubes-os.org/t/ryzen-7000-serie/14538
> 
> I'm sorry, but when you point at this long a report, would you please be a
> little more specific as to where the problem in question is actually
> mentioned? Searching the page for "x2apic" didn't give any hits at all
> until I first scrolled to the bottom of the (at present) 95 comments. And
> then while there are five mentions, there's nothing I could spot that
> would actually help understanding what is actually wrong. A statement like
> "... is because the implementation of x2apic is incorrect" isn't helpful
> on its own. And a later statement by another person puts under question
> whether "x2apic=false" actually helps in all cases.
> 
> Please can we get a proper bug report here with suitable technical detail?
> Or alternatively a patch to discuss?

Mostly I was pointing to the thread to credit Neowutran and company with
originally finding the workaround.  I'm concerned about how
representative my reproduction is since the computer in question is
presently using Debian's build of Xen, 4.14.

As such I'm less than certain the problem is still in HEAD, though
Neowutran and Co working with 4.16 and the commit log being quiet
suggest there is a good chance it is.

In more detail: almost everything is broken for Domain 0 without
"x2apic=false".  When booting a 6.1.12 kernel, the USB keyboard was
completely unresponsive, and the initial ramdisk script output on screen
indicated problems interacting with storage devices.  Those two symptoms
together suggested an interrupt issue, and adding "x2apic=false" let
Domain 0 boot successfully.
A 5.10 kernel similarly requires "x2apic=false" to boot successfully.
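
For anyone wanting to try the same workaround, here is a minimal sketch of
how "x2apic=false" might be made persistent.  It assumes a GRUB-based
Debian-style setup where Xen's options come from GRUB_CMDLINE_XEN_DEFAULT
in /etc/default/grub; the file location and any options already present
are assumptions, so adjust to the local configuration.

```shell
# Fragment for /etc/default/grub (assumed location; some setups keep Xen
# options in a file under /etc/default/grub.d/ instead).
# Passes "x2apic=false" to the Xen hypervisor, not to the Domain 0 kernel.
# If other Xen options are already set here, merge them into this string
# rather than replacing them.
GRUB_CMDLINE_XEN_DEFAULT="x2apic=false"
# Afterwards regenerate the GRUB configuration (e.g. update-grub) and reboot.
```

The hypervisor log (xl dmesg) should then show whether x2APIC mode was
left disabled.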

So it could be that a commit after 4.16 fixed x2apic for current AMD
hardware, but it may still be broken.

I sent the message out of concern that, while Neowutran got attention for
the TSC overflow issue, I haven't seen any mention of the x2apic issue.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
