
Re: AMD EPYC virtual network performances



On Tue, Aug 13, 2024 at 08:55:42PM +0200, Jürgen Groß wrote:
> On 13.08.24 19:49, Elliott Mitchell wrote:
> > On Tue, Aug 13, 2024 at 01:16:06PM +0200, Jürgen Groß wrote:
> > > 
> > > I don't see a connection here, as spurious interrupts (as seen by the
> > > hypervisor in your case) and spurious events (as seen by Andrei) are
> > > completely different (hardware vs. software level).
> > 
> > The entries seem to appear at an average of about 1/hour.  Could be most
> > events are being dropped and 10x that number are occurring.  If so, those
> > extras could be turning into spurious events seen by various domains.
> 
> Even 10 spurious events per hour should not have a measurable impact
> on performance.

Come to think of it, depending upon the time, that domain sometimes has
low activity (it does builds, so it is either pegged or idle).  The 1/hour
was during idle times, so during busy times it might be much worse.  I
haven't been tracking `xl dmesg` as carefully recently.

Do the maintainers ever run machines with "iommu=debug"?  I'm actually
rather concerned *anything* spurious is showing up as I'm left suspecting
there may have been something lurking for some time.

> > There is a possibility spurious interrupts are being turned into spurious
> > events by the back-end drivers.
> 
> No, I don't think so.
> 
> > Jürgen Groß, what is the performance impact of "iommu=debug"?  Seems to
> > mostly cause more reporting and have minimal/no performance effect.
> 
> I guess you are referring to the Xen option? I'm no expert in this
> area.

Drat.  I haven't noticed much impact, which would match with simply
enabling a bunch of debugging printk()s (alas I'm not monitoring
performance closely enough to be sure).  Guess I'll wait for Andrei
Semenov to state a comfort level with trying "iommu=debug".


My guess is Andrei Semenov's issue is quite widespread.  I hadn't noticed
the issue until trying `find /sys/devices -name spurious_events -print`.
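
For anyone wanting to compare counters between machines, here is a rough
Python equivalent of that `find` command which also prints the values.
This is only a sketch: the function name `collect_spurious_events` is made
up for illustration, and it assumes each `spurious_events` file holds a
single decimal integer, which I have not checked against every kernel
version.

```python
import os


def collect_spurious_events(root="/sys/devices"):
    """Return {path: count} for each readable spurious_events file under root.

    Assumption: every such file contains one decimal integer.
    Files that cannot be read or parsed are skipped silently.
    """
    counts = {}
    # os.walk does not follow symlinks by default, same as plain `find`.
    for dirpath, _dirnames, filenames in os.walk(root):
        if "spurious_events" in filenames:
            path = os.path.join(dirpath, "spurious_events")
            try:
                with open(path) as f:
                    counts[path] = int(f.read().strip())
            except (OSError, ValueError):
                continue
    return counts


if __name__ == "__main__":
    # Show only the devices whose counter is non-zero, one per line.
    for path, count in sorted(collect_spurious_events().items()):
        if count:
            print(f"{count:>10}  {path}")
```

Running it twice some hours apart and comparing the numbers should give a
rate per device, which is more useful than a single snapshot.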

If you don't know where to look, Linux is apparently pretty good at
hiding this sort of issue.  The impact may not seem too severe if your
baseline was infected.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
