Re: AMD EPYC virtual network performances



On Thu, Aug 15, 2024 at 10:33:52AM +0200, Jürgen Groß wrote:
> On 14.08.24 22:26, Elliott Mitchell wrote:
> > On Wed, Aug 14, 2024 at 08:15:38AM +0200, Jürgen Groß wrote:
> > > On 14.08.24 00:36, Elliott Mitchell wrote:
> > > > 
> > > > Drat.  I haven't noticed much, which would match with simply enabling a
> > > > bunch of debugging printk()s (alas I'm not monitoring performance 
> > > > closely
> > > > enough to be sure).  Guess I wait for Andrei Semenov to state a comfort
> > > > level with trying "iommu=debug".
> > > 
> > > You didn't answer my question.
> > 
> > I guess I did not explicitly do so.  I was referring to the Xen
> > command-line option.
> 
> And again you didn't supply the information I asked for (command line
> options of Xen and dom0).

I had thought that was aimed at the initial reporter, Andrei Semenov.
I've already supplied all of the options which could plausibly affect this.

Xen: placeholder watchdog=true loglvl=info iommu=debug iommu=no-intremap 
cpuidle dom0_mem=... no-real-mode edd=off

Linux: placeholder root=... ro concurrency=none vga=normal xen_pciback.hide=...
net.ifnames=1 nomodeset blacklist=...

The other options really don't seem likely to affect either issue, since
they don't affect interrupts (granted, some drivers don't get loaded, but
without those drivers the problem would be irreproducible).
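
In case it helps, here is a small Python sketch of the sort of thing I
run in dom0 to capture both command lines verbatim for a report.  It
assumes the xl toolstack is installed; the "xen_commandline" field
comes from `xl info`:

    #!/usr/bin/env python3
    # Minimal sketch: gather the Xen and dom0 command lines for a
    # bug report.  Assumes it runs in dom0 with the xl toolstack
    # installed; field names are from my setup.
    import subprocess

    def xen_cmdline():
        # `xl info` prints "xen_commandline : <options>" among
        # other fields; pick out just that one.
        out = subprocess.run(["xl", "info"], capture_output=True,
                             text=True, check=True).stdout
        for line in out.splitlines():
            key, _, value = line.partition(":")
            if key.strip() == "xen_commandline":
                return value.strip()
        return ""

    def dom0_cmdline():
        with open("/proc/cmdline") as f:
            return f.read().strip()

    if __name__ == "__main__":
        print("Xen:  ", xen_cmdline())
        print("Linux:", dom0_cmdline())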

> Did you consider that asking for help while not supplying data which has
> been asked for is going to result in no help at all? You are wasting the
> time of volunteers, which will reduce the motivation to look into your
> issue a lot.

Sorry about being a bit inconsistent, but there is little I can do about
that in the short term.  I would *strongly* prefer to keep this
information protected (PGP).  Large commercial installations have a
rather different risk/privacy threshold.

It is also notable that I am not the sole reporter of any of these
issues.


My speculation was wrong: the "CPU*: No irq handler for vector..."
messages are not strictly tied to the HVM domain.  I think I'm simply
reproducing what Andrei Semenov observed: HVM domains are more severely
affected by this issue.
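
For anyone wanting to check whether their machine shows the same
pattern, here is a quick sketch to tally those messages per CPU.  It
assumes the messages land in the hypervisor log (i.e. `xl dmesg`), and
the regex is deliberately loose since the exact wording may vary
between Xen versions:

    #!/usr/bin/env python3
    # Minimal sketch: tally "No irq handler for vector" messages
    # per CPU from the Xen console log (`xl dmesg`), to see whether
    # they track one domain's activity.
    import re
    import subprocess
    from collections import Counter

    PATTERN = re.compile(r"CPU\s*(\d+).*No irq handler for vector",
                         re.IGNORECASE)

    def tally():
        out = subprocess.run(["xl", "dmesg"], capture_output=True,
                             text=True, check=True).stdout
        counts = Counter()
        for line in out.splitlines():
            m = PATTERN.search(line)
            if m:
                counts[int(m.group(1))] += 1
        return counts

    if __name__ == "__main__":
        for cpu, n in sorted(tally().items()):
            print(f"CPU{cpu}: {n}")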

I think I need to repeat an earlier observation of mine: it is NOT just
vifs being affected by this; event channels associated with virtual
block devices are also seeing spurious interrupts, though at a lower rate.
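
To put a number on "a lower rate", something like the following can
sample /proc/interrupts in the relevant domain and report per-device
event-channel interrupt rates.  The "xen-dyn" tag and the vif*/blkif*
device names match what I see on my systems but may differ elsewhere:

    #!/usr/bin/env python3
    # Minimal sketch: sample /proc/interrupts twice and report
    # per-second rates for Xen event-channel lines, to compare
    # vif against vbd traffic.
    import time

    def snapshot():
        counts = {}
        with open("/proc/interrupts") as f:
            for line in f:
                if "xen-dyn" not in line:
                    continue
                fields = line.split()
                name = fields[-1]   # e.g. vif1.0-q0-tx, blkif...
                # Sum the per-CPU columns (the purely numeric fields).
                total = sum(int(x) for x in fields[1:] if x.isdigit())
                counts[name] = total
        return counts

    if __name__ == "__main__":
        interval = 10.0
        before = snapshot()
        time.sleep(interval)
        after = snapshot()
        for name in sorted(after):
            rate = (after[name] - before.get(name, 0)) / interval
            print(f"{name:30s} {rate:8.1f} irq/s")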

What comes to mind is that vifs generate interrupts at a much greater
rate than vbds.  I suspect a timing issue involving closely spaced
interrupts; perhaps there is a simple timing issue in the vif/vbd
protocol.  The problem is: how would these turn into spurious
interrupts observed by Xen?  (I had suspected QEMU for the HVM domain,
but that was disproved.)
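
To make the timing idea concrete, here is a toy, hand-picked
interleaving of the shared-ring notification hand-off, loosely modeled
on ring.h's RING_PUSH_REQUESTS_AND_CHECK_NOTIFY and
RING_FINAL_CHECK_FOR_REQUESTS (all names below are mine).  It shows
how a benign spurious event-channel wakeup can arise; whether that has
anything to do with the vectors Xen complains about is exactly the
open question:

    #!/usr/bin/env python3
    # Toy, sequential replay of the benign notification race in the
    # Xen shared-ring protocol.  The interleaving is hand-picked to
    # show one event arriving after the ring is already drained.

    class Ring:
        def __init__(self):
            self.req_prod = 0    # requests produced by the frontend
            self.req_cons = 0    # requests consumed by the backend
            self.req_event = 1   # backend wants an event at this prod

    ring = Ring()
    pending_events = 0

    def frontend_push():
        """Produce one request; notify iff req_event is in (old, new]."""
        global pending_events
        old, new = ring.req_prod, ring.req_prod + 1
        ring.req_prod = new
        if old < ring.req_event <= new:
            pending_events += 1

    def backend_consume_all():
        while ring.req_cons < ring.req_prod:
            ring.req_cons += 1

    # 1. Frontend pushes request 1; req_event == 1, so it notifies.
    frontend_push()                      # pending_events -> 1

    # 2. Backend takes the event, drains the ring, then (final check)
    #    re-arms by setting req_event just past what it consumed...
    pending_events -= 1
    backend_consume_all()
    ring.req_event = ring.req_cons + 1

    # 3. ...but before the backend's re-check, the frontend pushes
    #    request 2.  The push sees req_event == 2 in range, so it
    #    sends another event.
    frontend_push()                      # pending_events -> 1

    # 4. The backend's re-check (still awake) sees request 2 and
    #    consumes it without needing that event.
    backend_consume_all()

    # 5. The queued event is delivered later: the backend wakes,
    #    finds the ring empty, and the wakeup looks spurious.
    pending_events -= 1
    assert ring.req_cons == ring.req_prod and pending_events == 0
    print("spurious wakeup: event delivered with no work pending")

In the real protocol this extra wakeup is expected and harmless in the
guest; the puzzle remains why anything would surface at the Xen vector
level.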


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
