
Re: AMD EPYC virtual network performances



On 13.08.24 19:49, Elliott Mitchell wrote:
On Tue, Aug 13, 2024 at 01:16:06PM +0200, Jürgen Groß wrote:
On 13.08.24 03:10, Elliott Mitchell wrote:
On Tue, Jul 09, 2024 at 11:37:07AM +0200, Jürgen Groß wrote:

In both directories you can see the number of spurious events by looking
into the spurious_events file.

In the end the question is why so many spurious events are happening. Finding
the reason might be hard, though.
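As an illustration only (not part of the original exchange): the counters mentioned above could be sampled with a small script like the one below. It is a sketch under assumptions. The directories holding the `spurious_events` files are passed in by the caller because their paths are not given here, each file is assumed to contain a single decimal integer, and the helper names `read_spurious_events` and `spurious_deltas` are made up for this example.

```python
#!/usr/bin/env python3
"""Sample spurious_events counters from caller-supplied directories."""
import sys
import time
from pathlib import Path


def read_spurious_events(directories):
    """Return {directory: count}; count is None if the file is missing or unparsable.

    Assumption: each directory holds a file named 'spurious_events'
    containing one decimal integer.
    """
    counts = {}
    for d in directories:
        counter = Path(d) / "spurious_events"
        try:
            counts[str(d)] = int(counter.read_text().strip())
        except (OSError, ValueError):
            counts[str(d)] = None
    return counts


def spurious_deltas(before, after):
    """Return the per-directory increase between two samples, skipping unreadable ones."""
    result = {}
    for d, new in after.items():
        old = before.get(d)
        if old is not None and new is not None:
            result[d] = new - old
    return result


if __name__ == "__main__":
    dirs = sys.argv[1:]
    if not dirs:
        sys.exit("usage: spurious.py DIR [DIR ...]")
    first = read_spurious_events(dirs)
    time.sleep(10)  # sampling interval in seconds, chosen arbitrarily
    second = read_spurious_events(dirs)
    for d, delta in spurious_deltas(first, second).items():
        print(f"{d}: +{delta} spurious events in 10s")
```

Run it with the two device directories as arguments while the network load is active. A counter that keeps rising under load shows which side is seeing the spurious events.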

Hopefully my comments on this drew your attention, yet lack of response
suggests otherwise.  I'm wondering whether this is an APIC misprogramming
issue, similar to the x2APIC issue which was causing trouble with recent
AMD processors.

Trying to go after the Linux software RAID1, my current attempt is
"iommu=debug iommu=no-intremap".  I'm seeing *lots* of messages from
spurious events in `xl dmesg`.  So many I have a difficult time believing
they are related to hardware I/O.

Seeing them in `xl dmesg` means those spurious events are seen by the
hypervisor, not by the Linux kernel.

Indeed.  Yet this seems to be pointing at a problem, whereas most other
information sources merely indicate there is a problem.

I'm unable to resolve those to hardware.  This could mean those are being
synthesized by software and when crossing some interface they get
reinterpreted as hardware events.  This could mean those are hardware
events, but somewhere inside Xen information is being corrupted and the
information displayed is unrelated to the original event (x2APIC
misinterpretation?).


In which case could the performance problem observed by Andrei Semenov
be due to misprogramming of [x2]APIC triggering spurious events?

I don't see a connection here, as spurious interrupts (as seen by the
hypervisor in your case) and spurious events (as seen by Andrei) are
completely different (hardware vs. software level).

The entries seem to appear at an average of about 1/hour.  Could be most
events are being dropped and 10x that number are occurring.  If so, those
extras could be turning into spurious events seen by various domains.

Even 10 spurious events per hour should not have a measurable impact
on performance.

There is a possibility spurious interrupts are being turned into spurious
events by the back-end drivers.

No, I don't think so.

Jürgen Groß, what is the performance impact of "iommu=debug"?  Seems to
mostly cause more reporting and have minimal/no performance effect.

I guess you are referring to the Xen option? I'm no expert in this
area.


Juergen




 

