
Re: [Xen-users] HTPC + DUAL PC In one

  • To: xen-users@xxxxxxxxxxxxx
  • From: Gordan Bobic <gordan@xxxxxxxxxx>
  • Date: Thu, 17 Jul 2014 07:52:44 +0100
  • Delivery-date: Thu, 17 Jul 2014 06:53:30 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 07/17/2014 02:25 AM, Austin S Hemmelgarn wrote:
On 07/16/2014 01:01 PM, Gordan Bobic wrote:
On 2014-07-16 16:01, Austin S Hemmelgarn wrote:
I hadn't thought about this before now, but part of my results may be
because my desktop is running Gentoo with very aggressive optimizations
for the specific processor, whereas the Intel server is running Fedora
20, which just uses -O2 -mtune=generic for optimizations.

Different optimization levels make relatively minor differences. It's
when you switch to a compiler that does vectorization properly (e.g. ICC)
that you see significant performance increases.

I would like to point out that ICC used to do some really dirty tricks to
prevent code built with it from running at peak efficiency on non-Intel
processors.

Yes, it did - 10 years ago. It checked the CPUID vendor string and, if it said "GenuineIntel", ran the vectorized code; otherwise it ran the non-vectorized code. Intel got some bad publicity and stopped the compiler from doing it.
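
To make the difference concrete, here is a minimal sketch of the two dispatch styles. It is only an illustration of the logic described above, not ICC's actual dispatcher: the function names and the single has_sse2 flag are made up for the example. "GenuineIntel" and "AuthenticAMD" are the vendor strings that Intel and AMD CPUs report via CPUID.

```python
def select_code_path_by_vendor(vendor: str, has_sse2: bool) -> str:
    """Vendor-gated dispatch: only a CPU reporting 'GenuineIntel' gets the fast path.

    Hypothetical illustration, not the real ICC runtime dispatcher.
    """
    if vendor == "GenuineIntel" and has_sse2:
        return "vectorized"
    return "scalar"


def select_code_path_by_feature(vendor: str, has_sse2: bool) -> str:
    """Feature-gated dispatch: any CPU that reports SSE2 gets the fast path."""
    # The vendor string is deliberately ignored here.
    return "vectorized" if has_sse2 else "scalar"
```

With the vendor-gated version an AMD CPU that supports SSE2 still ends up on the scalar path, which is the behaviour that drew the bad publicity. The feature-gated version treats both vendors the same.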

ICC still produces the fastest code of any off-the-shelf compiler for x86, including when the code runs on AMD CPUs.

Also, the only reason that I still use GCC is because not
everything builds correctly with Clang.

Unfortunately, there is a large amount of software that doesn't build properly with compilers other than GCC. There was a project a few years ago to modify the Linux kernel so it builds properly with ICC, but it stopped being maintained.

Another factor might be that most of my workloads, and therefore most of the
benchmarking that I do, are memory-bound, and even though both systems
use DDR3-1600 memory, the server is a NUMA system and has the memory
split between the two processors.

That can make a difference, depending on how good the scheduler is
at migrating processes to the memory rather than accessing the
memory remotely.

Linux is generally pretty good at this, but doesn't bind
processes/threads to a given core unless the app or the administrator
explicitly tells it to, which means that the memory migration still
hurts latency/throughput.

There is certainly scope for the scheduler to be aware of NUMA and act accordingly, only scheduling processes to run on a CPU remote to the memory when all the CPU cores near to the memory are busy.
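
For the "administrator explicitly tells it to" case, a rough sketch of pinning a process to the cores of one NUMA node on Linux could look like the following. The helper names parse_cpulist and pin_to_node are invented for this example. It assumes a Linux kernel that exposes /sys/devices/system/node/nodeN/cpulist, and it only restricts CPU placement. It does not set a memory policy.

```python
import os


def parse_cpulist(text: str) -> set:
    """Parse a kernel CPU list such as '0-3,8-11' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list or trailing comma
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_node(node: int = 0) -> set:
    """Restrict the calling process to the CPUs of one NUMA node (Linux only).

    Assumes the sysfs node directory exists; raises OSError otherwise.
    """
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

Calling pin_to_node(0) early in a program keeps its threads on node 0's cores, so memory it allocates afterwards will normally come from that node under the default local allocation policy. To bind the memory as well, running the program under numactl --cpunodebind=0 --membind=0 is the usual route.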

Just comparing processors of similar price from AMD and Intel, you will
almost always get a better processor from AMD.  It may not always have
the most up-to-date set of ISA extensions, but that hardly matters when
running Windows because Windows won't try to take advantage of anything
that came out after that version of Windows (which is why XP's
performance sucks compared to Win7 on newer systems).

I never noticed this at all. Bloat and feature creep vastly outweigh
the relatively marginal benefits from minor ISA extensions. Consider that
x86-64 features SSE (there is no x86-64 CPU that doesn't have SSE),
which makes a big difference _if you use it_ (which most compilers do
a very poor job of), but jumps to SSE2 and further make relatively
little difference. So if you are running XP x64 there is going to be
very little performance difference in compiler output compared to, say,
Windows 7 x64.

AVX actually does provide a measurable improvement over SSE*, and a lot
of the bit-field manipulation extensions (LZCNT, POPCNT, BMI, TBM, etc)
can provide a very significant boost in performance over processors that
don't have them.

Yes, but the use of those is going to be very application dependent. Typical desktop use is mostly pointer chasing rather than tight vectorizable loops.
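
For reference, this is roughly what two of the bit-manipulation instructions mentioned above compute, written out as plain loops and arithmetic. It is a sketch of the semantics only: the function names are made up, inputs are assumed to be non-negative integers, and the software versions say nothing about the speed of the hardware instructions.

```python
def popcount(x: int) -> int:
    """Number of set bits in x, which POPCNT returns in a single instruction.

    Assumes x >= 0.
    """
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit
        count += 1
    return count


def lzcnt32(x: int) -> int:
    """Leading zero bits of a 32-bit value, as LZCNT defines it (32 for x == 0)."""
    return 32 - (x & 0xFFFFFFFF).bit_length()
```

Code that does this sort of thing in a hot loop (bitboards, bitmap indexes, hashing) gains from the dedicated instructions. Code that mostly follows pointers does not.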
