[KLUG Members] Re: Compiler optimization for CPU arch?

Bryan J. Smith members@kalamazoolinux.org
Tue, 17 Dec 2002 14:04:56 -0500 (EST)


Quoting Bruce Smith <bruce@armintl.com>:
> For the average user (whatever that is), how much difference does
> compiling for higher level CPU's really make?
> I know Mandrake compiles their entire distribution for i586.  Is that
> _noticeably_ faster than Redhat's i386 binaries?

Depends on the CPU.  Most clone x86 processors are designed for optimal i386/486
instruction execution.

Intel, on the other hand, uses "Pentium optimizations" to get around a number of
design flaws in its Pentium and later architectures.  Unfortunately, the result
is that "Pentium optimizations" can be "de-optimizations" on another platform.

Fortunately, the reality is that Intel hasn't changed the Pentium architecture
since 1992 (other than its never-ending SIMD instructions), so most newer clone
x86 processors now accommodate even "Pentium optimizations."**

At the same time, the Athlon has two more full execution units than the Pentium
Pro/II/III/4 (one ALU, one FPU -- and the 3 x FPU can handle more complex
instructions than the Pentium Pro+'s 2 x FPUs).  Although AMD does a lot of
register renaming and run-time re-scheduling, GCC 3.1 and newer can optimize
scheduling for the Athlon's execution units.  If the program has an intense
amount of FPU operations, especially using instructions other than simple 64-bit
additions, the Athlon can get up to a 40% speed boost.

[ **NOTE:  The only major erratum I've seen lately involves the 4M paging option
in the Pentium (the i386 normally uses 4K pages), which AMD didn't thoroughly
test in the Athlon with AGP (which is PCI + direct memory execution, a cache
coherency nightmare).  This only affects the program that manages paging, i.e.,
the kernel.  So when 4M paging is enabled, to give an extra 5+% boost on
Pentiums, Athlons can be unstable if the AGP card is also accessing memory.  The
workaround is, of course, to not use 4M paging on Athlons, which doesn't give a
performance boost there anyway (and Athlons handle 4K page lookups faster than
Pentiums in general).

The SIMD instructions are a different story.  Intel is purposely reusing old
i386/486 opcodes so they will _crash_ all pre-Pentium 3 processors -- including
their own (and not just AMD's).  A la the Microsoft "forced upgrades" approach. ]

> The reason I ask, is I've been evaluating Linux browsers for my users.
> I compared Mozilla, Galeon (both stock RH8 RPM's), Netscape 7.01 and
> Opera 6.11.  Of the two people who tested these browsers, both seemed
> to think that Netscape was the fastest.  And Netscape had "i686" in it's
> filename, implying it was compiled for i686.  Is that why it seems to
> run faster?  Would it be worth recompiling Mozilla/Galeon
> w/-march=i686?

RedHat hasn't released the full details, but they have confirmed they will be
releasing a RedHat Linux version specifically for Athlon x86[-32] and x86-64
processors.  There seems to be a sizeable boost from just Athlon optimizations,
even on the 32-bit models.  On top of that, major kernel/system-level components
are also going to be compiled for Athlon 64 / Opteron.

Even nVidia has just released Athlon 64 / Opteron Linux drivers for its cards.

> And if one was to recompile certain packages (like Mozilla), is it
> worth making an i686 version AND an athlon version for AMD CPU PC's?
> Or would i686 be good enough for both?  (or how much more gain would
> -march=athlon gain me over -march=i686 on an Athlon)?

Er, probably not.  The "-march" option targets the processor family directly.  I
bet some opcodes are not supported on the Athlon -- even outside of the SIMD
instructions (which purposely reuse old i386/486 opcodes for incompatibility).

And it won't be the same as Athlon optimizations.  Pentium Pro and Athlon are
quite different, and the Athlon offers more execution units, which are best
exploited when the compiler schedules for them (as with any CPU).
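
To make the "-march" point concrete, here is a minimal Python sketch that only
assembles the GCC command lines for the builds being discussed.  The helper name
gcc_command and the file names are made up for illustration.  The "-mtune" flag
is the spelling later GCC releases use for "schedule for this CPU without using
its extra instructions"; check the manual for the GCC version you actually have,
since older releases may spell it differently.

```python
def gcc_command(source, output, march, mtune=None, opt="-O2"):
    """Build a gcc argument list (hypothetical helper, for illustration only).

    march: value for -march, which selects the instruction set the
           compiler is allowed to use (and, by default, the tuning too).
    mtune: optional value for -mtune, which changes scheduling only and
           leaves the allowed instruction set at whatever -march says.
    """
    cmd = ["gcc", opt, f"-march={march}"]
    if mtune is not None:
        cmd.append(f"-mtune={mtune}")
    cmd += ["-o", output, source]
    return cmd


if __name__ == "__main__":
    # Two separate binaries, one per processor family.
    print(" ".join(gcc_command("browser.c", "browser-i686", "i686")))
    print(" ".join(gcc_command("browser.c", "browser-athlon", "athlon")))
    # One binary limited to i686 instructions but scheduled for the Athlon.
    print(" ".join(gcc_command("browser.c", "browser-mixed", "i686",
                               mtune="athlon")))
```

The third command is the compromise: one package to ship, at the cost of
whatever the Athlon-only instructions would have bought you.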

FURTHERMORE:  If you really want "speed," you want Intel's commercial optimizing
compilers.  Their latest 7.0 releases now offer "drop-in" replacement for GCC
(which version though???).  Intel clearly designs some nice optimizing compilers.

This is different from the kernel, where a kernel compiled for a "generic i686"
platform will run perfectly on an Athlon.

> And what's the limit of what will run where?  Will -march=athlon run
> on a i686?

Definitely not!  The Pentium Pro+ pre-fetch won't be able to handle certain
codings for the Athlon.
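
For the "what will run where" question on a specific box, one rough approach is
to look at the feature flags the kernel reports in /proc/cpuinfo and pick the
most specific build whose required flags are all present.  The sketch below is
only that, a sketch: the REQUIRED_FLAGS table is my assumption about what each
build needs (it is not exhaustive), the function names are made up, and the
sample "flags" lines are shortened, illustrative strings rather than real
output.  It looks at instruction-set flags only and says nothing about
scheduling.

```python
# Flags each build is ASSUMED to need (illustrative, not exhaustive).
REQUIRED_FLAGS = {
    "athlon": {"cmov", "mmx", "3dnow", "3dnowext"},
    "i686": {"cmov"},
}
PREFERENCE = ["athlon", "i686"]  # most specific build first
FALLBACK = "i386"                # used when nothing above matches


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def best_build(flags):
    """Pick the first build in PREFERENCE whose required flags are all set."""
    for arch in PREFERENCE:
        if REQUIRED_FLAGS[arch] <= flags:
            return arch
    return FALLBACK


# Shortened, made-up sample lines (hypothetical, for illustration only).
SAMPLE_ATHLON = "flags\t\t: fpu tsc cx8 cmov mmx mmxext 3dnowext 3dnow"
SAMPLE_PENTIUM3 = "flags\t\t: fpu tsc cx8 cmov mmx fxsr sse"
SAMPLE_K6 = "flags\t\t: fpu tsc cx8 mmx 3dnow"

if __name__ == "__main__":
    for name, text in [("athlon", SAMPLE_ATHLON),
                       ("pentium3", SAMPLE_PENTIUM3),
                       ("k6", SAMPLE_K6)]:
        print(name, "->", best_build(parse_flags(text)))
```

On a real machine you would feed it open("/proc/cpuinfo").read() instead of the
sample strings.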

> How about the other way around?  With either run on a i586
> CPU?

Again, we're not talking the kernel here, but actual CPU object code via the
"arch" option.

> Or a K6?  (I don't care about 386's & 486's)

The K6 is not fully Pentium ISA compatible.  Any licensing of Intel ISAs to AMD
came after the K6 design.  AMD only had full access to the Intel i486 ISA at the
time the K6 was designed.

The K6 is based on the NexGen Nx686, which is a more mature version of the
Nx586, which was designed around NexGen's patented "RISC86" approach of
executing the CISC Intel i386 ISA with microcoded RISC.  The Nx586 was a design
effort by a small group of engineers right after the i386 came out in 1986, but
took 8 years to make it to production silicon (1994).

> Where's the new VIA CPU fit into this (ala some Walmart PC's)?

National Semi licensed the full Intel Pentium Pro ISA, which has now been sold
to VIA (although National Semi still has a license).  So it should be just as
compatible.

> Anyone got any good links handy for reading about this stuff?  TIA!

Yeah ...  ;-p  
   http://www.google.com  

Seriously now, check out the GCC project page:
   http://gcc.gnu.org

The documentation and mailing lists will cover their optimization work in the
3.x+ branches.

-- 
Bryan J. Smith, E.I. (BSECE)       Contact Info:  http://thebs.org
[ http://thebs.org/files/resume/BryanJonSmith_certifications.pdf ]
------------------------------------------------------------------
*** Whether it is voting on butterfly ballots or driving under ***
*** overpasses, Floridians just can't seem to do things right. ***