[KLUG Members] Re: REDHAT: My suggestion on organizing binary CDs for x86 chip-specific optimizations

Bryan J. Smith members@kalamazoolinux.org
17 Apr 2002 11:07:45 -0400


On Wed, 2002-04-17 at 09:20, Adam Williams wrote:
> Redhat has to make money *someday*.

RedHat's "Linux" division has _always_lost_money_.  It is only their
"tools" division, formerly Cygnus (the first OSS success story, founded
in 1989), that keeps their "bottom line" from "dropping out."

> I can't see how maintaining more CDs would accomplish that.
> And the problem with doing something like this as 
> a LUG project is that slinging around ISOs takes bandwidth,
> bandwidth, bandwidth.....

You'd only need the "Default" CD #1 for a "general" CD.  The other two
"optional" CD #1s would be for servers, workstations, optimization,
etc...

As I previously mentioned, we're all going to run into this with
supporting x86-64 (unless you just want to continue running 32-bit x86
Linux on it? ;-).  Plus both RedHat and SuSE keep adding so many more
pre-compiled kernel flavors that it is quickly "spiraling out of
control."  How many people use all those different kernels?  Probably
only 1%.  Most only need a uniprocessor kernel for their specific
x86 series (i386, i586, i686 or Athlon).
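To make the "one CD #1 per x86 series" idea concrete, here is a minimal sketch of how an installer might pick one of those four uniprocessor kernel flavors for the CPU it finds.  The function kernel_flavor() and its vendor/family table are hypothetical assumptions for illustration, not RedHat's actual installer logic; on a real box the inputs would come from CPUID (the "vendor_id" and "cpu family" fields in /proc/cpuinfo).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: map a CPUID vendor string and family number to
 * the uniprocessor kernel flavor a distribution might install.
 * The table is an illustrative assumption, not any vendor's real logic. */
const char *kernel_flavor(const char *vendor, unsigned family)
{
    assert(vendor != NULL);

    /* K7 (Athlon/Duron) reports family 6; later AMD parts report more. */
    if (strcmp(vendor, "AuthenticAMD") == 0 && family >= 6)
        return "athlon";

    /* Pentium Pro/II/III report family 6, Pentium 4 reports 15. */
    if (strcmp(vendor, "GenuineIntel") == 0 && family >= 6)
        return "i686";

    /* Pentium, K5/K6, and (conservatively) other vendors' newer parts. */
    if (family >= 5)
        return "i586";

    /* 386/486 and anything unrecognized. */
    return "i386";
}
```

For example, kernel_flavor("AuthenticAMD", 6) gives "athlon" and kernel_flavor("GenuineIntel", 15), a Pentium 4, gives "i686".  Vendors other than Intel and AMD deliberately fall back to i586 here, since not every family-6 part from other vendors implements the full i686 instruction set (CMOV, for instance).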

So it's not really a "new idea" to RedHat to optimize and customize
packages.  It's just a thought that it could be much more "flexible" to
do so by putting them on their own CD for the specific architecture.

> That's ok,  I'll provide the SGI CD if anyone asks really nicely.  
> Hopefully RedHat will support XFS someday.  They sell an "enterprise" 
> version that doesn't support ACLs..... go figure.

As of 2.5.3, the Linux interfaces for ACLs and EAs are "standardized". 
Both the Ext3 and XFS developers are hard at work implementing this
"unified" approach.  The moment it is "solid" on 2.5.x, the XFS
team will backport it to 2.4.x.  When that happens, don't be
surprised if RedHat supports XFS on 2.4.x.  Maybe it will land just
before the first 8.0 beta (assuming the current beta stays "as-is" and
becomes 7.3).

> Thank you.  This is a very important point about benchmarks.
> The *VAST* majority of them aren't worth a warm pile of dung.

I might be just a "dumb engineer," but I know a bit about computer
architecture.  Following the x86 in Intel's shadow has always been a
"2-point disadvantage" for AMD and others.  They are:

  - No control of chipset/board platform
  - No control of binary platform

The first was solved with Athlon as AMD "broke free" of the Intel
mainboard design.  The result is that AMD doesn't have to play "catch
up" because *IT* releases its own, new board/chipset designs at the same
time as its processors.  Furthermore, I'd argue the Athlon chipset/board
designs have a "compatibility lifespan" of 18-24 months versus Intel's
typical 9-15 months.

The second will be solved on a larger scale with x86-64, an ISA
extension that AMD controls, making it a binary platform AMD defines.
Most binaries are Intel-optimized, which basically means AMD
"de-optimized," as various _hacks_ (aka "Pentium optimizations") exist to
"overcome" performance issues with the Pentium (and later) designs.
Although AMD can accommodate them in later releases, that still
doesn't excuse the fact that the software is compiled to best utilize
Intel chips while leaving AMD chips underutilized.

My favorite example is the old "the AMD [K6] FPU sucks and isn't
pipelined" all because of an integer/non-floating point hack used in
Quake.  Another, non-performance example was the recent 4M paging issue
with Athlons.  Athlons are already very efficient at 4K paging compared
to Intel, so 4M paging gives you maybe a 1% boost at most (versus 10%+ on
Intel).  But OSes are written to use it by default, so AMD suffers.
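Why large pages help one chip more than another comes down to how much memory the TLB can map and how expensive a miss is.  A minimal sketch of the mapping arithmetic, assuming a hypothetical 64-entry TLB (an illustrative number, not the real Athlon or Pentium figure) and ignoring page-walk cost, which is exactly where an efficient 4K design like the Athlon's narrows the gap:

```c
#include <assert.h>
#include <stdint.h>

#define KIB (UINT64_C(1) << 10)
#define MIB (UINT64_C(1) << 20)

/* Number of pages (and so TLB entries) needed to map `bytes` of memory
 * with pages of `page_size` bytes, rounding up to whole pages. */
uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* TLB "reach": how much memory a TLB with `entries` slots can map
 * at once when every entry holds a page of `page_size` bytes. */
uint64_t tlb_reach(uint64_t entries, uint64_t page_size)
{
    return entries * page_size;
}
```

With these helpers, mapping 64 MiB takes pages_needed(64 * MIB, 4 * KIB) = 16384 4K pages but only 16 4M pages, and the assumed 64-entry TLB reaches 256 KiB with 4K pages versus 256 MiB with 4M pages.  How much that matters in practice depends on the miss penalty, which this arithmetic does not model.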

> "Most" programs spend "most" of their time inside glibc, so this
> pays off quite well.  And most people (IMHO) don't use CPU
> intensive applications.

First off, that depends.  While you may only use Linux servers, I use
Linux _everything_.

Secondly, CPU optimizations _can_ affect I/O (even Athlon's 3DNow! and
SPARC's AV can affect networking performance!)!  Pipelines inside of a
CPU are often dedicated to shuffling data around, scheduling various
loads, etc...  While the Athlon goes to great lengths to offer "deep
buffers" to handle these, it must do so just to stay "equal" on
software that is optimized for Intel.  If it were optimized for Athlon,
you'd see 

And when I say "optimized," I don't just mean 3DNow! or whatever.  I'm
talking about how the instructions are ordered in the end-user binary.
Instruction order is the #1 performance consideration outside of cache
(and has an impact on the cache too ;-).  Run-time out-of-order (OO)
execution can only do so much, as Intel has shown with IA-64.  Although
Intel has gone to the extreme on compiler dependency there, even the
Alpha 364 designers said that run-time OO re-ordering is _best_ combined
with good compile-time ordering in the first place.
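A small illustration of compile-time ordering, written out by hand in C.  The dot product is the classic case: with a single accumulator every add waits on the previous one, no matter how many FP units the chip has.  Splitting the sum into four independent chains gives the hardware work it can overlap.  This is a sketch of the general technique (the names dot_serial and dot_4acc are made up for illustration), not what any particular compiler emits; GCC, for one, will not reassociate floating-point adds like this on its own unless told it may (e.g. -ffast-math), because the rounding can change.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each add depends on the previous one, so the
 * additions form a single serial dependency chain. */
double dot_serial(const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Four independent accumulators: the four chains do not depend on each
 * other, so a CPU with several FP pipelines can overlap them. */
double dot_4acc(const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)          /* leftover elements when n % 4 != 0 */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}
```

With integer-valued inputs both functions return exactly the same result; with general inputs they can differ in the last bits, which is the trade-off that keeps compilers from doing this automatically.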

> However!  Tuning can affect cache utilization (page alignment,
> blah blah blah) which can make a noticeable difference.

That's pretty simplified.  Cache is only _one_small_part_ of the
equation in architecture.

> Optimizing other libraries as well can make a nice difference.  gdbm if 
> you use OpenLDAP is a nice one (lots of hash table stuff, etc...).  Squid 
> also benefits nicely from optimization.  But I'm really suspicious of 
> claims of over ~20% improvement.  One usually runs into other barriers
> before you can cross the 20% mark.

???

A perfect example is the Athlon FPU v. the Pentium-series FPU.  You've
got 2 full units and 1 ADD/MULT on the former, and only 1 full unit and
1 ADD on even the latest incarnation of the Pentium 4.  So most software
assumes it can only do 1 MULT simultaneously, while it can do 3 on the
Athlon!  While the Athlon's run-time OO/scheduler can handle some of
this, it's _far_from_perfect_ because it has to be done in "real-time"
(or near-real time/in the pre-fetch/decode).  It is _far_better_ to do
it at compile-time, which is why ~40% is an _average_total_performance_
change (and 100%+ if you are just benchmarking the actual subroutine
that does this).
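The gap between a large subroutine-level gain and a smaller whole-program gain (and the "other barriers" mentioned above) is Amdahl's law.  A minimal sketch; the fractions and speedups in the note below are made-up illustrative numbers, not measurements:

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction `p` of the original
 * run time is made `s` times faster and the rest is unchanged. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

If 60% of the run time sits in code that gets 2x faster (a 100% subroutine gain), amdahl_speedup(0.6, 2.0) is 1/0.7, about 1.43, so roughly a 43% total gain.  As s grows without bound the result approaches 1/(1 - p), the ceiling the untouched part of the program imposes.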

That's just the "optimization" part.

Then we have to consider the "de-optimization" part.  The fact is that a
_crapload_of_programs_ use the FPU to load integer values to _overcome_
the crappy ALU load of the Pentium series.  If it were AMD-optimized, not
only would it load faster via a direct ALU load, but you also wouldn't be
tying up the stupid FPU to do it!  There are a crapload of these
"Pentium optimizations" out there that are compiled into even i386
binaries _by_default_!
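One way to see that an "i386" binary can still carry Pentium-era tuning: GCC's -march= option sets the instruction-set baseline and defines matching predefined macros, while -mcpu= (renamed -mtune= in later GCC releases) only changes instruction scheduling and sets none of the macros checked below.  This sketch reports the baseline a file was built for.  build_march() and built_for() are hypothetical helpers, the macro list is GCC-specific and not exhaustive (other -march values define their own macros and fall through to a more generic level, possibly "i386"), and no particular distribution's optflags are assumed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: name the x86 instruction-set baseline this file
 * was compiled for, based on GCC's predefined target macros. Those
 * macros follow -march=; tuning options (-mcpu=/-mtune=) do not set
 * any of the ones tested here. The list is illustrative, not complete. */
const char *build_march(void)
{
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__athlon__)
    return "athlon";
#elif defined(__pentium4__)
    return "pentium4";
#elif defined(__i686__)
    return "i686";
#elif defined(__i586__)
    return "i586";
#elif defined(__i486__)
    return "i486";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}

/* Convenience check, e.g. built_for("i386"). */
int built_for(const char *march)
{
    assert(march != NULL);
    return strcmp(build_march(), march) == 0;
}
```

Built with something like gcc -m32 -march=i386 -mtune=i686, build_march() should still report "i386": the code is scheduled for a P6-class core but restricted to 386 instructions.  A 64-bit build reports "x86-64".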

> These optimizations are pretty easily achieved by tweaking the spec
> files of source rpms.

Right!  I mean, what do you think I do now???  I look at a SRPM spec
file and determine whether any changes are needed, then I rebuild.

So how easy would it be for RedHat to do this on a CD re-ordering
scale???  Guys, I'm not suggesting anything "too difficult" here.  I'm
just trying to say, "hey RedHat, you've got pre-compiled kernel package
bloat and the forthcoming x86-64 platform to worry about -- why not
solve it this way?"

> My primary workstation is still a lowly PII (dual) and will be
> for the foreseeable future.

Which is i686-class, so it directly benefits from a few i686
optimizations that i586 doesn't have (various MTRRs, for one).

> My servers are highly customized.  So count me out on buying one.
> I'd want the "generic" CD so I can install it anywhere.

And my point is that RedHat should move the pre-compiled kernels that
99% of people don't use, and whose number is quickly bloating out of
control, to a separate CD.  At the same time, these new CDs can add
optimizations for newer Intel and AMD architectures, including
addressing x86-64.

-- Bryan

-- 
If consumers are liable to "correctly" license someone's IP, why aren't
IP holders held liable when they unjustly force the same consumers to
license the same IP more than once?  "Piracy" is a double-edged sword.
-----------------------------------------------------------------------
Bryan J. Smith, SmithConcepts, Inc            mailto:b.j.smith@ieee.org
Engineers and IT Professionals             http://www.SmithConcepts.com