[KLUG Members] Re: Yo! -- Self-summarizing my points ...

Bryan J. Smith members@kalamazoolinux.org
29 Nov 2002 18:22:02 -0500


[ Moving to HARDWARE ]

On Fri, 2002-11-29 at 17:31, Robert G. Brown wrote:
> In his last message, Bryan essentially states that when these chips were
> bleeding edge, Intel had superiority in that round.

Well, to summarize my points:

  1.  Intel controlled the entire "GTL" aka Socket-7 (aka Pentium-P5)
      platform

  2.  Intel Pentium processors had hundreds of "performance errata"
      that made them slower than a 486 in many areas.  "Pentium
      Optimizations" worked around those errata, but at the cost of
      using the chip very inefficiently.  The most popular, yet most
      misunderstood, was the "ALU integer load performance errata,"
      which made 32-bit loads extremely slow -- far slower than on a
      486 -- so games ran piss-poor on the Pentium _unless_ they had
      these "workarounds."

  3.  Competitor chips were built for executing 386/486 instructions
      and "suffered" from these "Pentium Optimizations," even though
      they were faster running 386/486-optimized code than the Pentium
      was with Pentium-optimized code.  I.e., the 386/486 version of
      Quake on the AMD K6 ran _faster_ than even the Pentium-optimized
      version on the Pentium, because the K6 could balance ALU and FPU
      operations whereas the Pentium's ALU was basically "unused."

  4.  Intel moved to the "GTL+" aka Socket-8 (Pentium Pro-P6) and
      its later offspring (Pentium II, Pentium III - P6), which
      offered greatly improved L2 cache and memory performance.

  5.  The AMD K6 was still on the older platform at the time that the
      PPro/P2 came out, and the latter had far greater memory-I/O
      throughput and lower latency due to on-package/die cache.

  6.  Once AMD moved to the Athlon, it adopted the "EV6" aka Slot-A
      (aka "Alpha 21264") bus and later incarnations (the Socket-A/462
      people commonly know today), and offered far greater memory-I/O
      than even the new Intel P3.  And AMD now controlled their own
      platform, free of Intel's "focus."

  7.  The Athlon 9-issue core is far superior to the Pentium Pro's
      7-issue core.  It not only supports the K6's superior arithmetic
      logic unit (ALU) and branch prediction unit**, but also has a
      3-issue floating point unit (FPU) that can do 2 complex _and_
      1 simple FPU instruction simultaneously.  The P2-P4 can only do
      1 complex _or_ 2 simple instructions simultaneously.

[ **SIDE NOTE:  Intel had so much trouble designing an effective branch
prediction unit that they forwent putting one in the IA-64.  What the
Itanium did instead was use "branch predication," where it issues
_both_ paths and discards the one not taken.  As Intel found out with
the 1st gen IA-64 Itanium, that is _not_ very efficient, so the new,
2nd gen IA-64 Itanium2 features a minimal branch prediction unit, and a
full one will be in the 3rd gen Itanium3 due in late 2003 or early 2004.
BTW, the Digital Alpha team _predicted_ "branch predication" alone would
_fail_utterly_.  Most of those engineers now work for AMD or API
Networks. ]

  8.  Intel had not planned on the P6 core lasting so long as to outlast
      its projected 1GHz "end of life," which resulted in the quick
      creation of the Pentium 4.  The P4 simply has the same core as the
      P3, only with over twice as many stages in the pipes.  They did no
      complex redesign, so there are serious issues with branch
      mispredicts, ALU performance and limitations versus the Athlon's
      FPU.  To overcome this, Intel introduced two things:
         A.  They adopted Rambus signaling on the FSB (even for SDRAM
             Pentium 4s now) -- not a bad move
         B.  They adopted a 100% marketing strategy of continually
             introducing new SIMD instructions with each revision,
             and _purposely_ re-used opcodes of 386/486 instructions,
             ensuring incompatibility with previous products --
             _including_ their own!  These SIMD instructions have their
             own registers, execution units and other components, which
             seriously _bloats_ the size of the P4 unnecessarily

  9.  What Intel didn't realize is that AMD can easily accommodate the
      SIMD marketing fiasco by simply leveraging its unused FPU cycles.
      AMD writes new microcode for each new batch of SIMD instructions.
      Unfortunately, there are two issues:
         A.  AMD lags new Intel chip SIMD instructions by 6-9 months
         B.  AMD is trying to accommodate legacy code by trying to
             make both SIMD and legacy 386/486 instructions execute
             despite the opcode reuse (which is tricky, long story)

 10.  AMD is moving to 64-bit on x86.  Surprisingly, they still
      maintain a smaller die size than P4 (let alone IA-64!), even
      though they are usually 6-12 months behind Intel in packaging
      and die size.  As Epic's developers found out, 64-bit is _better_
      than adding yet more and more SIMD instructions, when AMD's core
      "3DNow!" does everything gamers and engineers need anyway.  That
      is why you'll see their UT 2003 for the x86-64 soon, with more
      vendors to follow.

 11.  Now when it comes to x86-64, AMD isn't just extending to 64-bits.
      They are moving to a localized northbridge, memory/interconnect
      on-chip in the "Hammer" series (Athlon 64/Opteron).  This totally
      caught Intel by surprise, who thought it had sold everyone on NGIO
      (serial interconnect).  But while NGIO continues to be vaporware,
      AMD's HyperTransport exists in _even_Intel_mainboard_chipsets_
      now!  Why?  Digital has _always_ driven the interconnect, from
      system-memory-CPU to PCI and AGP.  HyperTransport is their idea,
      and it's _extremely_scalable_ for 2-8 CPUs, and high-speed network
      between those shared memory systems.  Intel does _not_ have
      anything like it.  Hence the major cluster vendors find the
      "Hammer" is even more of a killer cluster/scientific chip than
      its already powerful desktop/server chip application.

 13.  Intel is sticking by its "32-bit is good enough for now" stance
      and its "36-bit/64GB 'EMS-like' (with associated performance hit)
      addressing over 32-bit/4GB" approach.  I've needed servers and
      even workstations with more than 4GB, and AMD's new Athlon 64/
      Opteron kills Xeon there.

 14.  Between the 36-bit/64GB "performance issue" and the SIMD
      nightmare, developers and integrators are beginning to _prefer_
      AMD x86-64 over Intel IA-32 for the next year or two.  This is
      bad for Intel, who thought their IA-64 would be "mature" for the
      desktop by the time IA-32 hit 1GHz when they decided to cease
      major x86 core research over a half decade ago.

 15.  Intel _is_ getting to IA-64 and in late 2003 we will get the
      server/workstation "Madison" and in early 2004 the desktop
      "Deerfield" which are 3rd gen IA-64 Itanium3 chips.  IA-64 was
      born out of engineering ideals and flopped on its face, as
      Digital's Alpha team predicted.  But Intel learned:  it bought
      much of Digital's Alpha technology, and is adopting Alpha
      design approaches in this series.  It may very well best AMD's
      "Hammer" which isn't scalable much past 5GHz in the coming years
      -- _depending_ on whether or not AMD gains marketshare with x86-64
      in 2003.
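
For anyone who wants the arithmetic behind the 4GB and 64GB figures in
point 13, here is a minimal Python sketch.  It only computes how many
bytes an N-bit address can reach; the name addressable_bytes is made up
for illustration, and nothing here models the performance hit of the
36-bit scheme.

```python
def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a flat address of the given width."""
    return 2 ** address_bits


GIB = 2 ** 30  # one binary gigabyte (the "GB" used in the text)

# Compare the 32-bit, 36-bit and 64-bit cases mentioned above.
for bits in (32, 36, 64):
    print(f"{bits}-bit: {addressable_bytes(bits) // GIB} GB")
```

The 36-bit case works out to 16 times the 32-bit one, which is where
the 64GB vs. 4GB numbers come from.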

With all that said, here's the deal:

AMD Pros over Intel:
 - Near-100% 386/486 opcode compatibility
 - Superior IA-32 ALU, FPU performance -- RISC-based CISC core (RISC86)
 - "Lossless Math" SIMD instructions (because they use FPU operations)
 - HyperTransport interconnect is now in almost everything, and AMD
   puts it directly on-chip in the "Hammer" series (scales far better)
 - Offers 64-bit x86-64 option for legacy software compatibility
 - Smaller core die size, copper interconnect with silicon-on-insulator
 - 128KB L1 cache -- 4x PPro-P4 -- far more important than L2!
 - x86-64 allows 64-bit application optimizations despite running on
   only a 32-bit OS (like current Windows)
 - Licensing agreements with IBM/Motorola for fab technology
   (who are ahead of Intel in many areas -- e.g., copper+SOI)

AMD Cons versus Intel:
 - Lags packaging/die improvements by 6-12 months
   (still using 200mm diameter wafers, 0.09um won't be until late 2003)
 - Has only 4 fabs, only 1 "leading edge" fab (Dresden, Germany), which
   makes "changeovers" very difficult, leading to "poor initial yields"
   (fabs cost billions to create, although AMD has licensed UMC now)
 - Lags in control over OEMs, who pass over approved components for
   the "cheapest model"
 - OEMs totally ignoring "mainboard mount" cooling (which results
   in Athlon systems not being cooled as well as P4 systems -- AMD
   offers same mechanical form-factor as Intel)
 - Cannot offer OEMs R&D money to offset costs (so AMD has difficulty
   "breaking in" to tier-1 OEMs -- sound familiar? ;-)
 - Athlon/"Hammer" cores might not scale past 5GHz
 - Quickly spending $1B reserves, may not survive past 2003Q1 if they
   cannot sell "Hammer"
 - Microsoft dragging its feet on x86-64 Windows (which "breaks" the
   64GB, 4GB-performance "barrier")

Intel Pros over AMD:
 - P4 scalable to 5-10GHz, IA-64 should as well
 - Dozens of fabs, with almost a dozen dedicated to leading-edge
   chip production (largely in Malaysia), using 300mm diameter wafers
   (although just finally moved to copper, but not SOI yet)
 - Will offer scalable I/O and ISA for 64-bit future in 2004+ (Itanium3)
 - Controls $40B SIMD marketing engine that forces incompatibilities
   in previous chips (all vendors, including Intel's own), that forces
   people to upgrade for "best performance"
 - Has money to offset OEM R&D, new product creation (tier-1 entrenched)
 - Has tens of billions in reserves, can weather any financial storm
 - Far more influence over Microsoft than API-AMD (which is why 64-bit
   Windows for Alpha never surfaced despite being written by Digital
   almost 6 years ago!)
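
To put the 200mm vs. 300mm wafer point in numbers, a rough Python
sketch:  raw wafer area scales with the square of the diameter.  This
ignores edge loss, die size and yield, so treat the ratio as a
back-of-the-envelope figure only; wafer_area_mm2 is a hypothetical
helper name.

```python
import math


def wafer_area_mm2(diameter_mm: float) -> float:
    """Area of a circular wafer, ignoring edge exclusion and flats."""
    radius = diameter_mm / 2
    return math.pi * radius ** 2


# Raw area advantage of a 300mm wafer over a 200mm wafer.
ratio = wafer_area_mm2(300) / wafer_area_mm2(200)
print(f"300mm wafer has {ratio:.2f}x the area of a 200mm wafer")
```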

Intel Cons versus AMD:
 - P3/Athlon 50% faster than P4 at ALU (i.e., P4 33% slower),
   100% faster at FPU (i.e., P4 50% slower)
 - FPU designs continue to suck, even in IA-64 (4-year-old Alpha 21264
   beats Itanium2, SPARC and Hammer are just as good or better)
 - "Lossy Math" SIMD instructions (SIMD pipelines have "poor accuracy")
 - IA-64 EPIC/Predication "ideals" proven not reality, won't be until
   2004+ that 3rd gen IA-64 "Itanium3" merges them with traditional
   RISC-runtime/Prediction chip-side optimizations
 - Intel _continues_ to design IA-64 for "previous generation"
   fabrication (i.e. Itanium2 is 0.18um, and Itanium3 will only be
   0.13um at release, while P4 is moving to 0.09um).
 - NGIO still not viable, AMD HyperTransport and IBM PCI-X
   are now entrenched in system interconnect and I/O peripherals
 - Microsoft dragging its feet on 64-bit Windows for IA-64 for desktops
   and IA-64 does _not_ run 32-bit Windows
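
The "50% faster / 33% slower" and "100% faster / 50% slower" pairs in
the first Intel con are the same ratio seen from each side.  A small
Python sketch of the conversion, assuming "X% faster" means a
throughput ratio of 1 + X/100 (slowdown_pct is an illustrative name,
not anything from a benchmark suite):

```python
def slowdown_pct(speedup_pct: float) -> float:
    """If A is speedup_pct% faster than B, return how much slower B is
    than A, as a percentage of A's speed."""
    return 100 * (1 - 1 / (1 + speedup_pct / 100))


# The two cases quoted in the list above.
print(round(slowdown_pct(50), 1))   # ALU case
print(round(slowdown_pct(100), 1))  # FPU case
```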



-- 
Bryan J. Smith, E.I.            Contact Info:  http://thebs.org
A+/i-Net+/Linux+/Network+/Server+ CCNA CIWA CNA SCSA/SCWSE/SCNA
---------------------------------------------------------------
The more government chooses for you, the less freedom you have.
