[KLUG Members] Re: Yo! -- Self-summarizing my points ...
Bryan J. Smith
members@kalamazoolinux.org
29 Nov 2002 18:22:02 -0500
[ Moving to HARDWARE ]
On Fri, 2002-11-29 at 17:31, Robert G. Brown wrote:
> In his last message, Bryan essentially states that when these chips were
> bleeding edge, Intel had superiority in that round.
Well, to summarize my points:
1. Intel controlled the entire "GTL" aka Socket-7 (aka Pentium-P5)
platform
2. Intel Pentium processors had hundreds of "performance errata"
that caused them to run slower than a 486 in many areas. The
"Pentium Optimizations" used the chips very inefficiently,
but allowed code to work around those "performance errata." The
most popular, but most misunderstood, one was the "ALU integer
load performance errata" of the Pentium that made 32-bit loads
extremely slow -- far slower than a 486, resulting in games
that ran piss-poor on the Pentium _unless_ they had these
"workarounds."
3. Competitor chips were built for executing 386/486 instructions
and "suffered" from these "Pentium Optimizations," even though
they were faster running 386/486 optimized code than the Pentium
was with Pentium Optimized code. I.e., the 386/486 version on
the AMD K6 of Quake ran _faster_ than even the Pentium-optimized
version on the Pentium, because the K6 could balance ALU and FPU
operations whereas the Pentium's ALU was basically "unused."
4. Intel moved to the "GTL+" aka Socket-8 (Pentium Pro-P6) and
the later offspring (Pentium II, Pentium III - P6), which
offered greatly improved L2 and memory cache performance.
5. The AMD K6 was still on the older platform at the time that the
PPro/P2 came out, and the latter had far greater memory-I/O
throughput and lower latency due to on-package/die cache.
6. Once AMD moved to the Athlon, it adopted the "EV6" aka Slot-A
(aka "Alpha 264") bus and later incarnations (the Socket-A/462
people commonly know today), and offered far greater memory-I/O
than even the new Intel P3. And AMD now controlled their own
platform, free of Intel's "focus."
7. The Athlon 9-issue core is far superior to the Pentium Pro's
7-issue core. It not only supports the K6's superior arithmetic
logic unit (ALU) and branch prediction unit**, but a 3-issue
floating point unit (FPU) that offers the ability to do 2 complex _and_
1 simple FPU instruction simultaneously. The P2-P4 can only do
1 complex _or_ 2 simple instructions simultaneously.
[ **SIDE NOTE: Intel had so much trouble designing an effective branch
prediction unit, they forwent putting one in the IA-64. What the
Itanium did was use "branch predication" where it issues _both_ paths
and discards the one not taken. And as Intel found out with the 1st gen
IA-64 Itanium, it is _not_ very efficient, so the new, 2nd gen IA-64
Itanium2 features a minimal branch prediction unit and a full one will
be in the 3rd gen Itanium3 released in late 2003, early 2004. BTW, the
Digital Alpha team _predicted_ "branch predication" alone would
_fail_utterly_. Most of those engineers now work for AMD or API
Networks. ]
8. Intel had not planned on the P6 core lasting so long as to outlast
its projected 1GHz "end of life," which resulted in the quick
creation of the Pentium 4. The P4 simply has the same core as the
P3, only with over twice as many stages in the pipes. They did no
complex redesign, so there are serious issues with branch
mispredicts, ALU performance and limitations versus the Athlon's
FPU. To overcome this, Intel introduced two things:
A. They adopted Rambus signaling on the FSB (even for SDRAM
Pentium 4s now) -- not a bad move
B. They adopted a 100% marketing strategy of continually
introducing new SIMD instructions with each revision,
and _purposely_ re-used opcodes of 386/486 instructions,
ensuring incompatibility with previous products --
_including_ their own! These SIMD instructions have their
own registers, execution units and other components, which
seriously _bloats_ the size of the P4 unnecessarily
9. What Intel didn't realize is that AMD can easily accommodate the
SIMD marketing fiasco by simply leveraging its unused FPU cycles.
AMD writes new microcode for each new batch of SIMD instructions.
Unfortunately, there are two issues:
A. AMD lags new Intel chip SIMD instructions by 6-9 months
B. AMD is trying to accommodate legacy code by trying to
make both SIMD and legacy 386/486 instructions execute
despite the opcode reuse (which is tricky, long story)
10. AMD is moving to 64-bit on x86. Surprisingly, they still
maintain a smaller die size than P4 (let alone IA-64!), even
though they are usually 6-12 months behind Intel in packaging
and die size. As Epic's developers found out, 64-bit is _better_
than adding yet more and more SIMD instructions, when AMD's core
"3DNow!" does everything gamers and engineers need anyway. Hence
why you'll see their UT 2003 for the x86-64 soon and more vendors
to follow.
11. Now when it comes to x86-64, AMD isn't just extending to 64-bits.
They are moving to a localized northbridge, memory/interconnect
on-chip in the "Hammer" series (Athlon 64/Opteron). This totally
caught Intel by surprise, who thought it had sold everyone on NGIO
(serial interconnect). But while NGIO continues to be vaporware,
AMD's HyperTransport exists in _even_Intel_mainboard_chipsets_
now! Why? Digital has _always_ driven the interconnect, from
system-memory-CPU to PCI and AGP. HyperTransport is their idea,
and it's _extremely_scalable_ for 2-8 CPUs, and high-speed network
between those shared memory systems. Intel does _not_ have
anything like it. Hence the major cluster vendors find the
"Hammer" is even more of a killer cluster/scientific chip than
its already powerful desktop/server chip application.
13. Intel is sticking by its "32-bit is good enough for now" and their
"36-bit/64GB 'EMS-like' (with associated performance hit)
addressing over 32-bit/4GB" approach. I've needed servers and
even workstations with more than 4GB, and AMD's new Athlon 64/
Opteron kills Xeon there.
14. Between the 36-bit/64GB "performance issue" and the SIMD
nightmare, developers and integrators are beginning to _prefer_
AMD x86-64 over Intel IA-32 for the next year or two. This is
bad for Intel, who thought their IA-64 would be "mature" for the
desktop by the time IA-32 hit 1GHz when they decided to cease
major x86 core research over a half decade ago.
15. Intel _is_ getting to IA-64 and in late 2003 we will get the
server/workstation "Madison" and in early 2004 the desktop
"Deerfield" which are 3rd gen IA-64 Itanium3 chips. IA-64 was
born out of engineering ideals and flopped on its face, as
Digital's Alpha team predicted. But Intel learned, it bought
much of Digital's Alpha technology, and is adopting Alpha
design approaches in this series. It may very well best AMD's
"Hammer" which isn't scalable much past 5GHz in the coming years
-- _depending_ on whether or not AMD gains marketshare with x86-64
in 2003.
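
The 4GB and 64GB figures in points 13 and 14 fall straight out of the address width. A minimal sketch of the arithmetic, assuming plain byte addressing (the helper name `addressable_gib` is made up for illustration, and the 64-bit figure is only an architectural upper bound, since real chips wire up fewer address bits):

```python
# Address-space sizes for the pointer widths discussed in points 13 and 14.
# Assumption: one address selects one byte.
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits):
    """Return the byte-addressable space, in GiB, for a given address width."""
    return (2 ** address_bits) // GIB


for bits in (32, 36):
    print(f"{bits}-bit addressing: {addressable_gib(bits)} GiB")

# 64-bit is large enough that GiB is an awkward unit, so report EiB instead.
print(f"64-bit addressing: {2 ** 64 // 2 ** 60} EiB (architectural upper bound)")
```

The 36-bit case covers 16 times the memory of the 32-bit case, but a 32-bit pointer still only spans 4 GiB at a time, which is where the "EMS-like" windowing comes from.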
With all that said, here's the deal:
AMD Pros over Intel:
- Near-100% 386/486 opcode compatibility
- Superior IA-32 ALU, FPU performance -- RISC-based CISC core (RISC86)
- "Lossless Math" SIMD instructions (because they use FPU operations)
- HyperTransport interconnect is now in almost everything, and AMD
puts it directly on-chip in the "Hammer" series (scales far better)
- Offers 64-bit x86-64 option for legacy software compatibility
- Smaller core die size, copper interconnect with silicon-on-insulator
- 128KB L1 cache -- 4x PPro-P4 -- far more important than L2!
- x86-64 allows 64-bit application optimizations despite running on
only a 32-bit OS (like current Windows)
- Licensing agreements with IBM/Motorola for fab technology
(who are ahead of Intel in many areas -- e.g., copper+SOI)
AMD Cons versus Intel:
- Lags packaging/die improvements by 6-12 months
(still using 200mm diameter wafers, 0.09um won't be until late 2003)
- Has only 4 fabs, only 1 "leading edge" fab (Dresden, Germany), which
makes "changeovers" very difficult, leading to "poor initial yields"
(fabs cost billions to create, although AMD has licensed UMC now)
- Lacks control over OEMs, who skip approved components in favor of
the "cheapest model"
- OEMs totally ignoring "mainboard mount" cooling (which results
in Athlon systems not being cooled as well as P4 systems -- AMD
offers same mechanical form-factor as Intel)
- Cannot offer OEMs R&D money to offset costs (so AMD has difficulty
"breaking in" to tier-1 OEMs -- sound familiar? ;-)
- Athlon/"Hammer" cores might not scale past 5GHz
- Quickly spending $1B reserves, may not survive past 2003Q1 if they
cannot sell "Hammer"
- Microsoft dragging its feet on x86-64 Windows (which "breaks" the
64GB, 4GB-performance "barrier")
Intel Pros over AMD:
- P4 scalable to 5-10GHz, IA-64 should as well
- Dozens of fabs, with almost a dozen dedicated to leading-edge
chip production (largely in Malaysia), using 300mm diameter wafers
(although just finally moved to copper, but not SOI yet)
- Will offer scalable I/O and ISA for 64-bit future in 2004+ (Itanium3)
- Controls $40B SIMD marketing engine that forces incompatibilities
in previous chips (all vendors, including Intel's own), that forces
people to upgrade for "best performance"
- Has money to offset OEM R&D, new product creation (tier-1 entrenched)
- Has tens of billions in reserves, can weather any financial storm
- Far more influence over Microsoft than API-AMD (which is why 64-bit
Windows for Alpha never surfaced despite being written by Digital
almost 6 years ago!)
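
To put the 200mm versus 300mm wafer difference in numbers, here is a rough sketch of the gross area ratio. It assumes perfectly circular wafers and ignores edge loss, die size and yield, so it is only an upper-level estimate (the helper name `wafer_area_mm2` is made up for illustration):

```python
import math


def wafer_area_mm2(diameter_mm):
    """Gross area of a circular wafer in square millimetres."""
    return math.pi * (diameter_mm / 2) ** 2


# Ratio of gross silicon area per wafer: 300mm versus 200mm.
ratio = wafer_area_mm2(300) / wafer_area_mm2(200)
print(f"300mm wafer has {ratio:.2f}x the gross area of a 200mm wafer")
```

Because area grows with the square of the diameter, a 1.5x wider wafer gives 2.25x the gross area per wafer processed.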
Intel Cons versus AMD:
- P3/Athlon 50% faster at ALU (P4 33% slower),
100% faster at FPU (P4 50% slower)
- FPU designs continue to suck, even in IA-64 (4-year old Alpha 264
beats Itanium2, SPARC and Hammer are just as good or better)
- "Lossy Math" SIMD instructions (SIMD pipelines have "poor accuracy")
- IA-64 EPIC/Predication "ideals" proven not reality, won't be until
2004+ that 3rd gen IA-64 "Itanium3" merges them with traditional
RISC-runtime/Prediction chip-side optimizations
- Intel _continues_ to design IA-64 for "previous generation"
fabrication (i.e. Itanium2 is 0.18um, and Itanium3 will only be
0.13um at release, while P4 is moving to 0.09um).
- NGIO still not viable, AMD HyperTransport and IBM PCI-X
are now entrenched in system interconnect and I/O peripherals
- Microsoft dragging its feet on 64-bit Windows for IA-64 for desktops
and IA-64 does _not_ run 32-bit Windows
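
The difference between branch prediction and the "branch predication" from the side note can be sketched in a few lines of Python. This is a toy model only, not how IA-64 hardware is implemented: the function names and the operation counts are made up for illustration, and a real chip does this per instruction in hardware.

```python
# Toy model: count how much arithmetic each approach performs.


def with_branch(a, b):
    """Branching version: only the path that is needed gets executed.

    Returns (result, number of subtractions performed).
    """
    if a > b:
        return a - b, 1
    return b - a, 1


def with_predication(a, b):
    """Predicated version: both paths execute, the predicate selects one.

    Returns (result, number of subtractions performed).
    """
    p = int(a > b)       # predicate: 1 if a > b, else 0
    if_true = a - b      # path 1, always computed
    if_false = b - a     # path 2, always computed
    result = p * if_true + (1 - p) * if_false  # branch-free select
    return result, 2


for a, b in [(7, 3), (3, 7), (5, 5)]:
    print(a, b, with_branch(a, b), with_predication(a, b))
```

Both versions give the same answer, but the predicated one always pays for both paths. That is the trade-off: no mispredict penalty, at the cost of work that gets thrown away.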
-- 
Bryan J. Smith, E.I. Contact Info: http://thebs.org
A+/i-Net+/Linux+/Network+/Server+ CCNA CIWA CNA SCSA/SCWSE/SCNA
---------------------------------------------------------------
The more government chooses for you, the less freedom you have.