By: anon (anon.delete@this.anon.com), February 2, 2013 5:04 am
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on February 1, 2013 10:11 pm wrote:
> David suggested posting this to the forum. I think he has a few remarks of his own to add on this topic...
>
> I think that the statement that x86 takes 5-15% more area than RISC is a bit simplistic,
> because the penalty is highly variable depending on what performance level you're
> targeting and what sort of microarchitecture you have to use to get there.
>
> As a simple example, x86 is utterly noncompetitive at the area/performance/power levels of, say, a Cortex
> M1/M3/M4 or even an R4. We saw the same thing in the mid 80s (compare MIPS R3K to 80386 at similar area,
> or R3K to 80486 at similar performance) and we see it now in the low end of the embedded market. x86 has
> traditionally been forced to rely on microcode and to forego micropipelining (as opposed to functional unit
> level pipelining a la 80386) to hit such low area targets, and that kills performance. RISCs win the area
> comparison by significant integer factors in that regime - It took Intel's million-transistor 80486 to catch
> up to an R3000 system (including caches and FPU) that totalled a few hundred thousand transistors.
>
> x86 starts to be marginally competitive once you get to dual-issue in-order superscalars (P5 vs. contemporaries
> in the late 80s; Atom vs. A8/A9 [*] today etc). The "x86 penalty" becomes fairly trivial once you get
> up to full-blown out-of-order Tomasulo machines and the like. We saw that with P6 vs. contemporaries,
The R10000 was introduced within a couple of months of the Pentium Pro.
The R10K was 4-way superscalar, 64-bit, and out-of-order, and excluding the larger L1 caches on the MIPS part, its core appears to have used fewer transistors.
On roughly the same process (0.35 µm) and at about the same date, the 195 MHz R10K was ~10% faster than the 200 MHz Pentium Pro in SPECint95, and ~50% faster in SPECfp95.