By: Patrick Chase (patrickjchase.delete@this.gmail.com), February 2, 2013 10:26 am
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on February 2, 2013 9:42 am wrote:
> anon (anon.delete@this.anon.com) on February 2, 2013 5:04 am wrote:
> > R10000 was introduced within a couple of months of PentiumPro.
> >
> > R10K was 4 way superscalar, 64-bit, OOOE, and excluding the larger L1
> > caches in the MIPS, the core was fewer transistors by the looks.
> >
> > On the sameish process (0.35) and date, 195MHz R10K was ~10% faster in
> > specint95 than the 200MHz PentiumPro, and ~50% faster in specfp95.
>
> Cache size has a huge impact on performance for many workloads. Architects make tradeoffs
> between core complexity and cache size all the time in order to optimize overall performance,
> and you therefore can't ignore caches when making comparisons. Area is area.
>
> The R10K die was 298 mm^2, P6 was 196 mm^2. R10K is 50% bigger, 10% faster for integer, and 50% faster for FP.
> If I take your claim of process equivalence at face value then that indicates that Intel had the performance-per-unit-area
> edge at that point (though I think you're wrong - Intel's design and process were better. If Intel had designed
> and fabbed the R10K it would have been significantly better than what MIPS came up with).
>
> Want to try again?
>
> Keep in mind also that most R10K installations used much larger external L2s than P6 (256K for initial
> P6 vs 256K-16MB for R10K) and that also counts against *total* area and *total* performance. Those
> big L2s helped a lot with specfp95, as that suite had notoriously small working sets...
Sorry about the repeated posting, but I dug around and found the SPEC95 results that anon is citing. The comparison is between the Intel "Alder System" at 200 MHz (which is indeed a 0.35 um P6) in Dec-95 here:
http://www.spec.org/cpu95/results/res9512/
To the SGI Power Challenge R10000 at 195 MHz in Q4-96 here:
http://www.spec.org/cpu95/results/res96q4/
As I suspected, the external L2s are quite different. The PPro had 256KB, while the R10K had 2 MB. Note that these aren't included in the area comparison I gave above. Total area (including external caches) is MUCH higher for the R10K here.
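For anyone who wants to check the performance-per-area arithmetic, here's a quick sketch. It uses only the die sizes I quoted (298 mm^2 vs 196 mm^2, core dies only, no external L2) and anon's rounded "~10% faster int, ~50% faster FP" figures, which are approximations rather than exact SPEC95 scores. The names below are just illustrative.

```python
# Die areas in mm^2 as quoted in this thread (core dies only; external L2 SRAM excluded).
R10K_AREA = 298.0
P6_AREA = 196.0

# Approximate R10K-to-P6 performance ratios as claimed in the thread
# (rounded figures, not exact SPEC95 results).
SPEC_RATIO = {"specint95": 1.10, "specfp95": 1.50}


def perf_per_area_ratio(perf_ratio: float, area_a: float, area_b: float) -> float:
    """Given perf_a / perf_b, return (perf_a / area_a) / (perf_b / area_b)."""
    return perf_ratio / (area_a / area_b)


area_ratio = R10K_AREA / P6_AREA
print(f"R10K/P6 die area: {area_ratio:.2f}x")
for suite, ratio in SPEC_RATIO.items():
    rel = perf_per_area_ratio(ratio, R10K_AREA, P6_AREA)
    print(f"{suite}: R10K perf/area is {rel:.2f}x the P6's")
```

With those inputs it prints an area ratio of 1.52x, and R10K performance per unit area of about 0.72x the P6's on specint95 and about 0.99x on specfp95. Adding the external L2 area (256KB vs 2MB) would push both R10K numbers lower still.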