By: Michael S (already5chosen.delete@this.yahoo.com), February 4, 2013 1:39 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 11:55 am wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 11:47 am wrote:
> > Michael S (already5chosen.delete@this.yahoo.com) on February 4, 2013 2:27 am wrote:
> > > Patrick Chase (patrickjchase.delete@this.gmail.com) on February 3, 2013 2:27 pm wrote:
> > > >
> > > > With that said, you gave me plenty of evidence yourself. R10K was >50% bigger for a 10% advantage
> > > > in integer and a 50% advantage in FP.
> > >
> > > 50% in FP is for Spec95, I assume.
> > > In Spec2k the difference was ALOT bigger - 2.4x.
> > > I wonder why Spec2k and spec95 produce so different pictures?
> >
> > Two words: Cache footprint.
> >
> > SpecFP95 had a notoriously small working set. SpecFP2k was better in that respect. The R14K you cite had
> > an 8 MB external last level cache, vs. 512 KB for the PIII. That big cache gave R14K quite a significant
> > benefit in SpecFP2k and similar technical workloads, which is precisely why SGI put it there :-).
>
> There is also an issue of external DRAM bandwidth. The PIII-500's chipsets used a single 64-bit SDR SDRAM
> channel if I recall correctly. Peak STREAM bandwidth was on the order of a couple hundred MiB/sec.
>
Slightly more:
Intel_440BX_600, ncpus=1 - 342.2/340.2/412.0/409.2
http://www.cs.virginia.edu/stream/stream_mail/1999/0035.html
> The Origin used 128-bit DDR per node (it's a NUMA), so it would have had ~4X the bandwidth
> to memory on even a single node. Peak STREAM bandwidth was close to 1 GiB/sec.
Unfortunately, I can't find a single-CPU STREAM result for the Origin 3200.
The previous Origin generation is not very good in single or dual CPU mode:
SGI_Origin2000-300, ncpus=1 - 336.0/334.0/387.0/388.0
SGI_Origin2000-300, ncpus=2 - 383.0/373.0/414.0/422.0
SGI_Origin2000-300, ncpus=4 - 759.0/754.0/852.0/854.0
The smallest Origin3k on the official site is a quad:
SGI_Origin3800-400, ncpus=4 - 1400.6/1403.1/1551.5/1574.3
The 4-CPU score is about 1.8x that of the Origin2000-300, but I am not sure that we can conclude that the same ratio applies to a single CPU.
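A quick sanity check of those ratios, as a minimal Python sketch. It only uses the figures quoted above; the assumption that the four columns are Copy/Scale/Add/Triad is mine (the listings do not label them), and the dictionary keys are just illustrative labels.

```python
# STREAM figures as quoted in this post. Column order is ASSUMED to be
# Copy/Scale/Add/Triad; the quoted listings do not label the columns.
KERNELS = ("copy", "scale", "add", "triad")

results = {
    ("Origin2000-300", 1): (336.0, 334.0, 387.0, 388.0),
    ("Origin2000-300", 2): (383.0, 373.0, 414.0, 422.0),
    ("Origin2000-300", 4): (759.0, 754.0, 852.0, 854.0),
    ("Origin3800-400", 4): (1400.6, 1403.1, 1551.5, 1574.3),
}


def ratios(numer, denom):
    """Element-wise ratio of two result tuples."""
    return tuple(n / d for n, d in zip(numer, denom))


# Origin3800 vs Origin2000 at 4 CPUs (the only like-for-like pair available).
gen_ratio = ratios(results[("Origin3800-400", 4)], results[("Origin2000-300", 4)])

# How Origin2000 bandwidth grows with CPU count, relative to 1 CPU.
scale_1_to_2 = ratios(results[("Origin2000-300", 2)], results[("Origin2000-300", 1)])
scale_1_to_4 = ratios(results[("Origin2000-300", 4)], results[("Origin2000-300", 1)])

for i, name in enumerate(KERNELS):
    print(f"{name:5s}  3800/2000 @4cpu: {gen_ratio[i]:.2f}x   "
          f"2000 1->2: {scale_1_to_2[i]:.2f}x   2000 1->4: {scale_1_to_4[i]:.2f}x")
```

The Origin2000 scaling is far from linear (about 1.1x from 1 to 2 CPUs, about 2.2x from 1 to 4), which is why dividing the Origin3800 quad score by four, or applying the 4-CPU generation ratio to one CPU, would only be a guess.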
>
> As I said in my previous post, a LOT changes in the 2 years
> between when that PIII-500 came out and when the R14K did.
>
> I can't believe I remember this stuff. Time to go get my brain erased...
>
So, maybe you remember the characteristics of the R14K FSB?
I am afraid that the STREAM bottleneck would be at the FSB rather than at the memory bus.