By: Patrick Chase (patrickjchase.delete@this.gmail.com), February 4, 2013 2:07 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on February 4, 2013 1:39 pm wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 11:55 am wrote:
> > Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 11:47 am wrote:
> > > Michael S (already5chosen.delete@this.yahoo.com) on February 4, 2013 2:27 am wrote:
> > > > Patrick Chase (patrickjchase.delete@this.gmail.com) on February 3, 2013 2:27 pm wrote:
> > > > >
> > > > > With that said, you gave me plenty of evidence yourself. R10K was >50% bigger for a 10% advantage
> > > > > in integer and a 50% advantage in FP.
> > > >
> > > > 50% in FP is for Spec95, I assume.
> > > > In Spec2k the difference was a LOT bigger - 2.4x.
> > > > I wonder why Spec2k and Spec95 produce such different pictures?
> > >
> > > Two words: Cache footprint.
> > >
> > > SpecFP95 had a notoriously small working set. SpecFP2k was better in that respect. The R14K you cite had
> > > an 8 MB external last level cache, vs. 512 KB for the PIII. That big cache gave R14K quite a significant
> > > benefit in SpecFP2k and similar technical workloads, which is precisely why SGI put it there :-).
> >
> > There is also an issue of external DRAM bandwidth. The PIII-500's chipsets used a single 64-bit SDR SDRAM
> > channel if I recall correctly. Peak STREAM bandwidth was on the order of a couple hundred MiB/sec.
> >
>
> Slightly more:
> Intel_440BX_600, ncpus=1 - 342.2/340.2/412.0/409.2
> http://www.cs.virginia.edu/stream/stream_mail/1999/0035.html
>
> > The Origin used 128-bit DDR per node (it's a NUMA), so it would have had ~4X the bandwidth
> > to memory on even a single node. Peak STREAM bandwidth was close to 1 GiB/sec.
>
> Unfortunately, I can't find single-CPU STREAM result for Origin 3200.
> The previous Origin generation is not very good in single or dual CPU mode:
> SGI_Origin2000-300, ncpus=1 - 336.0/334.0/387.0/388.0
> SGI_Origin2000-300, ncpus=2 - 383.0/373.0/414.0/422.0
> SGI_Origin2000-300, ncpus=4 - 759.0/754.0/852.0/854.0
Origin 2000 used SDR SDRAM, vs DDR in the 3000/3200.
> The smallest Origin3k on official site is a quad:
> SGI_Origin3800-400, ncpus=4 - 1400.6/1403.1/1551.5/1574.3
>
> The 4-cpu score is nearly twice that of the Origin2000-300, but I am not sure
> that we can conclude that the same ratio applies to a single CPU.
>
> >
> > As I said in my previous post, a LOT changes in the 2 years
> > between when that PIII-500 came out and when the R14K did.
> >
> > I can't believe I remember this stuff. Time to go get my brain erased...
> >
>
> So, maybe you remember the characteristics of the R14K FSB?
> I am afraid that the STREAM bottleneck would be at the FSB rather than at the memory bus.
FSB was 64 bits at 200 MHz.
See: http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=hdwr&db=bks&srch=&fname=/SGI_EndUser/Or3000_TCM/sgi_html/ch04.html
Note that the minimal orderable configuration for the 3200 was 2 CPUs.
I'm going on distant memory as to the actual STREAM bandwidth. I'm pretty sure it was less than the stated sustainable SYSAD bandwidth.
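For what it's worth, the arithmetic behind those figures can be sketched as below. This is only a rough illustration: it assumes "200 MHz" means 200 million transfers per second on the 64-bit bus, that the STREAM figures quoted above are in MB/s (10^6 bytes/s) in the usual Copy/Scale/Add/Triad order, and the function and variable names are made up for the example.

```python
def peak_bandwidth_mb_s(width_bits, mega_transfers_per_sec):
    """Theoretical peak of a bus in MB/s: bytes per transfer times transfer rate."""
    return width_bits / 8 * mega_transfers_per_sec

# FSB as stated above: 64 bits at 200 MHz (assumed one transfer per clock).
fsb_peak = peak_bandwidth_mb_s(64, 200)

# 4-CPU STREAM results quoted earlier in the thread
# (assumed order: Copy, Scale, Add, Triad).
origin2000_4cpu = (759.0, 754.0, 852.0, 854.0)
origin3800_4cpu = (1400.6, 1403.1, 1551.5, 1574.3)

# Per-kernel speedup of the Origin3800-400 over the Origin2000-300.
ratios = [new / old for new, old in zip(origin3800_4cpu, origin2000_4cpu)]

if __name__ == "__main__":
    print(f"FSB theoretical peak: {fsb_peak:.0f} MB/s")
    for name, ratio in zip(("Copy", "Scale", "Add", "Triad"), ratios):
        print(f"{name}: {ratio:.2f}x")
```

The peak number is an upper bound only; sustained STREAM throughput on a single CPU would be lower, and the 4-CPU ratios say nothing definite about the single-CPU case.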
-- Patrick