By: Michael S (already5chosen.delete@this.yahoo.com), July 26, 2012 2:33 am
Room: Moderated Discussions
TacoBell (no.delete@this.spam.com) on July 25, 2012 8:36 pm wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on July 25, 2012 1:37 am wrote:
> > New computational efficiency data shows GPUs with a clear edge over CPUs, but
> > the gap is narrowing as CPUs adopt wide vectors (e.g. AVX). Surprisingly, a
> > throughput CPU is the most energy efficient processor, offering hope for future
> > architectures. Our data also shows some advantages of AMD's Bulldozer, and the
> > overhead associated with highly scalable server CPUs.
> >
> > Comments and feedback welcome!
>
> Bulldozer shares the FP units on a pair of cores while Ivybridge has
> dedicated FP units per core. However Bulldozer has DP FMA while Ivybridge does
> not. This means that if the processors ran at the same clockspeed the peak
> theoretical DP FLOPS of Bulldozer and Ivybridge should be the same for the same
> number of cores. Yet Bulldozer comes with twice the number of cores (8x) than
> Ivybridge (4x). So I really cannot understand the performance numbers.
A single Bulldozer FPU delivers 8 DP FLOPs per cycle:
2x - due to FMAC
2x - due to two FMAC pipes
2x - due to 128-bit data paths
A single SandyB/IvyB FPU also delivers 8 DP FLOPs per cycle:
2x - due to independent FMUL and FADD pipes
4x - due to 256-bit data paths
So, at the same clock frequency, a quad-core SandyB/IvyB delivers the same peak DP FLOPs as an octa-"core" Zambezi, which has only four FPUs (one per two-core module).
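
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The function and variable names are mine, purely for illustration, and the count of four FPUs for Zambezi assumes one shared FPU per two-core module, as described in the quoted post. It only reproduces the per-cycle peak numbers; it says nothing about clock speed or achieved throughput.

```python
def peak_dp_flops_per_cycle(flops_per_lane, pipes, vector_bits, element_bits=64):
    """Peak DP FLOPs per cycle for one FPU.

    flops_per_lane: FLOPs one pipe performs per vector lane per cycle
                    (2 for a fused multiply-add, 1 for a plain add or multiply).
    pipes:          number of FP pipes that can issue in the same cycle.
    vector_bits:    width of each pipe's data path.
    """
    lanes = vector_bits // element_bits  # 64-bit doubles per vector
    return flops_per_lane * pipes * lanes

# Bulldozer: FMAC (2 FLOPs) x two FMAC pipes x 128-bit paths (2 lanes)
bulldozer_fpu = peak_dp_flops_per_cycle(flops_per_lane=2, pipes=2, vector_bits=128)

# SandyB/IvyB: FMUL and FADD pipes (1 FLOP each) x 256-bit paths (4 lanes)
sandy_fpu = peak_dp_flops_per_cycle(flops_per_lane=1, pipes=2, vector_bits=256)

# Assumption: Zambezi has 4 modules with one FPU each (8 integer "cores"),
# while a quad-core SandyB/IvyB has one FPU per core.
zambezi_chip = 4 * bulldozer_fpu
sandy_chip = 4 * sandy_fpu

print(bulldozer_fpu, sandy_fpu, zambezi_chip, sandy_chip)
```

Multiply the per-chip figure by the clock frequency to get peak DP FLOPS for a specific part.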