By: jp (jipe4153.delete@this.gmail.com), July 30, 2012 2:24 am
Room: Moderated Discussions
none (none.delete@this.none.com) on July 29, 2012 9:05 am wrote:
> aaron spink (aaronspink.delete@this.notearthlink.net) on July 29, 2012 5:03 am wrote:
> [...]
> > You think that GPUs are using 250 GB/s when they are struggling to hit 50% effective flops on simple algorithms?
>
> It depends on what you call "simple". daxpy requires 2 LD / 1 ST for 1 FMA. So one C2050 being 515 G FMA/s according to Wikipedia, I'd say it's memory bandwidth limited on daxpy.
Yeah, I decided not to waste time trying to explain to this guy the difference between being compute bound and being bandwidth bound. He thinks he gets it, but actually he doesn't :)
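For anyone following along, here is a rough back-of-the-envelope sketch of the daxpy argument quoted above. It takes the 515 G FMA/s figure exactly as quoted and assumes 8-byte doubles. The 144 GB/s memory bandwidth for the C2050 is my own assumption and is not stated in this thread, and the function names are just made up for illustration.

```python
# Back-of-the-envelope check: is daxpy compute bound or bandwidth bound?
# Assumptions (not from the thread): 8-byte doubles, 144 GB/s memory bandwidth.

BYTES_PER_DOUBLE = 8


def daxpy_bytes_per_fma(loads=2, stores=1):
    """Memory traffic per FMA for y[i] = a * x[i] + y[i] (2 LD, 1 ST)."""
    return (loads + stores) * BYTES_PER_DOUBLE


def required_bandwidth(peak_fma_per_s, bytes_per_fma):
    """Bandwidth needed to keep the FMA units busy at their peak rate."""
    return peak_fma_per_s * bytes_per_fma


def bandwidth_bound_fma_rate(mem_bw_bytes_per_s, bytes_per_fma):
    """FMA rate the memory system can actually feed."""
    return mem_bw_bytes_per_s / bytes_per_fma


PEAK_FMA_PER_S = 515e9   # figure as quoted in the thread
ASSUMED_MEM_BW = 144e9   # assumed C2050 bandwidth in bytes/s, not from the thread

bytes_per_fma = daxpy_bytes_per_fma()
needed = required_bandwidth(PEAK_FMA_PER_S, bytes_per_fma)
fed = bandwidth_bound_fma_rate(ASSUMED_MEM_BW, bytes_per_fma)

print(f"bytes moved per FMA:         {bytes_per_fma}")
print(f"bandwidth needed at peak:    {needed / 1e12:.2f} TB/s")
print(f"FMA rate memory can sustain: {fed / 1e9:.1f} G FMA/s")
print(f"fraction of quoted peak:     {fed / PEAK_FMA_PER_S:.2%}")
```

Under those assumptions, daxpy would need terabytes per second of bandwidth to keep the FMA units busy, so a low fraction of peak flops on this kind of kernel points to the memory system being saturated rather than sitting idle.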



