Article: Parallelism at HotPar 2010
By: Aaron Spink (aaronspink.delete@this.notearthlink.net), August 3, 2010 1:17 pm
Room: Moderated Discussions
Richard Cownie (tich@pobox.com) on 8/3/10 wrote:
---------------------------
>Mark Roulo (nothanks@xxx.com) on 8/2/10 wrote:
>---------------------------
>
>It gives you 20 separate L1 texture caches with an
>aggregate bandwidth of 1TB/sec; and L2 caches which
>can supply 435GB/sec.
>
>Compare against this for Nehalem:
>
>... showing cache bandwidth of about 12GB/sec (though
>maybe you could boost that using multiple threads ?)
>
32B/cycle * 3.33GHz * 6 cores = 639.36 GB/s for Westmere L2 bandwidth. Man, that sure makes those numbers for the 5870 look bad. I bet I can find a benchmark for the 5870 where it gets even worse performance, and then I can post some comparison where CPUs have 6x the memory bandwidth...
and as long as no one knows the difference between theoretical peak and measured bandwidth, no one will be the wiser.
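
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The function name and parameter names are made up for illustration, and it assumes the 32B figure is bytes per cycle per core, 3.33 is the clock in GHz, and 6 is the core count. The 435 and 12 GB/s values are just the ones quoted above, not anything I measured.

```python
def peak_bandwidth_gb_s(bytes_per_cycle, clock_ghz, cores):
    """Theoretical peak bandwidth in GB/s (hypothetical helper).

    bytes/cycle * Gcycles/s * cores = GB/s. This is an upper bound
    from the spec sheet, not something a benchmark will achieve.
    """
    return bytes_per_cycle * clock_ghz * cores


# Westmere L2, using the numbers from the post (assumed units, see above).
westmere_l2_peak = peak_bandwidth_gb_s(bytes_per_cycle=32, clock_ghz=3.33, cores=6)

# Figures quoted earlier in the thread.
gpu_l2_peak_quoted = 435.0        # GB/s, theoretical L2 figure quoted for the 5870
nehalem_measured_quoted = 12.0    # GB/s, measured cache bandwidth quoted for Nehalem

print(f"Westmere L2 theoretical peak: {westmere_l2_peak:.2f} GB/s")
print(f"peak vs quoted 5870 L2 peak:  {westmere_l2_peak / gpu_l2_peak_quoted:.2f}x")
print(f"peak vs quoted measured CPU:  {westmere_l2_peak / nehalem_measured_quoted:.1f}x")
```

The point is only that peak-to-peak and peak-to-measured ratios are wildly different, so mixing the two says nothing useful.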