Article: Parallelism at HotPar 2010
By: Michael S (already5chosen.delete@this.yahoo.com), August 18, 2010 5:33 am
Room: Moderated Discussions
AM (myname4rwt@jee-male.com) on 8/18/10 wrote:
---------------------------
>Michael S (already5chosen@yahoo.com) on 8/17/10 wrote:
>---------------------------
>>AM (myname4rwt@jee-male.com) on 8/17/10 wrote:
>>---------------------------
>>>anon (anon@anon.com) on 8/16/10 wrote:
>>>---------------------------
>>>>AM (myname4rwt@jee-male.com) on 8/16/10 wrote:
>>>>---------------------------
>>>>>anon (anon@anon.com) on 8/14/10 wrote:
>>>>>---------------------------
>>>>>>So in summary, nobody has been able to come up with a credible paper proving these
>>>>>>fantastic 100x performance gains despite being absolutely certain the claims are real.
>>>>>
>>>>>In summary, claims that GPUs' perf advantage is limited to 2.5x-5x are complete
>>>>>and utter BS,
>>>>
>>>>AM, the claim, actually, is coming from the 100x-1000x people. People here are doubting
>>>>that claim because of the numbers involved, but it is not up to them to disprove
>>>
>>>As a matter of fact, some people here (to be precise, Mark Roulo) asserted that
>>>the "real" perf advantage of GPUs over CPUs is 2.5x-5x. Hence my suggestion to him
>>>(and David Kanter wrt his own statements) to show how it can possibly be true with
>>>a selection of papers as small as 10. Haven't seen either of them in the thread since.
>>
>>I'd like to hear what your own claim is.
>>Say, on the CPU side, an i5-760 (4 cores, 2.8 GHz, 8MB L3, no HT, official price $205),
>>multithreaded, SIMD-optimized (not by compiler).
>
>If we allow hand optimization for the CPU, it's obvious that the same should apply
>to the GPU side. Or rely on the compiler in both cases.
>
Did NVidia disclose their ISA and provide the tools required for hand optimization? Methinks not.
On the other hand, hand optimization for x86 SIMD is very well supported by plenty of tools (compiler intrinsics, debuggers, profilers). In short, hand-optimizing below standard C/Fortran on Nehalem is practical and is done in practice by tens of thousands of devs. Hand-optimizing below CUDA on NVidia is not practical and likely not even possible for non-NVidia devs.
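To be concrete, "hand-optimizing below C" on the CPU side means code like this minimal SSE sketch (the saxpy-style kernel and the names are just my illustration, not from any particular codebase):

#include <xmmintrin.h>  /* SSE intrinsics, supported by every x86 toolchain */

/* y[i] += a * x[i] over n floats, 4 lanes per iteration.
   Assumes n is a multiple of 4 and both pointers are 16-byte aligned. */
void saxpy_sse(float a, const float *x, float *y, int n)
{
    __m128 va = _mm_set1_ps(a);            /* broadcast a into all 4 lanes */
    for (int i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(&x[i]);    /* aligned 4-wide load */
        __m128 vy = _mm_load_ps(&y[i]);
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));
        _mm_store_ps(&y[i], vy);           /* aligned 4-wide store */
    }
}

This level of control over instruction selection, alignment and unrolling is exactly what is available on Nehalem and what you can't get below CUDA on the GPU side.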
>>On the GPU side, the best NVidia can offer under $250 (official price, not a random internet link).
>
>I wonder what you mean by the official price. AFAIK Nvidia doesn't have any kind
>of publicly available price list; I think all they do is [sometimes] quote target
>prices for complete products in PRs at the time of release, but needless to say, they change over time.
>
>Besides, LGA 1156 is a cost adder that should be taken into account as well (after
>all, I agreed with the correction for the CPU price). It's $30-$40 as of today, cheap vs cheap.
I'm not sure about that. LGA 1366 is a serious cost adder, but LGA 1156 is already close to parity with LGA 775, and the gap continues to shrink.
>
>>Calculation should not take advantage of the GPU's texture interpolation capabilities
>>(using the texture cache is not only allowed but highly desirable) since, first, it's
>>extremely rare in non-3D-rendering code and as such non-representative, and second,
>>when it happens everybody, including David Kanter and Mark Roulo, would probably agree that >50x speedup is possible.
>>So what speedup do you expect under conditions like the above?
>
>Why are you saying interpolation is extremely rare? Whenever we numerically solve
>a problem which simulates something by representing reality in discretized form,
>we can't do without interpolation as long as we want our functions to be at least C0-continuous.
>
Do you realize that the texture unit does interpolation on low-precision fixed-point numbers? It's not good enough even for sonar/radar beamforming. Applicability to traditional HPC is extremely rare, except maybe in the final visualization phase, and that is better handled by classic GPU APIs than by GPGPU.
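To put a number on "low precision": IIRC NVidia's own programming guide documents the linear-filtering weight as 9-bit fixed point with 8 fractional bits. A quick sketch of what that quantization costs next to a plain float lerp (plain C, names are mine, and the 8-fractional-bit figure is the assumption here):

#include <stdio.h>

/* Texture-style lerp: fractional weight quantized to 8 bits, mimicking
   the hardware (assuming the 9-bit/8-fractional-bit figure above). */
float lerp_texture(float a, float b, float frac)
{
    float w = (float)(int)(frac * 256.0f + 0.5f) / 256.0f;  /* 8-bit weight */
    return a + w * (b - a);
}

/* Full-precision lerp for comparison. */
float lerp_float(float a, float b, float frac)
{
    return a + frac * (b - a);
}

int main(void)
{
    float a = 0.0f, b = 1.0f, frac = 0.3333333f;
    printf("texture lerp: %f\n", lerp_texture(a, b, frac)); /* prints 0.332031 */
    printf("float   lerp: %f\n", lerp_float(a, b, frac));   /* prints 0.333333 */
    return 0;
}

The weight error can reach about 2^-9 of the sample spacing, i.e. only 8-9 meaningful bits of interpolated position, which is why it's unusable for beamforming and for most HPC kernels.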
>Besides, making good use of texture-interpolation hw was specifically emphasized
>in one of the papers Intel selected to "debunk" the "myth", but all they did was
>take some other Monte-Carlo code instead for comparison.
>
>As for your questions re. my claim/expectation,
>1) I joined when I saw the comment by Kanter, which does not seem to be supported by any research/work whatsoever;
>2) if we are to take a closer look at i5-760 vs GPU, then we need to clear possible
>disagreements first anyway (hence my comments above);
>3) so far the only article which was clearly shown to be totally misleading is the one by Lee et al.
>
>http://portal.acm.org/citation.cfm?id=1816021
>
>"Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU"