Article: Parallelism at HotPar 2010
By: AM (myname4rwt.delete@this.jee-male.com), August 17, 2010 1:13 am
Room: Moderated Discussions
sea (sea@sea.com) on 8/16/10 wrote:
---------------------------
>Let me give one example:
>
>Software name: CUDAMCML
>
>Formal publication:
>
>"Parallel computing with graphics processing units for high-speed Monte Carlo simulation of photon migration"
>J. Biomed. Opt., Vol. 13, 060504 (2008); doi:10.1117/1.3041496
>
>Abstract:
>
>... Monte Carlo simulations of photon migration. In a standard simulation of time-resolved
>photon migration in a semi-infinite geometry, the proposed methodology executed
>on a low-cost graphics processing unit (GPU) is a factor 1000 faster than simulation
>performed on a single standard processor. ...
>
>In their later CUDAMCML manual, they gave up the 1000X claim. Now, they claimed
>that CUDAMCML is about 50X faster than the original MCML, a 15-year-old program running on one CPU core.
>
>Later this year, there is another publication by different researchers: "Tetrahedron-based
>inhomogeneous Monte-Carlo optical simulator." Phys. Med. Biol. 55:947-962, 2010.
>In this publication, the two researchers compared CUDAMCML with a slightly improved
>multi-thread MCML. Now the speedup is only 2 times.
>
>From 1000X to 50X to 2X, what a difference.
So: different researchers, most likely different hardware and different compilers, possibly different code. And what an implication.
Besides, what I see in what is presumably the latest cudamcml manual is a 100x rule-of-thumb speedup, with an option for a further 10x boost (though they state clearly that the switch disables certain functionality), so at first glance, the 1000x claim (100x times 10x) is still there.
In which versions of their manuals did they claim 1000x and 50x speedups? I'm not finding the figures you cite anywhere in their manuals.
>There is another paper that shows 300X: "Monte Carlo simulation of photon migration
>in 3D turbid media accelerated by graphics processing units."
>
>In this paper, the authors compared a GPU program on 8800GT with CPU program on
>Intel Xeon 1.86GHz (E5120?). The 8800GT GPU has about 112 cores and the frequency
>of 8800GT is lower than 1.86GHz. Even if one GPU core is equal to one CPU core,
>the speedup is at most 112 times. I am very sure his code can be improved 10 times.
Well, regardless of whether you work for Intel, are a GPU hater for some reason, or are good at making CPU code faster, why not give it a try then? Many GPU folks make their code available to everyone (and that includes cudamcml http://www.atomic.physics.lu.se/biophotonics/our_research/monte_carlo_simulations/gpu_monte_carlo/), so if Intel doesn't like the speedups, they should try making faster CPU code available, as the GPU guys do, rather than writing "debunking" articles full of BS.
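For reference, the core-count argument quoted above (112 GPU cores against one CPU core at 1.86GHz) boils down to the back-of-the-envelope bound below. This is only a sketch in Python: the function name is made up for illustration, the 1.5GHz GPU clock is an assumed figure (the quote only says the 8800GT runs below 1.86GHz), and the whole thing rests on the quoted premise that one GPU core does the same work per cycle as one CPU core. It says nothing about SIMD width, memory bandwidth, or how well either code was tuned.

```python
def naive_speedup_bound(gpu_cores, gpu_clock_ghz, cpu_clock_ghz, cpu_cores=1):
    """Upper bound on GPU-vs-CPU speedup, ASSUMING equal work per core per cycle.

    This mirrors the reasoning in the quoted post; it is not a performance model.
    """
    if gpu_cores <= 0 or cpu_cores <= 0:
        raise ValueError("core counts must be positive")
    if gpu_clock_ghz <= 0 or cpu_clock_ghz <= 0:
        raise ValueError("clock frequencies must be positive")
    # Aggregate "core-cycles per second" on each side, then take the ratio.
    return (gpu_cores * gpu_clock_ghz) / (cpu_cores * cpu_clock_ghz)


# Equal clocks: the bound is just the core count (the "at most 112 times" case).
equal_clock_bound = naive_speedup_bound(112, 1.86, 1.86)

# Assumed (hypothetical) 1.5GHz GPU clock: the bound drops below the core count.
lower_clock_bound = naive_speedup_bound(112, 1.5, 1.86)

print(f"equal clocks: {equal_clock_bound:.1f}x")
print(f"GPU at an assumed 1.5GHz: {lower_clock_bound:.1f}x")
```

Under those assumptions, any measured speedup well above the bound means the CPU baseline is leaving performance on the table, the per-core equivalence premise is wrong, or both. The arithmetic alone cannot tell you which.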