Article: Parallelism at HotPar 2010
By: Vincent Diepeveen (diep.delete@this.xs4all.nl), August 18, 2010 2:28 am
Room: Moderated Discussions
Please realize what Monte Carlo means.
In itself it samples leaves at random and backs the results up the tree, so the outcome is not deterministic.
So using Monte Carlo as the basis of a parallel test is asking for problems.
In search in general, Monte Carlo is very inefficient.
It's a beginner's algorithm, though it does work.
Yet using random search itself as a benchmark is asking for problems. Note that most people run it in an embarrassingly parallel manner, which is even more terribly inefficient.
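A minimal sketch of the reproducibility problem, assuming a toy pi estimator rather than any program from this thread. The names `estimate_pi` and `split_estimate` are hypothetical. Each "worker" gets its own seed and its own share of the samples, which is the usual embarrassingly parallel setup; the workers run one after another here because only the partitioning matters for the result.

```python
import random


def estimate_pi(samples: int, seed: int) -> float:
    """Toy Monte Carlo estimate of pi from random points in the unit square."""
    rng = random.Random(seed)  # private generator: the seed fully fixes this run
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / samples


def split_estimate(samples: int, workers: int, base_seed: int) -> float:
    """Split the samples over `workers` independent streams and average the parts."""
    per_worker = samples // workers
    # Hypothetical seeding scheme: worker w uses base_seed + w.
    parts = [estimate_pi(per_worker, base_seed + w) for w in range(workers)]
    return sum(parts) / workers


if __name__ == "__main__":
    # Same total work, different worker counts: compare the printed estimates.
    for workers in (1, 2, 4, 8):
        print(workers, split_estimate(80_000, workers, base_seed=1))
```

With a fixed seed a single stream repeats exactly, but once the samples are split over a different number of workers the estimate will generally change, so a parallel run can only be checked against a serial one statistically, not bit for bit.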
Vincent
sea (sea@sea.com) on 8/16/10 wrote:
---------------------------
>Let me give one example:
>
>Software name: CUDAMCML
>
>Formal publication:
>
>"Parallel computing with graphics processing units for high-speed Monte Carlo simulation of photon migration"
>J. Biomed. Opt., Vol. 13, 060504 (2008); doi:10.1117/1.3041496
>
>Abstract:
>
>... Monte Carlo simulations of photon migration. In a standard simulation of time-resolved
>photon migration in a semi-infinite geometry, the proposed methodology executed
>on a low-cost graphics processing unit (GPU) is a factor 1000 faster than simulation
>performed on a single standard processor. ...
>
>In their later CUDAMCML manual, they gave up the 1000X claim. Now they claim
>that CUDAMCML is about 50X faster than the original MCML, a 15-year-old program running on one CPU core.
>
>Later this year, there is another publication by different researchers: "Tetrahedron-based
>inhomogeneous Monte-Carlo optical simulator." Phys. Med. Biol. 55:947-962, 2010.
>In this publication, the two researchers compared CUDAMCML with a slightly improved
>multi-thread MCML. Now the speedup is only 2 times.
>
>From 1000X to 50X to 2X, what a difference.
>
>There is another paper that shows 300X: "Monte Carlo simulation of photon migration
>in 3D turbid media accelerated by graphics processing units."
>
>In this paper, the authors compared a GPU program on an 8800GT with a CPU program on
>an Intel Xeon 1.86GHz (E5120?). The 8800GT GPU has about 112 cores, and the frequency
>of the 8800GT is lower than 1.86GHz. Even if one GPU core is equal to one CPU core,
>the speedup is at most 112 times. I am very sure his code can be improved 10 times.
>
>
>
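The bound argued in the quoted post can be written out as a short calculation. This is only a sketch of that arithmetic: the function name `peak_speedup_bound` is hypothetical, and the 1.5 GHz GPU clock is an assumption, since the post only says the 8800GT runs below 1.86GHz.

```python
def peak_speedup_bound(gpu_cores: int, gpu_clock_ghz: float, cpu_clock_ghz: float) -> float:
    """Upper bound on speedup over one CPU core, assuming a GPU core does
    the same work per cycle as the CPU core (the post's generous premise)."""
    return gpu_cores * gpu_clock_ghz / cpu_clock_ghz


GPU_CORES = 112              # from the quoted post
CPU_CLOCK_GHZ = 1.86         # from the quoted post
ASSUMED_GPU_CLOCK_GHZ = 1.5  # assumption: the post gives no exact GPU clock
REPORTED_SPEEDUP = 300.0     # claim of the paper under discussion

# Equal clocks: the most generous case, giving the post's "at most 112 times".
equal_clock_bound = peak_speedup_bound(GPU_CORES, CPU_CLOCK_GHZ, CPU_CLOCK_GHZ)

# Lower GPU clock: the bound only shrinks.
scaled_bound = peak_speedup_bound(GPU_CORES, ASSUMED_GPU_CLOCK_GHZ, CPU_CLOCK_GHZ)

# How far the reported number overshoots the generous bound.
overshoot = REPORTED_SPEEDUP / equal_clock_bound

if __name__ == "__main__":
    print(f"bound at equal clocks: {equal_clock_bound:.1f}x")
    print(f"bound at assumed GPU clock: {scaled_bound:.1f}x")
    print(f"reported / bound: {overshoot:.2f}")
```

Under these assumptions the reported 300X exceeds even the equal-clock bound by a factor of more than 2.5, which is consistent with the post's point that the CPU baseline was far from optimized.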