Assembler Solution from RU
;This is an optimized assembly version of the pi.c program,
; which is available at http://www.tempest-sw.com/benchmark/
;This program REQUIRES SSE and does NOT check that it exists
;This code was written by [RU], and
; submitted to the code challenge at http://www.realworldtech.com,
; which required the program to run on P3/P4/AthlonXP, so SSE is
; *assumed* to be present. An exception WILL occur if SSE is not present
;Major changes include:
;- Use of floating point calculations instead of integer (this should be equally accurate until cases get very large).
;- a[] in the C code was removed, and q was made into an array instead. This allows a smaller array to be maintained.
;- A large portion of the code was unrolled to allow vectorization and better scheduling.
;Possible further improvements (estimated effect in braces):
;- vectorizing/scheduling the remaining dependent divisions {~5% I'd guess}
;- using the integer division resource when available; right now the vectorized loop uses only the SSE division resource (between 1/16 and 1/12, I would guess) {~5-10%}
;- prefetching to speed up large cases that use main memory (speedup depends on memory, cache, CPU speed...) {???}
;- using SSE2 division to switch back to integer calculations (loss of speed due to greater precision) {~-10% P4}
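
The note that floating point is "equally accurate until cases get very large" can be illustrated with a small C sketch. This is not RU's code and not taken from pi.c. The helper names `div_via_float` and `div4_via_float` are hypothetical, and the sketch assumes IEEE-754 single precision (24-bit significand), which is what SSE provides. Under that assumption, the truncated quotient of two positive integers below 2^24 should match integer division, and it starts to diverge once an operand no longer fits in the significand.

```c
#include <stdint.h>

/* Every integer below this limit is exactly representable in an
   IEEE-754 single-precision float (24-bit significand). */
#define FLOAT_EXACT_LIMIT (1u << 24)

/* Hypothetical helper (an illustration, not the original assembly):
   truncated quotient n / d computed with one single-precision divide.
   The volatile temporaries force each value to be rounded to float,
   so the result does not depend on wider intermediate registers. */
static uint32_t div_via_float(uint32_t n, uint32_t d)
{
    volatile float fn = (float)n;
    volatile float fd = (float)d;
    volatile float fq = fn / fd;
    return (uint32_t)fq;   /* conversion truncates toward zero */
}

/* Four independent quotients written side by side. With no dependency
   between the four divides, this is the shape an unrolled loop needs
   before the divides can be packed into one 4-wide SSE operation. */
static void div4_via_float(const uint32_t n[4], uint32_t d, uint32_t q[4])
{
    q[0] = div_via_float(n[0], d);
    q[1] = div_via_float(n[1], d);
    q[2] = div_via_float(n[2], d);
    q[3] = div_via_float(n[3], d);
}
```

For example, `div_via_float(16777215u, 3)` agrees with integer division, while `div_via_float(16777217u, 1)` does not, because 16777217 (2^24 + 1) cannot be stored exactly in a float. The "very large" threshold in the actual program depends on the values it divides, which this sketch does not model.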