By: ? (, September 27, 2009 8:33 am
someone ( on 9/27/09 wrote:
>Jouni Osmala ( on 9/26/09 wrote:
>>There are multiple orders of magnitude of difference in perf/power between special-purpose hardware
>>and software-configurable hardware in general. The big difference
>>is that instead of spending a huge number of transistors selecting operations,
>>decoding instructions and looking for dependencies, you just have small units that do the work.
>>Byte addition costs ~200 transistors; 32-bit addition, ~1000 transistors. AND and OR are 4 transistors per bit.
>>Shifts by a constant known in advance are almost free.
>>A multiplier takes approximately one adder the width of the first operand per bit of the second operand.
>This may be helpful for perspective:
>Interesting comparison of MPU energy cost today
>64 bit multiply-add - 200 pJ
>read 64 bits from cache - 800 pJ
>move 64 bits across chip - 2000 pJ
>execute an instruction - 7500 pJ
>read 64 bits from DRAM - 12000 pJ

>Notice it costs 15X more energy to go to memory than
>read data from cache. The execution of a multiply-
>add instruction burns 97% of the energy in overhead
>and 3% in the arithmetic circuits.
>Also notice it is more energy efficient to redundantly
>perform 9 separate multiply-add operations in different
>locations across a chip than do it once in one location
>and broadcast the result across the chip! (presuming
>the operands were already widely distributed).
>From IDF paper TCIS001, slide 19
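The three ratios claimed in the quote (15X, roughly 97%/3%, and "9 separate multiply-adds") follow directly from the five listed energies. Below is a minimal sketch that recomputes them, assuming the 7500 pJ instruction cost already includes the 200 pJ of arithmetic, and that the redundant-compute comparison is against the cross-chip move alone (operands already distributed, as the quote says). The dictionary keys and function names are illustrative, not from the paper.

```python
# Energy per operation in picojoules, as listed in the quoted IDF slide.
ENERGY_PJ = {
    "multiply_add_64": 200,
    "cache_read_64": 800,
    "move_across_chip_64": 2000,
    "execute_instruction": 7500,
    "dram_read_64": 12000,
}


def dram_to_cache_ratio(e):
    """How many times more energy a DRAM read costs than a cache read."""
    return e["dram_read_64"] / e["cache_read_64"]


def arithmetic_share(e):
    """Fraction of an instruction's energy spent in the multiply-add itself.

    Assumes the 7500 pJ instruction figure already contains the 200 pJ of arithmetic.
    """
    return e["multiply_add_64"] / e["execute_instruction"]


def max_redundant_ops(e):
    """Largest n such that n local multiply-adds cost strictly less than one
    64-bit move across the chip (operands assumed to be present everywhere)."""
    n = 0
    while (n + 1) * e["multiply_add_64"] < e["move_across_chip_64"]:
        n += 1
    return n


print(dram_to_cache_ratio(ENERGY_PJ))                 # DRAM vs cache read
print(f"{arithmetic_share(ENERGY_PJ):.1%} arithmetic")
print(max_redundant_ops(ENERGY_PJ), "redundant multiply-adds still beat one move")
```

The arithmetic share comes out near 2.7%, which the slide rounds to 3%; ten multiply-adds would exactly equal the 2000 pJ move, so nine is the largest count that is strictly cheaper.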

Thanks, nice paper.

However (there is always another "however" with me), notice that it says it is 12000 pJ per *operation*, not per *clock*. Since reading 64 bits from DRAM takes ~100 cycles, isn't it only ~120 pJ per cycle?
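The per-cycle figure is just the per-operation energy divided by the access latency. A minimal sketch, assuming the energy is spread evenly over the ~100-cycle latency and that only one DRAM access is in flight at a time (overlapping misses would raise the per-cycle number); the function name is hypothetical:

```python
def energy_per_cycle_pj(energy_per_op_pj, latency_cycles):
    """Spread one operation's energy evenly over its latency in cycles.

    Assumes a single outstanding access; with k overlapping accesses
    the per-cycle energy would be roughly k times higher.
    """
    return energy_per_op_pj / latency_cycles


# 12000 pJ per 64-bit DRAM read, ~100 cycles per read (rough figures from this post)
print(energy_per_cycle_pj(12000, 100), "pJ per cycle")
```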

Notice that the slide title is "Energy Cost by Operation Type", which implies they mean the total cost of the whole operation. However, a few lines below they multiply 12000 pJ by 3 GHz, which implies they mean 12000 pJ per clock.
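The two readings differ by exactly the latency factor, because power is energy per event times events per second. A minimal sketch, assuming a 3 GHz clock and a ~100-cycle DRAM read as stated in this post; it does not claim which reading the slide intended:

```python
PJ_TO_J = 1e-12  # joules per picojoule


def power_watts(energy_pj_per_event, events_per_second):
    """Average power when an event of the given energy occurs at the given rate."""
    return energy_pj_per_event * PJ_TO_J * events_per_second


clock_hz = 3e9       # 3 GHz, the multiplier used on the slide
dram_read_pj = 12000
dram_latency = 100   # rough cycles per DRAM read (assumption from this post)

# Reading 1: a full 12000 pJ DRAM read is charged on every clock.
per_clock_reading = power_watts(dram_read_pj, clock_hz)

# Reading 2: 12000 pJ is the cost of the whole ~100-cycle operation.
per_op_reading = power_watts(dram_read_pj / dram_latency, clock_hz)

print(f"{per_clock_reading:.2f} W vs {per_op_reading:.2f} W")
```

The first reading gives about 36 W for a single stream of DRAM reads, the second about 0.36 W, so the choice of interpretation changes the conclusion by two orders of magnitude.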

... now I am lost. Until this inconsistency is resolved, I take those numbers with a grain of salt.
