By: David Kanter (dkanter.delete@this.realworldtech.com), November 23, 2010 11:28 am
Room: Moderated Discussions
someone (someone@somewhere.com) on 11/21/10 wrote:
---------------------------
>Richard Cownie (tich@pobox.com) on 11/21/10 wrote:
>---------------------------
>>someone (someone@somewhere.com) on 11/21/10 wrote:
>>---------------------------
>>
>>>Sure deep sub micron CMOS leaks. It leaks whether
>>>the processor is active or stalled. Other things on the
>>>chip run whether the processor is stalled or not that
>>>also consume power (PLL, global clock distribution etc).
>>>But there is no meaningful energy associated with a
>>>stall as an architectural event unless there is a replay
>>>trap etc associated with it.
>>
>>Of course there is. If the stall stops everything for N cycles, and
>>the cpu is burning power during that time, then you definitely
>>have energy usage. The stall case takes more energy than the non-stall
>>case, right ?
>
>The leakage power occurs whether the CPU is stalled
>or whether it is not stalled. Therefore it cannot be
>attributed to the stall.
>
>You could try to argue that an OOOE processor has
>fewer/shorter stalls and that reduces the amortized
>leakage energy per stall. I would counter that the
>extra complexity of OOOE means that there are a lot
>more logic transistors around to leak and so leakage
>power is always higher - whether stalled or not.. :-P
>
>>
>>>Yeah it sucks that modern workloads can't execute
>>>entirely out of L1. Let us know if you figure out a way
>>>around it.
>>
>>*If* you could execute entirely in L1, then static-scheduled in-order
>>architectures would probably be a fine idea. Since we can't, OoO
>>architectures prevail for most apps, since they cope better with
>>unpredictable load latencies. So it sucks; but it sucks a lot more
>>for your argument than for mine.
>
>Huh?
>
>We were talking about energy/power of cache misses
>and stalls. Feel free to start a different thread about
>performance.
I don't believe that OOOE is inherently less power efficient than InO. OOOE lets you overlap more cache misses, which can reduce the amount of time the processor is stalled.
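To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is only a toy model, not a measurement: the function name, the parameters, and the assumption that stall time shrinks in proportion to the number of overlapped misses are all hypothetical. It charges leakage for the whole run (stalled or not) and dynamic power only for the cycles doing useful work.

```python
import math

def run_energy(compute_cycles, misses, miss_latency_cycles, overlap,
               leak_watts, active_dyn_watts, freq_hz):
    """Toy energy model (all parameters are illustrative assumptions).

    overlap: how many cache misses can be outstanding at once
             (1 for a blocking in-order core, >1 for a core that overlaps misses).
    Returns (total_energy_joules, stall_cycles).
    """
    # Misses are serviced in groups of `overlap`; each group costs one miss latency.
    stall_cycles = math.ceil(misses / overlap) * miss_latency_cycles
    total_seconds = (compute_cycles + stall_cycles) / freq_hz
    # Leakage is paid for the entire run, including the stalled time.
    leak_energy = leak_watts * total_seconds
    # Dynamic energy is charged only for cycles spent computing.
    dyn_energy = active_dyn_watts * compute_cycles / freq_hz
    return leak_energy + dyn_energy, stall_cycles

# Hypothetical numbers: same work, same miss count, different cores.
ino_energy, ino_stall = run_energy(1_000_000, 10_000, 200, overlap=1,
                                   leak_watts=0.5, active_dyn_watts=1.0, freq_hz=1e9)
ooo_energy, ooo_stall = run_energy(1_000_000, 10_000, 200, overlap=4,
                                   leak_watts=0.8, active_dyn_watts=1.5, freq_hz=1e9)
print(f"InO : {ino_stall} stall cycles, {ino_energy * 1e3:.2f} mJ")
print(f"OOOE: {ooo_stall} stall cycles, {ooo_energy * 1e3:.2f} mJ")
```

Which core comes out ahead depends entirely on the numbers you plug in: more overlap cuts the stalled time that leakage is paid over, while the extra logic raises leakage and dynamic power. That is why I don't think the answer is inherent to either approach.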
DK