By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), September 18, 2012 2:38 pm
Room: Moderated Discussions
The statement about power variability reminded me of POWER7's use of its deep trench capacitors to stabilize voltage.
This brings me to ask whether anyone here can comment on an idea I presented some time ago on comp.arch: micro-turboboost, using capacitors to locally (temporally and spatially) accelerate a computation under exceptional conditions. (The example of an exceptional condition I gave was a carry misprediction, but more realistic examples might exist.) My comp.arch post received no replies. Almost two weeks ago (5 September) I emailed someone at IBM Research (in desperation or insane hope) asking about this, but I have not yet received a reply.
More on topic, the difference in voltage demands for memory and logic also brought to mind a paper which suggested sharing L1 cache among four cores. "Due to this higher optimal operating voltage, SRAMs remain energy efficient at higher supply voltages, and thus at higher speeds, compared to logic. Hence, there is the unique opportunity in the NTC regime to run caches faster than processors for energy efficiency, which naturally leads to architectures where multiple processors share the same first level cache." ("Reconfigurable Multicore Server Processors for Low Power Operation", Dreslinski et al., 2009)
Somewhat related to variable-precision FP arithmetic: if software could handle approximate results, some additional power savings could be gained. Even just dropping the requirement for 0.5 ULP rounding (i.e., correctly rounded results) could save some power.
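To make the accuracy side of that tradeoff concrete, here is a minimal sketch that compares round-to-nearest (error at most 0.5 ULP) with plain truncation (error below 1 ULP) when an exact product is reduced to p significant bits. It says nothing about the power side, which would depend on the actual hardware. The function names, the choice of p = 24, and the random operands are my own assumptions for illustration only.

```python
from fractions import Fraction
import math
import random


def rounding_error_ulps(x, p, mode):
    """Error, in ULPs, of rounding a positive exact value x to p significant bits.

    mode is "nearest" (round half to even) or "truncate" (drop the extra bits).
    """
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)  # spacing of p-bit values in this binade
    scaled = x / ulp                  # lies in [2**(p-1), 2**p)
    if mode == "truncate":
        q = math.floor(scaled)
    else:
        q = round(scaled)             # Fraction rounds half to even
    return abs(q * ulp - x) / ulp


def max_errors(samples=1000, p=24, seed=0):
    """Worst observed error per rounding mode over random exact products."""
    rng = random.Random(seed)
    worst = {"nearest": Fraction(0), "truncate": Fraction(0)}
    for _ in range(samples):
        a = Fraction(rng.uniform(1.0, 2.0))  # exact value of the double
        b = Fraction(rng.uniform(1.0, 2.0))
        exact = a * b
        for mode in worst:
            worst[mode] = max(worst[mode], rounding_error_ulps(exact, p, mode))
    return worst


if __name__ == "__main__":
    for mode, err in max_errors().items():
        print(mode, float(err))
```

The point is only that relaxing the 0.5 ULP requirement to "less than 1 ULP" costs at most one bit of accuracy per operation; whether the hardware savings are worth that is the open question.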
The area, power, and performance tradeoffs could also be interesting for memory-bandwidth-limited (or latency-limited) workloads. Increasing area would allow more off-chip interconnect/bandwidth, and matching processing speed more closely to memory speed would have energy efficiency benefits with less downside for memory-bound workloads.
I also wonder how NTV design would interact with asynchronous design. Asynchronous design would seem to better tolerate latency variability.



