By: David Kanter (dkanter.delete@this.realworldtech.com), September 18, 2012 5:20 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on September 18, 2012 2:38 pm wrote:
> The statement about power variability reminded me of POWER7's use of its deep
> trench capacitors to stabilize voltage.
>
> This brings me to ask if anyone here
> can comment on an idea I presented some time ago on comp.arch for
> micro-turboboost using capacitors to locally (temporally and spatially)
> accelerate a computation under exceptional conditions. (The example of an
> exceptional condition I gave was a carry misprediction, but more realistic
> examples might exist.) My comp.arch post received no replies and almost two
> weeks ago (5 September) I sent an email to someone at IBM research (in
> desperation or insane hope) asking about this but have not yet received a
> reply.
Generally you use capacitors to stabilize Vcc, and you want as few as possible, because they eat up a ton of area. I don't think this is a realistic idea.
> More on topic, the difference in voltage demands for memory and logic
> also brought to mind a paper which suggested sharing L1 cache among four cores.
> "Due to this higher optimal operating voltage, SRAMs remain energy efficient at
> higher supply voltages, and thus at higher speeds, compared to logic. Hence,
> there is the unique opportunity in the NTC regime to run caches faster than
> processors for energy efficiency, which naturally leads to architectures where
> multiple processors share the same first level cache." ("Reconfigurable
> Multicore Server Processors for Low Power Operation", Dreslinski et al.,
> 2009)
If you look at the papers from Intel, the difference in voltage between the cache and the CPU core at the optimal point is pretty small, 0.45V vs. 0.55V. Call it 20%. For what you are suggesting to work, you'd need your cache to basically run 2-4X faster than the CPU core. That's not going to happen with such a small difference in Vcc. Moreover, I suspect that this research paper was looking at ways to achieve better perf/watt subject to the constraint of 'normal circuit design', although I'm not sure.
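For anyone who wants to check the "call it 20%" figure, here is a quick sketch of the arithmetic. It only uses the two voltages quoted above; the function name `relative_gap` is made up for illustration, and it reports the gap against both voltages since the post does not say which one is the baseline.

```python
def relative_gap(v_low, v_high):
    """Return the voltage gap as a fraction of the lower and of the higher voltage."""
    gap = v_high - v_low
    return gap / v_low, gap / v_high

# The two optimal-point voltages quoted in the post (in volts).
vs_low, vs_high = relative_gap(0.45, 0.55)
print(f"gap is {vs_low:.1%} of the lower voltage, {vs_high:.1%} of the higher")
```

Either way the gap lands at roughly 18-22%, so "20%" is a fair round number.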
[snip]
> I also wonder how NTV design would interact with
> asynchronous design. Asynchronous design would seem to better tolerate latency
> variability.
I'm not sure what you mean by asynchronous design. If you mean async logic that has no clock, I don't think it's relevant. NTV seems to improve efficiency by around 4X, which is far larger than the best case for async circuits.
Async logic might let you eliminate things like more robust latches, but nobody seems eager to use it given the complexity.
If you're using another definition of asynchronous design, then let me know : )
DK



