By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), September 20, 2012 2:53 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on September 18, 2012 5:20 pm wrote:
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on
> September 18, 2012 2:38 pm wrote:
>> The statement about power variability reminded me of POWER7's
>> use of its deep trench capacitors to stabilize voltage.
>>
>> This brings me to ask if anyone here can comment on an idea I
>> presented some time ago on comp.arch for micro-turboboost using
>> capacitors to locally (temporally and spatially) accelerate a
>> computation under exceptional conditions.
[snip]
>
> Generally you use capacitors to stabilize Vcc, and you want as few
> as possible, because they eat up a ton of area. I don't think this
> is a realistic idea.
Thank you for the answer.
>> More on topic, the difference in voltage demands for memory and
>> logic also brought to mind a paper which suggested sharing L1 cache
>> among four cores.
[snip]
>
> If you look at the papers from Intel, the difference in voltage on
> cache and CPU core at the optimal point are pretty small, 0.45V vs.
> 0.55V. Call it 20%. For what you are suggesting to work, you'd
> need your cache to basically run 2-4X faster than the CPU core.
> That's not going to happen with such a small difference in Vcc.
> Moreover, I suspect that this research paper was looking at ways to
> achieve better perf/watt subject to the constraint of 'normal circuit
> design', although I'm not sure.
As you noticed, I was not suggesting this; I was merely pointing to a paper that had suggested it. The paper does seem to be assuming 'normal circuit design', so SRAMs could be much faster.
[snip]
>> I also wonder how NTV design would interact with asynchronous
>> design. Asynchronous design would seem to better tolerate latency
>> variability.
>
> I'm not sure what you mean by asynchronous design. If
> you mean async logic that has no clock, I don't think it's
> relevant. NTV seems to improve efficiency by around 4X, which is
> far larger than the best case for async circuits.
>
> Async logic might let you eliminate things like more robust
> latches, but nobody seems eager to use it given the complexity.
Clockless (or clock-reduced) design does not seem to be a huge win (particularly relative to complexity and maturity of design and testing tools), but I was curious if the two techniques were compatible (say 25% power with NTV and 23% with NTV+async), synergistic (say 20% power with NTV+async), or conflicting (say 25% power or worse with NTV+async). Even a 5% additional benefit could be worth considerable complexity in some areas. For the simple cores imagined for NTV (at least initially), async design might be somewhat less overwhelmingly complex.
(My thinking was merely that a design technique which was more tolerant of localized timing variation might be particularly suitable for handling process variation which your article noted was a significant issue in NTV design [though the article also noted that other methods can be used to compensate].)
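To make the three hypothetical cases above concrete, here is a minimal sketch of how the combined result could be classified. It assumes that independent savings multiply, and that async design on its own would give a power factor of 0.92 (an assumed value, chosen only because 0.25 * 0.92 = 0.23 matches the "compatible" example; it is not a measured figure). The function name, the tolerance, and the "partially compatible" label for in-between results are likewise illustrative.

```python
def classify_interaction(ntv_power, combined_power, async_factor, tol=0.005):
    """Classify how a second technique combines with NTV.

    All power values are fractions of the baseline design's power.
    async_factor is the hypothetical power multiplier async design would
    give on its own (an assumption, not a figure from the discussion).
    """
    # If the two savings are independent, they simply multiply.
    expected = ntv_power * async_factor

    if combined_power >= ntv_power:
        return "conflicting"          # no better than NTV alone
    if combined_power < expected - tol:
        return "synergistic"          # better than independent stacking
    if combined_power <= expected + tol:
        return "compatible"           # roughly independent stacking
    return "partially compatible"     # some gain, but less than expected


if __name__ == "__main__":
    for combined in (0.23, 0.20, 0.25):
        print(combined, classify_interaction(0.25, combined, async_factor=0.92))
```

Under a different assumed standalone benefit for async design, the same combined numbers would of course land in different categories.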