By: juanrga (noemail.delete@this.juanrga.com), October 29, 2016 10:15 pm
Room: Moderated Discussions
Simon Farnsworth (simon.delete@this.farnz.org.uk) on October 28, 2016 6:19 am wrote:
> juanrga (noemail.delete@this.juanrga.com) on October 28, 2016 2:02 am wrote:
> > Simon Farnsworth (simon.delete@this.farnz.org.uk) on October 25, 2016 11:03 am wrote:
> > > juanrga (noemail.delete@this.juanrga.com) on October 25, 2016 9:57 am wrote:
> > > > anon (spam.delete.delete@this.this.spam.com) on October 23, 2016 7:25 am wrote:
> > > > > juanrga (noemail.delete@this.juanrga.com) on October 23, 2016 6:09 am wrote:
> > > > > > anon (spam.delete@this.spam.com) on October 22, 2016 8:52 am wrote:
> > > > > >
> > > > > > > I mean
> > > > > >
> > > > > > > > Apple doesn’t always have the best performance per square millimeter,
> > > > > > > > writes Gwennap, but it makes up for it in efficiency per clock cycle
> > > > > >
> > > > > > > that's not how it works.
> > > > > >
> > > > > > His first claim is correct, Apple Hurricane doesn't have the best performance per area,
> > > > > > > but this is expected because it is a latency-optimized core, not a throughput-optimized core.
> > > > > > About his second claim if by "efficiency per clock cycle" he means IPC/Area then his claim
> > > > > > is wrong or right depending if he is comparing to Intel or to other ARM cores.
> > > > >
> > > > > My point is that perf = clockrate * ipc. Whether the ipc is high with low clockrates
> > > > > or abysmal with insane clockrates doesn't matter at all for perf/area. Same
> > > > > perf and same area mean same perf/area, regardless of the ipc.
> > > >
> > > > > But he talks about "efficiency per clock cycle" which suggests he is talking about
> > > > IPC/Area, not about Perf/Area. And the superior IPC/Area of Apple chips compared
> > > > to Intel chips is related to ARM64 efficiency: the well-known "x86 tax".
> > > >
> > > > > IPC/area is nice and all but it doesn't buy you anything. I can get you tremendous IPC
> > > > > by running the core so slow that I get a RAM to register load to use latency of 1 cycle.
> > > >
> > > > The variation of IPC with clocks is very small and you can only get huge IPC gains by
> > > > setting extremely low clocks, but that is not happening here. Hurricane is clocked at
> > > > 2.34GHz. Underclocking a 4GHz Haswell chip to 2GHz increases the IPC by less than 5%.
> > > > Apple achieving IPC parity with best Intel designs is not due to lower clocks...
> > >
> > > That claim does not fit my understanding of how IPC gets
> > > exploited in real world chips. Downclocking Haswell
> > > won't increase IPC by much, because the design is for high clock rates, and thus the increased IPC from a
> > > lower clock is only available because the ratio between memory speed and processor speed is reduced.
> > >
> > > However, if you're designing to a target clock speed, you can get much higher IPC on a comparable
> > > process if your clock speed is lower than if it's higher; this is simply because if the processes
> > > are comparable, the FO4 time is comparable, but at 2 GHz, you can fit twice as many FO4 time units
> > > (thus twice as many transistors) in the critical path compared to a 4 GHz clock.
> > >
> > > Thus, for your claim to be true, either the process Apple is using is far behind Intel, such that
> > > the FO4 time is about twice that of the Intel process (so Apple get the same number of transistors
> > > in the critical path as Intel, but at half the clock speed), or Apple is leaving performance
> > > on the table, by designing for a target clock of 4 GHz, then only achieving 2 GHz, when they could
> > > achieve higher IPC and higher performance by designing around the 2 GHz target clock.
> > >
> > > Assuming that Apple aren't being idiots, and that TSMC/GloFo/Samsung
> > > processes are comparable to Intel's processes
> > > (within 20%, say), the most likely explanation is that they're
> > > getting their IPC by exploiting the longer clock
> > > cycles to run more logic per clock cycle. This, in turn,
> > > means that the chip is unlikely to scale to the same
> > > high clock speeds as an Intel chip does, because they run out of FO4 delay as the clock goes up.
> > >
> > > Equally, of course, this implies that an Intel core run at mobile speeds is leaving performance on
> > > the table - you've designed around the constraints of high speed operation, then decided to clock lower,
> > > when you could have designed for the lower clock, and had more logic running per clock cycle.
> >
> > Essentially the same rule applies upwards and downwards, only parameters vary.
> >
> > If your design is optimized for 4GHz and underclocking to 2GHz increases the IPC by less than 5%, then
> > if your design is optimized for 2GHz, overclocking it to 4GHz will reduce the IPC by a similar amount.
>
> The same rule does not apply upward and downward. If it did, I could reliably overclock
> a Pentium III to 4 GHz, just as I can downclock a modern Intel chip to 800 MHz.
>
I didn't write "the same" but "Essentially the same", and I explicitly mentioned that the parameters vary (critical path timing, clock skew, jitter...).
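
For reference, here is a rough sketch of the two pieces of arithmetic being argued over in the quotes: perf = clockrate * ipc, and the number of FO4 delays that fit in one clock cycle. The function names and the 10 ps FO4 delay are assumptions made up for illustration, not measured values for any Intel or Apple process, and the sketch ignores latch overhead, clock skew, and jitter, which are exactly the parameters that vary.

```python
def fo4_per_cycle(clock_ghz: float, fo4_delay_ps: float) -> float:
    """FO4 delays that fit in one clock period (no latch/skew overhead)."""
    cycle_ps = 1000.0 / clock_ghz  # 1 GHz corresponds to a 1000 ps period
    return cycle_ps / fo4_delay_ps


def performance(clock_ghz: float, ipc: float) -> float:
    """perf = clockrate * ipc, in billions of instructions per second."""
    return clock_ghz * ipc


# Hypothetical FO4 delay, assumed identical for both designs
# ("comparable processes" in the quoted argument).
FO4_PS = 10.0

budget_2ghz = fo4_per_cycle(2.0, FO4_PS)
budget_4ghz = fo4_per_cycle(4.0, FO4_PS)

# Halving the target clock doubles the FO4 budget per cycle.
print(budget_2ghz, budget_4ghz, budget_2ghz / budget_4ghz)

# Same perf from high IPC at a low clock or low IPC at a high clock.
print(performance(2.0, 2.0), performance(4.0, 1.0))
```

With equal FO4 delay, the 2 GHz design has twice the per-cycle budget of the 4 GHz design. Whether that extra budget converts into IPC, and by how much, is the point in dispute, and the sketch says nothing about it.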