Article: Medfield, Intel's x86 Phone Chip
By: observer (no.thanks.delete@this.for.now), January 26, 2012 4:17 am
Room: Moderated Discussions
Wilco (Wilco.Dijkstra@ntlworld.com) on 1/25/12 wrote:
---------------------------
>none (none@arm.com) on 1/25/12 wrote:
>---------------------------
>>Wilco (Wilco.Dijkstra@ntlworld.com) on 1/25/12 wrote:
>>---------------------------
>>[...]
>>>Running micro benchmarks like Dhrystone/CoreMark gives close to maximum power consumption.
>>
>>Huh? Neither of them makes use of floating-point or SIMD
>>(unless you have strcmp implemented as SIMD instructions in
>>your C lib). Neither of them has heavy miss traffic.
>>Neither of them completely kills branch prediction. So how
>>could they represent maximum power consumption on anything
>>but a cache-less, FP-less, bpred-less CPU?
>
>When a core is stalled on memory, it actually uses very little power due to extensive
>clock gating. With heavy branch misprediction you flush the pipeline all the time,
>which means you have only executed some of the fetched instructions. Very branchy
>but well predicted code would do the same number of fetches but execute far more
>instructions, thus using more power overall.
A branch is predicted either taken or not taken, so the front end starts fetching down one of those paths whether or not the prediction turns out to be right, correct? In my experience a high-performance, deeply pipelined instruction fetch costs power either way: there is not much difference in power between correct and incorrect predictions, but correct predictions reduce execution time and therefore lower the energy consumed for a given task. Could you elaborate on why this would be very different for the core you had in mind (disregarding execution unit power, that is)?
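To make the power vs. energy distinction concrete, here is a rough toy model in Python. It is only a sketch under assumptions of my own: a hypothetical single-issue core, a front end that burns the same fetch energy every cycle regardless of which path it is on, execute energy spent only on instructions that actually retire, and invented numbers for branch fraction, misprediction rate and flush penalty. None of these figures come from the StrongARM paper or from any real core.

```python
def toy_core_model(n_instr, branch_frac, mispredict_rate, flush_penalty,
                   fetch_energy_per_cycle, exec_energy_per_instr):
    """Return (cycles, energy, average_power) for a hypothetical single-issue core.

    Assumptions (illustrative only):
      - one useful instruction retires per cycle when the pipeline is not flushing
      - each mispredicted branch costs `flush_penalty` extra cycles
      - the front end spends `fetch_energy_per_cycle` on every cycle,
        whether it is fetching the right path or the wrong one
      - execute energy is spent only on instructions that retire
    """
    mispredicts = n_instr * branch_frac * mispredict_rate
    cycles = n_instr + mispredicts * flush_penalty
    energy = fetch_energy_per_cycle * cycles + exec_energy_per_instr * n_instr
    average_power = energy / cycles  # energy per cycle
    return cycles, energy, average_power


# Same amount of useful work, made-up parameters, two predictor qualities.
well_predicted = toy_core_model(1_000_000, 0.2, 0.01, 10, 1.0, 1.0)
badly_predicted = toy_core_model(1_000_000, 0.2, 0.30, 10, 1.0, 1.0)

for name, (cycles, energy, power) in (("well predicted", well_predicted),
                                      ("badly predicted", badly_predicted)):
    print(f"{name}: cycles={cycles:.0f} energy={energy:.0f} avg_power={power:.3f}")
```

In this model the badly predicted run shows lower average power (fewer retired instructions per fetch, which is the effect Wilco describes) but higher total energy for the same task (it simply runs longer, which is my point). How far apart the two really are on a given core depends on how much of the power budget is in fetch versus execute, which is exactly what I am asking about.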
>
>Old StrongARM figures at http://www.hpl.hp.com/hpjournal/dtj/vol9num1/vol9num1art5.pdf
>show decode+execute used 26% of total power, while caches used 60%. So getting
>as many L1 I&D hits as possible was the way to get maximum power. Dhrystone fits
>that scenario pretty well (note the paper explicitly mentions it is the maximum
>power consumption case), and I know it was used extensively at ARM for power consumption
>estimates. You've got a point about VFP/Neon not being used - that would certainly
>matter if you can issue more instructions after maxing out the L1.
>
>Wilco