By: Michael S (already5chosen.delete@this.yahoo.com), February 10, 2015 1:57 pm
Room: Moderated Discussions
Jouni Osmala (josmala.delete@this.cc.hut.fi) on February 10, 2015 12:24 pm wrote:
> > > I'd don't know what changed within ARM application processor
> > > group, but recently they have very high success
> > > rate: A7, A12, A53, A17 are all pretty good.
> >
> > Focus on incremental improvement maybe? A8, A9 and A15 were all big jumps with significantly different
> > micro-architectures than their predecessors. All of the ones you listed on the other hand have evolved
> > from previous cores. They have better branch prediction, tighter coupling of the L1/L2 caches and
> > other improvements which do not significantly affect the execution core yet provide very significant
> > benefits, and especially so in integer codes. Most of those improvements can also be shared between
> > the cores so that might also have helped out in focusing development efforts.
>
> Those small tweaks matter a lot. The difference between nehalem and cortex A15 is mostly in
> those tweaks. Cortex has wider execution stage, but nehalem is more tweaked and has better memory
> subsystem and maybe better branch predictor,and fetches more instructions per cycle. By wider
> core I mean nehalem's 3 compute pipelines vs cortex 6 compute pipelines, and both have same
> number of loads/stores per cycle. A15 scheduler contains more operations than nehalems.
You can't compare OoO cores based on a unified scheduler (Nehalem) with cores based on split schedulers (Cortex-A15) in such a simplistic manner. If you want to compare CA15 with x86, it would make much more sense to compare it to another split-scheduler design like AMD K8. K8 can dispatch up to 9 uOPs per clock, one more than CA15, but the real difference in dispatch width is even bigger in favor of K8, because its dispatch ports are more universal. Most importantly, K8 can dispatch up to 3 integer ALU/shift instructions per clock (matching fat Intel cores by this metric) or resolve up to 3 branches, while CA15 can only issue 2 integer ALU/shift instructions per clock and resolve 1 branch. So, both cores feature split schedulers, but K8 is "less split".
For reference, CA15 OoO schedulers (clusters, in ARM terms) and dispatch rates per scheduler:
1. Simple ALU/shift, 2
2. Branch, 1
3. Neon/FPU, 2
4. Multiply, 1 (also handles integer divide)
5. LSU, 2 (1 load, 1 store)
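To make the "one more than CA15" arithmetic explicit, here is a minimal sketch that just sums the per-cluster rates listed above. The dictionary keys and the function name are my own labels for illustration, and the K8 figure is simply the total of 9 quoted in this post, not a per-port breakdown.

```python
# Per-cluster peak dispatch rates for Cortex-A15, taken from the list above.
# Key names are illustrative labels, not ARM terminology.
CA15_DISPATCH = {
    "simple_alu_shift": 2,
    "branch": 1,
    "neon_fpu": 2,
    "multiply_divide": 1,
    "load_store": 2,  # 1 load + 1 store
}

# Total quoted in this post for K8; no per-port breakdown is assumed here.
K8_TOTAL_DISPATCH = 9


def total_dispatch(clusters):
    """Peak uOPs dispatched per clock if every cluster is saturated at once."""
    return sum(clusters.values())


ca15_total = total_dispatch(CA15_DISPATCH)
print(ca15_total)                      # 8
print(K8_TOTAL_DISPATCH - ca15_total)  # 1
```

Note that this raw sum says nothing about how universal the ports are, which is exactly why the headline numbers understate the gap.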
BTW, it's still not clear to me where the store data comes from. Does the store unit have 3 read ports (2 for address and 1 for data) into the register file and result queue, or does it somehow steal a read port from another EU?
Also, on an unrelated note, when speaking about width you can't totally ignore the in-order front end and in-order retirement parts of the core, both of which on CA15 (3 simple uOps) are narrower than on both Nehalem (4 fused uOps) and K8 (3 macro-ops).
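A rough way to see why the in-order parts matter: sustained throughput cannot exceed the narrowest stage, whatever the peak dispatch rate is. The sketch below applies that to the CA15 figures from this post only. The stage names and helper functions are hypothetical, and I deliberately don't put Nehalem or K8 in the same table because their widths are counted in different units (fused uOps and macro-ops).

```python
# Stage widths for Cortex-A15 as given in this post, in simple uOps per clock.
# Stage names are illustrative labels.
CA15_STAGES = {
    "front_end": 3,  # in-order front end
    "dispatch": 8,   # sum over the five OoO clusters
    "retire": 3,     # in-order retirement
}


def sustained_width(stages):
    """Upper bound on sustained uOPs per clock: the narrowest stage wins."""
    return min(stages.values())


def bottlenecks(stages):
    """Names of the stages that set that bound."""
    limit = sustained_width(stages)
    return [name for name, width in stages.items() if width == limit]


print(sustained_width(CA15_STAGES))  # 3
print(bottlenecks(CA15_STAGES))      # ['front_end', 'retire']
```

So the 8-wide dispatch only helps to drain bursts out of the schedulers; over a long run the core is bounded by the 3-wide in-order ends.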