By: dmcq (dmcq.delete@this.fano.co.uk), February 11, 2015 5:44 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on February 10, 2015 2:57 pm wrote:
> Jouni Osmala (josmala.delete@this.cc.hut.fi) on February 10, 2015 12:24 pm wrote:
> > > > I don't know what changed within the ARM application processor
> > > > group, but recently they have had a very high success
> > > > rate: A7, A12, A53, A17 are all pretty good.
> > >
> > > Focus on incremental improvement maybe? A8, A9 and A15 were all big jumps with significantly different
> > > micro-architectures than their predecessors. All of the ones you listed on the other hand have evolved
> > > from previous cores. They have better branch prediction, tighter coupling of the L1/L2 caches and
> > > other improvements which do not significantly affect the execution core yet provide very significant
> > > benefits, and especially so in integer codes. Most of those improvements can also be shared between
> > > the cores so that might also have helped out in focusing development efforts.
> >
> > Those small tweaks matter a lot. The difference between Nehalem and Cortex-A15 is mostly in
> > those tweaks. Cortex has a wider execution stage, but Nehalem is more tweaked and has a better memory
> > subsystem and maybe a better branch predictor, and fetches more instructions per cycle. By wider
> > core I mean Nehalem's 3 compute pipelines vs Cortex's 6 compute pipelines, and both have the same
> > number of loads/stores per cycle. The A15 scheduler contains more operations than Nehalem's.
>
> You can't compare OoO cores based on a unified scheduler (Nehalem) with cores based on a split scheduler
> (Cortex-A15) in such a simplistic manner. If you want to compare CA15 with x86, it would make much
> more sense to compare to another split-scheduler design like AMD K8. K8 can dispatch up to 9 uOPs
> per clock, one more than CA15, but the real difference in dispatch width in favor of K8
> is even bigger, because its dispatch ports are more universal. Most importantly, K8 is capable
> of dispatching up to 3 integer ALU/shift instructions per clock (matching fat Intel cores by this metric)
> or resolving up to 3 branches, while CA15 can only issue 2 integer ALU/shift instructions per clock and
> resolve 1 branch. So, both cores feature split schedulers, but K8 is "less split".
>
> For reference, CA15 OoO schedulers (clusters, in ARM terms) and dispatch rates per scheduler:
> 1. Simple ALU/shift, 2
> 2. Branch, 1
> 3. Neon/FPU, 2
> 4. Multiply, 1 (also handles integer divide)
> 5. LSU, 2 (1 load, 1 store)
>
> BTW, it's still not clear to me where store data is coming from. Does the store unit have 3 read ports (2 for address
> and one for data) into the register file and result queue, or does it somehow steal a read port from another EU?
>
> Also, on an unrelated note, when speaking about width you can't totally ignore the in-order
> front end and in-order retirement parts of the core, both of which on CA15 (3 simple
> uOps) are narrower than on both Nehalem (4 fused uOps) and K8 (3 macro-ops).
>
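To put a number on the "less split" point quoted above, here is a rough sketch in Python. The CA15 widths come straight from the quoted list; the merged layout, where ALU/shift and branch uops share one 3-wide integer scheduler, is a hypothetical made up for comparison and is not a model of K8. Both layouts have the same total width of 8, so any difference comes only from how the ports are split.

```python
# Issue widths per scheduler, copied from the quoted CA15 list.
# The LSU's "2" is modelled as one load slot plus one store slot.
CA15_WIDTH = {"alu": 2, "branch": 1, "neon_fpu": 2, "multiply": 1, "load": 1, "store": 1}
CA15_CLUSTER = {kind: kind for kind in CA15_WIDTH}  # each uop class has its own scheduler

# Hypothetical "less split" layout (an assumption for illustration, not a K8 model):
# ALU/shift and branch uops share one 3-wide integer scheduler.
MERGED_WIDTH = {"int": 3, "neon_fpu": 2, "multiply": 1, "load": 1, "store": 1}
MERGED_CLUSTER = {**CA15_CLUSTER, "alu": "int", "branch": "int"}


def issued_this_cycle(ready, cluster_of, width_of):
    """Upper bound on uops issued in one cycle, given ready uops per class."""
    demand = {}
    for kind, count in ready.items():
        cluster = cluster_of[kind]
        demand[cluster] = demand.get(cluster, 0) + count
    # Each scheduler can issue at most its own width, whatever the others are doing.
    return sum(min(n, width_of[cluster]) for cluster, n in demand.items())


ready = {"alu": 3, "branch": 0}  # three independent ALU ops, no branches
print(issued_this_cycle(ready, CA15_CLUSTER, CA15_WIDTH))      # 2
print(issued_this_cycle(ready, MERGED_CLUSTER, MERGED_WIDTH))  # 3
```

With three ready ALU ops and no branches, the fully split layout leaves its branch slot idle, while the merged one uses all three integer slots.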
Well, for the ARM A15, a slide says that store operations are issued in order, and that they issue when the address registers are available, not the data. So I guess the store pipe is really just generating the address, and the actual store is done separately.
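
A toy model of that guess, in Python. It is only a sketch of the idea that address generation and data capture are decoupled; the one-address-per-cycle rate, the `StoreEntry` fields and the `simulate` function are all assumptions made up for illustration, not anything taken from ARM documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    addr_ready_cycle: int                   # cycle when the address registers are available
    data_ready_cycle: int                   # cycle when the store data is available
    addr_issue_cycle: Optional[int] = None  # filled in by simulate()
    complete_cycle: Optional[int] = None    # filled in by simulate()


def simulate(stores: List[StoreEntry]) -> List[StoreEntry]:
    """Issue store addresses in program order, one per cycle (assumed rate).

    A store issues as soon as its address registers are ready and all older
    stores have issued. The data is picked up later, so late data delays only
    that store's completion, not the address issue of younger stores.
    """
    next_free_cycle = 0
    for st in stores:
        st.addr_issue_cycle = max(next_free_cycle, st.addr_ready_cycle)
        st.complete_cycle = max(st.addr_issue_cycle, st.data_ready_cycle)
        next_free_cycle = st.addr_issue_cycle + 1
    return stores


# First store has its address early but its data very late.
first, second = simulate([StoreEntry(0, 10), StoreEntry(1, 2)])
print(first.addr_issue_cycle, first.complete_cycle)    # 0 10
print(second.addr_issue_cycle, second.complete_cycle)  # 1 2
```

If that is roughly what happens, the store pipe would only need the address read ports at issue time, and the data could be read whenever it turns up.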