By: anon (anon.delete@this.anon.com), February 4, 2015 10:27 pm
Room: Moderated Discussions
Exophase (exophase.delete@this.gmail.com) on February 4, 2015 2:33 pm wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on February 4, 2015 11:25 am wrote:
> > Memory disambiguation would be most useful with another load unit.
> >
> > I would do another load unit. I don't think it's very helpful to do 2 ST/clock,
> > especially since it makes your store buffer a lot nastier to deal with.
> >
>
> I don't think it's a coincidence that Core 2 both increased width across the board and added memory
> disambiguation, all while not adding a second store port until two uarch generations later. Although
> I do of course agree that disambiguation would be more effective with a second load port.
>
> I have written lots of code which has a higher than 1:2 load to ALU density for decent sized chunks or loop
> bodies, so I do think a second load unit would help a lot. Even more so if they're adding a third ALU.
Silvermont does not do memory disambiguation, despite Intel having perhaps more experience with it than anyone. It is their first gen core, though, so it's possible they decided to limit some complexity. It does suggest that it's not the lowest-hanging fruit.
>
> > Prefetching and branch prediction will probably improve.
> >
>
> From what I could gather in the TRMs, the prefetching to date seems to be based on observing
> access patterns in the cache (originally, from cache misses). I don't know if they've moved
> beyond this already, but if not they'd benefit from having IP-hashed stream detection.
>
> > And yes, hopefully they will fix their cache design...but I think a lot of that
> > is tied to the PD capabilities of clients (which is to say, not much).
> >
>
> When you say fix it, are you referring to latencies, size, hierarchy arrangement,
> or what? Size and hierarchy arrangement are really pretty much the same as
> everyone else in these segments, unless you count Broadwell-Y.
Core | L1I | L1D | L2 |
---|---|---|---|
A15 (Exynos 5250) | 32K, 2-way | 32K, 2-way, 4 cycles | 1M, 16-way, 21 cycles |
A57 | 48K, 3-way | 32K, 2-way, 5 cycles (?) | 512K-2M, 16-way, ?? cycles |
Denver | 128K, 4-way | 64K, 4-way, ? cycles | 2M, 16-way, 18 cycles |
A8 | 64K, ?-way | 64K, ?-way, 4 cycles | 1M, ??-way, ?? cycles + 4M L3 |
Silvermont | 32K, 8-way | 24K, 6-way, 3 cycles | 1M, 16-way, 14 cycles |
I include the A15 only to show the L2 latency at 1MB. A57 may be a little better, but I'm not sure. Does anybody have the numbers?
So of the modern 64-bit mobile CPUs, I would say A57 has the worst cache hierarchy, unless its L2 is faster than Silvermont's at 1MB (it would need to be, because its L1 is significantly worse). But I doubt it is; I would guess the L2 is closer to 20 cycles than to 10.
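A rough back-of-the-envelope sketch of that L1-versus-L2 trade-off, using the L1D and L2 latencies from the table. Everything else here is an assumption for illustration only: the miss rates are made up, both cores are given the same miss rate, every L1 miss is assumed to hit in L2, the L2 figures are treated as total load-to-use latency, and the A57 numbers (5 and 20 cycles) are just the guesses from this post. The function names are hypothetical.

```python
# Back-of-the-envelope average L1D/L2 load latency.
# Latencies come from the table; miss rates are illustrative, not measured.

def avg_load_latency(l1_cycles, l2_cycles, l1_miss_rate):
    """Average load latency, assuming every L1 miss hits in L2 and the
    L2 figure is the total load-to-use latency."""
    return (1.0 - l1_miss_rate) * l1_cycles + l1_miss_rate * l2_cycles

def break_even_l2(l1_cycles, ref_l1_cycles, ref_l2_cycles, l1_miss_rate):
    """L2 latency a core would need to match the reference core's average
    at the same L1 miss rate. A negative result means no L2 latency can
    make up the difference in this simple model."""
    ref = avg_load_latency(ref_l1_cycles, ref_l2_cycles, l1_miss_rate)
    return (ref - (1.0 - l1_miss_rate) * l1_cycles) / l1_miss_rate

for miss in (0.02, 0.05, 0.10):  # assumed L1D miss rates
    silvermont = avg_load_latency(3, 14, miss)
    a57_guess = avg_load_latency(5, 20, miss)  # 5 and 20 cycles are guesses
    needed = break_even_l2(5, 3, 14, miss)
    print(f"miss={miss:.2f}  Silvermont={silvermont:.2f}  "
          f"A57(guess)={a57_guess:.2f}  break-even A57 L2={needed:.1f}")
```

In this simplified model the break-even L2 latency comes out negative for all three assumed miss rates, so a faster L2 alone would not cover the two extra L1 cycles; the A57 would also need a lower L1 miss rate. The model ignores out-of-order latency hiding, prefetching and bandwidth, so treat it as a sanity check rather than a prediction.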