By: anon (anon.delete@this.anon.com), February 4, 2015 10:29 pm
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on February 4, 2015 11:27 pm wrote:
> Exophase (exophase.delete@this.gmail.com) on February 4, 2015 2:33 pm wrote:
> > David Kanter (dkanter.delete@this.realworldtech.com) on February 4, 2015 11:25 am wrote:
> > > Memory disambiguation would be most useful with another load unit.
> > >
> > > I would do another load unit. I don't think it's very helpful to do 2 ST/clock,
> > > especially since it makes your store buffer a lot nastier to deal with.
> > >
> >
> > I don't think it's a coincidence that Core 2 both increased width across the board and added memory
> > disambiguation, all while not adding a second store port until two uarch generations later. Although
> > I do of course agree that disambiguation would be more effective with a second load port.
> >
> > I have written lots of code which has a higher than 1:2 load to ALU density for decent sized chunks or loop
> > bodies, so I do think a second load unit would help a lot. Even more so if they're adding a third ALU.
>
> Silvermont does not do memory disambiguation, despite Intel having perhaps more experience than
> anyone with it. Although it is their first gen core, so it's possible they decided to limit
> some complexities. It does suggest that it's not the lowest-hanging fruit, though.
>
> >
> > > Prefetching and branch prediction will probably improve.
> > >
> >
> > From what I could gather in the TRMs, the prefetching to date seems to be based on observing
> > access patterns in the cache (originally, from cache misses). I don't know if they've moved
> > beyond this already, but if not they'd benefit from having IP-hashed stream detection.
> >
> > > And yes, hopefully they will fix their cache design...but I think a lot of that
> > > is tied to the PD capabilities of clients (which is to say, not much).
> > >
> >
> > When you say fix it, are you referring to latencies, size, hierarchy arrangement,
> > or what? Size and hierarchy arrangement are really pretty much the same as
> > everyone else in these segments, unless you count Broadwell-Y.
Fixed format:

Core | L1I | L1D | L2 |
---|---|---|---|
A15 (Exynos 5250) | 32K,2-way | 32K,2-way,4-cycles | 1M,16-way,21-cycles |
A57 | 48K,3-way | 32K,2-way,5-cycles(?) | 512K-2M,16-way,??-cycles |
Denver | 128K,4-way | 64K,4-way,?-cycles | 2M,16-way,18-cycles |
A8 | 64K,?-way | 64K,?-way,4-cycles | 1M,??-way,??-cycles + 4M L3 |
Silvermont | 32K,8-way | 24K,6-way,3-cycles | 1M,16-way,14-cycles |

I include the A15 only to see the L2 latency at 1MB. A57 may be a little better, but I'm not sure. Does anybody have the numbers?
So of the modern 64-bit mobile CPUs, I would say A57 has the worst cache hierarchy, unless its L2 is faster than Silvermont's at 1MB (it would have to be, because its L1 is significantly worse). But I doubt it is; I would say the L2 is closer to 20 cycles than to 10.
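
To put a rough number on "the L2 would have to be faster", here is a back-of-the-envelope sketch. Everything beyond the table entries is an assumption: it treats the listed latencies as total load-to-use cycles, assumes every L1 miss hits in L2, gives both cores the same L1 miss rate (which ignores the 32K vs 24K size difference), and ignores any latency hiding by out-of-order execution. The 5-cycle A57 L1D figure is the uncertain one from the table. The function names are made up for illustration.

```python
def avg_load_latency(l1_cycles, l2_cycles, l1_miss_rate):
    """Average load latency in cycles, assuming every L1 miss hits in L2."""
    return (1.0 - l1_miss_rate) * l1_cycles + l1_miss_rate * l2_cycles


def breakeven_l2_cycles(l1_cycles, ref_l1_cycles, ref_l2_cycles, l1_miss_rate):
    """L2 latency at which a core ties the reference core's average latency.

    Solves (1-m)*l1 + m*x = (1-m)*ref_l1 + m*ref_l2 for x, with the same
    miss rate m on both cores (a simplifying assumption).
    """
    hit_rate = 1.0 - l1_miss_rate
    return ref_l2_cycles - (l1_cycles - ref_l1_cycles) * hit_rate / l1_miss_rate


if __name__ == "__main__":
    # Table values: Silvermont L1D 3 cycles, L2 14 cycles; A57 L1D 5 cycles (?).
    # The miss rates below are hypothetical, not measured.
    for miss_rate in (0.05, 0.10, 0.25, 0.50):
        x = breakeven_l2_cycles(5, 3, 14, miss_rate)
        print(f"L1 miss rate {miss_rate:.0%}: A57 L2 must be <= {x:.1f} cycles to tie")
```

Under these assumptions, even at a 25% L1 miss rate the A57 L2 would need to be 8 cycles just to tie Silvermont, and at lower miss rates the break-even goes negative, meaning no L2 latency could make up for the slower L1. A real comparison would need measured latencies and per-core miss rates.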