By: David Kanter (dkanter.delete@this.realworldtech.com), July 13, 2015 8:31 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on July 13, 2015 1:51 am wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on July 12, 2015 10:34 pm wrote:
> >
> > If you look at IBM's zSeries, it's actually quite different than x86, due to IBM's emphasis on reliability.
> > x86 has write-back L1 caches where the reliability is derived from the robust memory cells (8T design).
>
> Isn't 8T used exclusively in Atoms?
Intel has been using 8T memory cells for every L1D cache since Nehalem. Maybe since Penryn.
> > IBM zArch uses write-through caching for *all* SRAM-based caches (on some
> > designs that meant L1, L2, and L3 were all write-through, and only L4 was
> > write-back; on more recent ones, I think only L1 & L2 are write-through).
> >
>
> Yes, that's correct. Z13 has write-through (store-through,
> in IBM speak) L1D and L2 caches, same as z196 and zEC12.
I think the L3 cache was store-through on one of the CPUs I wrote about.
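The bandwidth cost of a store-through design is easy to see in a toy model. The sketch below (the class, its names, and the traffic accounting are mine for illustration, not IBM's or Intel's actual designs) counts how many stores reach the next cache level under each policy:

```python
# Toy model contrasting write-back and write-through L1 behavior.
# All names and the traffic model are illustrative only.

class ToyL1:
    def __init__(self, write_through):
        self.write_through = write_through
        self.lines = {}              # line address -> dirty flag
        self.next_level_writes = 0   # store traffic reaching the next level

    def store(self, addr):
        line = addr // 64            # 64-byte cache lines
        if self.write_through:
            # Every store is forwarded to the next level immediately.
            self.lines[line] = False
            self.next_level_writes += 1
        else:
            # Write-back: just mark the line dirty; L2 sees nothing yet.
            self.lines[line] = True

    def evict_all(self):
        # A write-back cache pays only one writeback per dirty line.
        self.next_level_writes += sum(1 for d in self.lines.values() if d)
        self.lines.clear()

wb, wt = ToyL1(write_through=False), ToyL1(write_through=True)
for i in range(100):                 # 100 stores, all hitting one cache line
    wb.store(i % 64)
    wt.store(i % 64)
wb.evict_all()
print(wb.next_level_writes)  # 1:   one dirty-line writeback
print(wt.next_level_writes)  # 100: every store reaches the next level
```

A 100x difference in outbound store traffic for a hot line is exactly why the lower levels of a store-through hierarchy have to be engineered for the machine's full store bandwidth.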
> > This creates a huge amount of pressure on the L2 and L3 caches to handle the full store bandwidth of
> > the machine. Look at bulldozer for an example of what happens when that doesn't quite work out.
> >
>
> Bulldozer is sort of bearable. On Pentium 4 it is, at times, really suffocating, partly because of the
> small store queue and partly because the hazards imposed when the store queue runs out of space are expensive.
Bulldozer has the problem that the L2 cache can only sink one transaction per clock, and you have two cores generating loads and stores.
They would have been vastly better off with a banked design that could satisfy two requests per clock.
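A rough cycle-counting sketch shows why banking helps. The bank-selection scheme and the request stream below are illustrative assumptions of mine, not AMD's actual microarchitecture:

```python
# Toy cycle model: a single-ported L2 versus a 2-banked L2 serving two
# cores that each issue one request per cycle. Illustrative only.

def cycles_needed(requests, banks):
    """requests is a list of (core0_addr, core1_addr) pairs, one pair
    arriving per cycle; returns the cycles needed to drain them all."""
    def bank(addr):
        return (addr // 64) % banks   # cache-line address selects the bank

    cycles = 0
    for a0, a1 in requests:
        if banks == 1 or bank(a0) == bank(a1):
            cycles += 2               # port or bank conflict: serialize
        else:
            cycles += 1               # different banks: both served at once
    return cycles

reqs = [(0, 64), (128, 192), (0, 128)]   # last pair maps to the same bank
print(cycles_needed(reqs, banks=1))      # 6: L2 sinks one request per clock
print(cycles_needed(reqs, banks=2))      # 4: only the real conflict serializes
```

With two banks, only genuine same-bank collisions serialize, so the shared L2 approaches two requests per clock on well-spread traffic instead of being hard-capped at one.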
> > > In the case of ARM, I would say there is zero chance they did not re-examine the memory ordering model when
> > > defining the 64-bit ISA, with *data* (at least from simulations)
> > > rather than old wives' tales, and they would have
> > > taken input from Apple and their own designers (AFAIK Cortex cores do not do a lot of reordering anyway).
> >
> > Based on my conversations with designers, the ARM ordering model is advantageous for
> > simpler cores...but for anything A15+ it's basically not an advantage over x86.
> >
>
> I'd think that the same applies to A5/A7/A53 although for different reasons.
> Which leaves us, on the ARMv8 side, with the future successor
> to the A17, which is not even announced yet (or is it?).
I remember talking to Mike Filippo, the chief architect on the big cores, about memory ordering; he said he thought it was basically a wash between ARM and x86 in terms of both design and validation (he was previously at Intel).
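To make the ordering-model difference concrete, here is a small enumeration of the classic message-passing litmus test. It is my own simplified sketch: it models only the order in which stores become visible, not a full formalization of TSO or ARMv8:

```python
from itertools import combinations

# Message-passing litmus test:
#   T0: x = 1 ; y = 1        T1: r1 = y ; r2 = x
# Under TSO (x86) the two stores become visible in program order, so
# seeing y == 1 guarantees x == 1. A weakly ordered model (ARM without
# barriers) may make the stores visible out of order.

def mp_outcomes(store_order):
    """All (r1, r2) results when T0's stores become visible in
    store_order and T1's loads execute in program order."""
    results = set()
    for slots in combinations(range(4), 2):   # where T0's ops land globally
        seq = [None] * 4
        seq[slots[0]] = ('w', store_order[0])
        seq[slots[1]] = ('w', store_order[1])
        reads = iter([('r', 'y', 'r1'), ('r', 'x', 'r2')])
        seq = [op if op is not None else next(reads) for op in seq]
        mem, regs = {'x': 0, 'y': 0}, {}
        for op in seq:
            if op[0] == 'w':
                mem[op[1]] = 1        # store becomes globally visible
            else:
                regs[op[2]] = mem[op[1]]
        results.add((regs['r1'], regs['r2']))
    return results

tso  = mp_outcomes(['x', 'y'])         # stores visible in program order
weak = tso | mp_outcomes(['y', 'x'])   # stores may also reorder
print((1, 0) in tso)    # False: forbidden on x86-style TSO
print((1, 0) in weak)   # True: allowed on weak ordering without barriers
```

The extra (1, 0) outcome is what software on a weakly ordered machine must rule out with explicit barriers, and what hardware on a TSO machine must rule out itself, which is why the design and validation costs can come out roughly even.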
David