By: David Kanter (dkanter.delete@this.realworldtech.com), October 1, 2015 12:54 pm
Room: Moderated Discussions
SHK (no.delete@this.mail.com) on October 1, 2015 12:11 pm wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on October 1, 2015 10:01 am wrote:
> > Wouter Tinus (wouter.tinus.delete@this.gmail.com) on September 30, 2015 3:14 pm wrote:
> > > It seems easy to argue that Skylake is a 5-wide or even 6-wide machine.
> > >
> > > - 5 wide decode
> > > - 6 wide allocation/decoder queue
> > > - 6 wide ROB
> > > - 8 wide issue
> > > - 8 wide retire (4/thread)
> > >
> > > Though Haswell already added two extra issue ports, this is the first real increase in width
> > > since the introduction of Merom back in 2006. Yet they didn't even bother to mention it at IDF :(
> >
> > Actually, I think Sandy Bridge and Haswell were more significant.
> >
> > It's nice to have more ALUs, but what really matters are the load/store units. Having 10 ALUs with 1 LD/ST
> > unit is really pointless, except on code with insanely high compute:memory ratios (which isn't most code).
> >
> > For a general purpose CPU, I'd focus on getting the load/store right first, then focus on the ALUs.
> >
> > David
>
> Agreed, I hope that in the Skylake Xeon the L1 latency will be lower. At least since SNB we're stuck
> with the same size/latency/associativity; throughput is nice, but IMHO the real salient point is latency.
I don't think that L1D will change for SKL-E. I think the L2 cache will change.
> POWER8 has twice the L1D, same associativity and 3-cycle latency, and I think it's
> worth paying the price in watts for that (for desktop and servers, of course).
ISTR reading that the POWER8 (or maybe it was the latest z?) used around 2W in the L1D.
IBM's L1D is super aggressive, but it's also a write-through design, which simplifies many flows (at the cost of worse overall performance).
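To put rough numbers on the ALU vs. load/store point quoted above, here is a back-of-envelope sketch in Python. It is a deliberately crude port-bound model, not a description of any real core: the function name, its parameters, and the instruction mixes are all made up for illustration, and it ignores latency, dependencies, and cache misses entirely.

```python
def sustained_ipc(width, alus, mem_ports, mem_fraction):
    """Upper bound on instructions/cycle in a simple port-limited model.

    width        -- max instructions the machine can issue per cycle (hypothetical)
    alus         -- number of ALU ports (hypothetical)
    mem_ports    -- number of load/store ports (hypothetical)
    mem_fraction -- fraction of instructions that are loads or stores (0..1)
    """
    if not 0.0 <= mem_fraction <= 1.0:
        raise ValueError("mem_fraction must be between 0 and 1")

    bounds = [float(width)]
    alu_fraction = 1.0 - mem_fraction
    # Each class of instruction can only retire as fast as its ports allow.
    if alu_fraction > 0:
        bounds.append(alus / alu_fraction)
    if mem_fraction > 0:
        bounds.append(mem_ports / mem_fraction)
    return min(bounds)


if __name__ == "__main__":
    # 10 ALUs, 1 LD/ST port, a quarter of instructions touch memory.
    print(sustained_ipc(width=11, alus=10, mem_ports=1, mem_fraction=0.25))
    # Same machine on code with a very high compute:memory ratio.
    print(sustained_ipc(width=11, alus=10, mem_ports=1, mem_fraction=0.05))
    # Fewer ALUs but two memory ports, same quarter-memory mix.
    print(sustained_ipc(width=6, alus=4, mem_ports=2, mem_fraction=0.25))
```

In this toy model the 10-ALU machine is capped by its single memory port as soon as a quarter of the instructions are loads or stores, and only approaches its ALU limit when memory operations are rare.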
David