By: SHK (no.delete@this.mail.com), October 1, 2015 11:11 am
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on October 1, 2015 10:01 am wrote:
> Wouter Tinus (wouter.tinus.delete@this.gmail.com) on September 30, 2015 3:14 pm wrote:
> > It seems easy to argue that Skylake is a 5-wide or even 6-wide machine.
> >
> > - 5 wide decode
> > - 6 wide allocation/decoder queue
> > - 6 wide ROB
> > - 8 wide issue
> > - 8 wide retire (4/thread)
> >
> > Though Haswell already added two extra issue ports, this is the first real increase in width
> > since the introduction of Merom back in 2006. Yet they didn't even bother to mention it at IDF :(
>
> Actually, I think Sandy Bridge and Haswell were more significant.
>
> It's nice to have more ALUs, but what really matters are the load/store units. Having 10 ALUs with 1 LD/ST
> unit is really pointless, except on code with insanely high compute:memory ratios (which isn't most code).
>
> For a general purpose CPU, I'd focus on getting the load/store right first, then focus on the ALUs.
>
> David
Agreed. I hope the L1 latency will be lower in Skylake Xeon. Since at least SNB we've been stuck with the same L1 size/latency/associativity. Throughput is nice, but IMHO the real issue is latency.
Power8 has twice the L1D, the same associativity and a 3-cycle latency, and I think it's worth paying the price in watts for that (for desktops and servers, of course).