By: Maynard Handley (name99.delete@this.name99.org), September 30, 2015 3:30 pm
Room: Moderated Discussions
Wouter Tinus (wouter.tinus.delete@this.gmail.com) on September 30, 2015 3:14 pm wrote:
> It seems easy to argue that Skylake is a 5-wide or even 6-wide machine.
>
> - 5 wide decode
> - 6 wide allocation/decoder queue
> - 6 wide ROB
> - 8 wide issue
> - 8 wide retire (4/thread)
>
> Though Haswell already added extra two extra issue ports, this the first real increase in width
> since the introduction of Merom back in 2006. Yet they didn't even bother to mention it at IDF :(
I agree it's weird, but it doesn't seem to have bought them very much in performance (so maybe that's why they kept it quiet, to avoid unrealistic expectations?)
http://www.anandtech.com/show/9483/intel-skylake-review-6700k-6600k-ddr4-ddr3-ipc-6th-generation/9
Is it possible that they switched to something like two 3-wide execution clusters, and they're losing whatever they should have gained in cluster communication? But clustering seems a very un-Intel direction...
Another possibility is what I suggested when Skylake first came out: that for Skylake Intel deliberately made choices that are sub-optimal for IPC, but allow higher frequency to be sustained for longer. So it's somewhat unfair to compare, say, 3GHz Broadwell with 3GHz Skylake; the real comparison ought to be something like "amount of work done per second at equal power for the same sort of level of chip".
Those numbers are all over the place:
http://www.anandtech.com/show/9483/intel-skylake-review-6700k-6600k-ddr4-ddr3-ipc-6th-generation/17
with 91W Skylake against 88W Broadwell sometimes behind (WinRAR, Sunspider, WebXPRT), sometimes 25% ahead (Octane).
The spread in results seems to tell us that
- there has been a substantial change in the micro-architecture BUT
- that change seems rather "fragile", in that it may be parameterized to maximize a weighted basket of benchmarks, but the changes are no longer unequivocally a good idea for all (or even for 95%) of workloads; we're getting close to the territory of "improves 60% of workloads by 3% and harms the other 40% by 2%".
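To put a rough number on that last point, here is a minimal sketch of what such a split would net out to across the whole basket. The 60/40 shares and the +3%/-2% figures are just the illustrative ones from the sentence above, not measurements, and the helper name net_speedup and the use of a share-weighted geometric mean are my own assumptions about how one might average the basket.

```python
import math

def net_speedup(groups):
    """Share-weighted geometric mean of per-group speedups.

    groups: list of (share, speedup) pairs, e.g. (0.60, 1.03) means
    60% of workloads run 3% faster. Shares need not sum to 1.
    """
    total_share = sum(share for share, _ in groups)
    log_mean = sum(share * math.log(speedup) for share, speedup in groups) / total_share
    return math.exp(log_mean)

# Hypothetical split from the post: 60% of workloads gain 3%, 40% lose 2%.
basket = [(0.60, 1.03), (0.40, 0.98)]
overall = net_speedup(basket)
print(f"net change across the basket: {(overall - 1) * 100:+.2f}%")
```

With those made-up inputs the basket as a whole moves by only about 1%, which is the sense in which a change can be substantial in the micro-architecture and still look like almost nothing in the averages.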