By: Alberto (git.delete@this.git.it), October 1, 2015 8:33 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on October 1, 2015 6:27 am wrote:
> Alberto (git.delete@this.git.it) on October 1, 2015 5:41 am wrote:
> > anon (anon.delete@this.anon.com) on October 1, 2015 3:21 am wrote:
> > > Alberto (git.delete@this.git.it) on October 1, 2015 1:13 am wrote:
> > > > Maynard Handley (name99.delete@this.name99.org) on September 30, 2015 4:30 pm wrote:
> > > > > Wouter Tinus (wouter.tinus.delete@this.gmail.com) on September 30, 2015 3:14 pm wrote:
> > > > > > It seems easy to argue that Skylake is a 5-wide or even 6-wide machine.
> > > > > >
> > > > > > - 5 wide decode
> > > > > > - 6 wide allocation/decoder queue
> > > > > > - 6 wide ROB
> > > > > > - 8 wide issue
> > > > > > - 8 wide retire (4/thread)
> > > > > >
> > > > > > > Though Haswell already added two extra issue ports, this is the first real increase in width
> > > > > > since the introduction of Merom back in 2006. Yet they didn't even bother to mention it at IDF :(
> > > > >
> > > > > I agree it's weird, but it doesn't seem to have bought them very much in performance
> > > > > (so maybe that's why they kept it quiet, to avoid unrealistic expectations?)
> > > > >
> > > > > http://www.anandtech.com/show/9483/intel-skylake-review-6700k-6600k-ddr4-ddr3-ipc-6th-generation/9
> > > > >
> > > > > Is it possible that they switched to something like two
> > > > > 3-wide execution clusters, and they're losing whatever
> > > > > they should have gained in cluster communication? But clustering seems a very un-Intel direction...
> > > > >
> > > > > Another possibility is what I suggested when Skylake first came out: that for Skylake Intel deliberately
> > > > > made choices that are sub-optimal for IPC, but allow higher frequency to be sustained for longer. So it's
> > > > > > somewhat unfair, say, to compare 3GHz Broadwell with 3GHz Skylake, the real comparison ought to be something
> > > > > like "amount of work done per second at equal power for the same sort of level of chip".
> > > > > Those numbers are all over the place:
> > > > > http://www.anandtech.com/show/9483/intel-skylake-review-6700k-6600k-ddr4-ddr3-ipc-6th-generation/17
> > > > > > with 91W Skylake against 88W Broadwell sometimes behind (WinRAR,
> > > > > > Sunspider, WebXPRT), sometimes 25% ahead (Octane).
> > > > >
> > > > > The spread in results seems to tell us that
> > > > > - there has been a substantial change in the micro-architecture BUT
> > > > > - that change seems rather "fragile", in that it may be parameterized
> > > > > to maximize a weighted basket of benchmarks,
> > > > > but the changes are no longer unequivocally a good idea
> > > > > for all (or even for 95%) of workloads; we're getting
> > > > > > close to the territory of "improves 60% of workloads by 3% and harms the other 40% by 2%".
> > > >
> > > > Come on :). Every new architecture has upsides and downsides; differences of +3% or -2% are pretty standard.
> > > > Even the very crap GeekB....!!! shows a solid 8% boost in IPC on Ars Technica over Haswell. I pretty much
> > > > believe the +10% claimed by Intel in a session for SPEC 2006, without libquantum obviously.
> > > > Don't try to find nonexistent defects in a CPU; this is not respectful of the CPU team.
> > > >
> > > > Kudos to Intel for having achieved this without raising the L2 size to 1MB or 3MB, allowing easy(!)
> > > > and efficient multicore SKUs for servers with a smaller footprint and lower power consumption.
> > >
> > > Intel has broken these benchmarks completely by decreasing the latency of their gigantic 8MB L3 cache!
> > > Spending power on this useless cache is nothing but cheating intended to mislead the consumer.
> > >
> > > /Anti-Alberto
> >
> > Unfortunately for you the L3 latency is untouched and the L2 latency is
>
> L3 latency is effectively lower, according to the numbers shown, therefore Intel is cheating.
>
> > higher by one cycle. So the situation is even worse than in Haswell.
> >
> > Nice try.....go to Techreport instead to spread FUD
>
> Exactly! Now you know how it feels. Next time hopefully you'll think twice
> before taking it upon yourself to regurgitate your rubbish into ARM threads.
Rubbish is always useful in the gold-plated, slightly childish and uncritical ARM world. It is a sort of back-to-reality check, a reality in which the physical limits of materials and equipment carry weight.