By: Per Hesselgren (grabb1948.delete@this.passagen.se), February 1, 2013 3:13 am
Room: Moderated Discussions
Per Hesselgren (grabb1948.delete@this.passagen.se) on February 1, 2013 3:06 am wrote:
> Johan (johan.delete@this.anandtech.com) on February 1, 2013 1:40 am wrote:
> >
> > > The one thing I can see that might change this would be for Intel to come up with
> > > a much better main-memory technology (and I guess die-stacking might be a possibility):
> > > but if everyone is stuck with the same DDR3/DDR4 standard, with the same number
> > > of pins and the same effective-bandwidth-per-pin, then they're all going to be in
> > > the same ballpark for server throughput, regardless of ISA and process-node
> > > differences.
> >
> >
> > Memory bandwidth is the main bottleneck for server app throughput?
> > Sorry, but that is a gross oversimplification.
> > In the world of perfect prefetching maybe. But in the real world, prefetching is hardly perfect.
> >
> > Many server apps hardly see any speedup if you use faster memory. The way that the CPU core hides
> > memory latency has a big impact. That is exactly why a complex core like the Xeon E5 is capable
> > of outperforming many other servers with a higher core count and the same or more bandwidth.
> >
> >
> Yes but Xeon E5 has a lot of L3 cache.
> A comparison with the old Opteron (before Bulldozer) is of more interest:
> no L3, only 1024 kB of L2. That is what we see in some ARM CPUs today.
>
>
Before Barcelona, that is! Barcelona was the first Opteron with an on-die L3 cache.
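To make the latency-hiding point above concrete, here is a minimal sketch (my own illustration, not anything from Johan's testing): a pointer chase whose cost is set almost entirely by memory latency, next to a streaming sum that prefetchers and raw bandwidth handle well. On a core with only a small L2 and no L3, it is the chase-like accesses in server code that dominate, which is why faster DRAM alone buys little.

/* latency-vs-bandwidth sketch: gcc -O2 chase.c -o chase */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (32u * 1024 * 1024)   /* 32M elements, far bigger than any L2/L3 */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    size_t *next = malloc(N * sizeof *next);  /* random cycle for the chase */
    double *data = malloc(N * sizeof *data);  /* plain array for streaming  */
    if (!next || !data) return 1;

    /* Sattolo's algorithm: build a single random cycle, so every load
     * depends on the previous one and misses cannot be overlapped. */
    for (size_t i = 0; i < N; i++) next[i] = i;
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    for (size_t i = 0; i < N; i++) data[i] = 1.0;

    /* Latency-bound: one dependent miss at a time. */
    double t0 = now_sec();
    size_t p = 0;
    for (size_t i = 0; i < N; i++) p = next[p];
    double t_chase = now_sec() - t0;

    /* Bandwidth-bound: independent sequential loads, trivially prefetched. */
    t0 = now_sec();
    double sum = 0.0;
    for (size_t i = 0; i < N; i++) sum += data[i];
    double t_stream = now_sec() - t0;

    printf("pointer chase: %.1f ns/access (p=%zu)\n", t_chase / N * 1e9, p);
    printf("streaming sum: %.1f ns/access (sum=%.0f)\n", t_stream / N * 1e9, sum);
    return 0;
}

The chase ends up around a full DRAM round trip per access, while the streaming loop is one to two orders of magnitude cheaper per element; a big L3 or a core that keeps many misses in flight narrows that gap far more than a faster DIMM does.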