By: Per Hesselgren (grabb1948.delete@this.passagen.se), February 1, 2013 3:06 am
Room: Moderated Discussions
Johan (johan.delete@this.anandtech.com) on February 1, 2013 1:40 am wrote:
>
> > The one thing I can see that might change this would be for Intel to come up with
> > a much better main-memory technology (and I guess die-stacking might be a possibility):
> > but if everyone is stuck with the same DDR3/DDR4 standard, with the same number
> > of pins and the same effective-bandwidth-per-pin, then they're all going to be in
> > the same ballpark for server throughput, regardless of ISA and process-node
> > differences.
>
>
> Memory bandwidth is the main bottleneck for server app throughput? Sorry, but that is a gross oversimplification.
> In the world of perfect prefetching maybe. But in the real world, prefetching is hardly perfect.
>
> Many server apps hardly see any speedup if you use faster memory. The way that the CPU core hides
> memory latency has a big impact. That is exactly why a complex core like the Xeon E5 is capable
> of outperforming many other servers with a higher core count and the same or more bandwidth.
>
>
Yes, but the Xeon E5 has a lot of L3 cache.
A comparison with the old Opteron (before Bulldozer) is of more interest: no L3, only 1024 kB of L2. This is what we see in some ARM CPUs today.
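
A rough way to see the latency point quoted above is Little's law: the bandwidth a core can sustain is about (cache-line misses in flight x line size) / memory latency. The sketch below is only a back-of-envelope illustration. The function name and every number in it (64-byte lines, 80 ns latency, 2 versus 10 outstanding misses) are assumptions picked for the example, not measurements of any real CPU.

```python
def sustained_bandwidth_gbs(outstanding_misses, line_bytes=64, latency_ns=80.0):
    """Little's law estimate: bytes in flight divided by memory latency.

    Bytes per nanosecond is numerically equal to GB/s (1e9 bytes per second).
    All default values are illustrative assumptions.
    """
    return outstanding_misses * line_bytes / latency_ns


# Hypothetical simple core that keeps only 2 misses in flight.
simple_core = sustained_bandwidth_gbs(2)
# Hypothetical complex core that keeps 10 misses in flight.
complex_core = sustained_bandwidth_gbs(10)

print(f"simple core:  {simple_core:.1f} GB/s")   # 1.6 GB/s
print(f"complex core: {complex_core:.1f} GB/s")  # 8.0 GB/s
```

With the same DRAM and the same latency, the core that keeps more misses in flight pulls several times more data per second, so the same pins do not by themselves imply the same throughput. A large cache helps in a different way, by reducing how many misses reach memory at all.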