In our Sandy Bridge-EP and Romley platform review, we look at the performance and power efficiency gains for Intel’s latest server microprocessor on industry-standard benchmarks including SPECcpu2006 and SPECpower_ssj2008. The results are impressive: Sandy Bridge-EP is clearly the best x86 server processor on the market, and Romley will be the platform of choice for the next 2 years.
Memory bandwidth is critical to feeding the shader arrays in programmable GPUs. We show that memory is an integral part of a good performance model and can impact graphics performance by 40% or more. The implications are important for upcoming integrated graphics, such as AMD’s Llano and Intel’s Ivy Bridge, as bandwidth constraints will play a key role in determining overall performance.
Modern graphics processors are incredibly complex, but understanding their performance is essential, as they become an increasingly important component of computer systems. In this report, we use a set of benchmark results to build accurate performance models for AMD and Nvidia GPUs. We verify that our model can predict performance within roughly 6-8% for many desktop graphics cards and show how Nvidia’s microarchitecture and drivers achieve roughly 2X higher utilization than AMD’s VLIW5 design.
Sandy Bridge SPECcpu2006 estimates are finally available. The data show per-core performance increased by 30% or more compared to the fastest Westmere design. We analyze the performance numbers for Intel’s newest microarchitecture and estimate gains of 12% for multi-threading on integer workloads. We also show high sensitivity for integer performance to frequency and much more limited response for floating point workloads. Last, we assess the implications for AMD to match Sandy Bridge’s performance for both throughput and single threaded workloads.
Recently, benchmarks for AMD’s eagerly awaited Bulldozer architecture leaked online. So far, this has mostly created uncertainty about the performance of future products, rather than answering questions. We look at the test system and benchmarks and explain the difficulties in precisely estimating performance. We analyze the benchmark results and draw several conclusions about Bulldozer’s microarchitecture and performance.
PhysX is a key application that Nvidia uses to showcase the advantages of GPU computing (GPGPU) for consumers. PhysX executing on an Nvidia GPU can improve performance by 2-4X compared to running on a CPU from Intel or AMD. We investigated and discovered that CPU PhysX exclusively uses x87 rather than the faster SSE instructions. This hobbles the performance of CPUs, calling into question the real benefits of PhysX on a GPU.
In this article, we test out a new HPC benchmark from one of our readers on an Istanbul server from Supermicro. MAQSIP-RT is a forecasting and analysis package that is commonly used throughout the weather and atmospheric chemistry communities. In our first run, we take a look at scalability and performance and find a benchmark that suits many of our needs.
Westmere is a shrink to the 32nm process and has 50% more cores, 50% more last level cache and several other improvements we detailed in our first article. In our second article on Westmere, we take a look at the performance of the Westmere-EP product, targeted at 2-socket servers. We compare the performance of Westmere to the socket-compatible, prior-generation Nehalem microprocessors, using the same server and same-frequency parts to see the actual benefits of Westmere.
The computer industry is on the cusp of yet another turn of the Wheel of Reincarnation, with the graphics processor unit (GPU) cast as the heir apparent of the floating point co-processors of days long gone. Modern GPUs are ostensibly higher performance and more power efficient than CPUs for their target workload, and many companies and media outlets claim they are leaving CPUs in the dust. Is this really the case though? This article explores the quantitative basis for these claims, with some surprising results.
Intel’s eagerly awaited Nehalem microarchitecture is a tremendous advance over the previous generation, pushing forward both system integration and core performance. Nehalem includes 4 cores with simultaneous multi-threading, an integrated memory controller, the new CSI (or QPI) coherency links and a redesigned cache hierarchy in a single die. The first 55xx series Xeons, based on Nehalem, will come to market shortly, and with that in mind, we take a look at the performance and power efficiency advantages for Nehalem.