High Bandwidth Memory on Xeon Sapphire Rapids

By: John Clarke (noemail.delete@this.gmail.com), July 3, 2021 7:01 pm
Room: Moderated Discussions
At the 2021 International Supercomputing Conference this week, Intel announced that some versions of the Xeon Sapphire Rapids processor will have High Bandwidth Memory (HBM). The HBM can be used in three different modes:

HBM-only (“allowing one to save all the cost of the DDR throughout the cluster”)

flat (“for direct software control of what is placed in memory”)

cache (“to simply accelerate existing workloads with no changes to software needed”).
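Flat mode leaves placement to software. As a minimal sketch of what a placement policy might look like — with made-up capacities and a hypothetical greedy rule, not any real allocator API or actual Sapphire Rapids sizes:

```python
# Toy model of "flat" mode: HBM and DDR are separate pools and software
# decides where each allocation lands. All sizes are hypothetical.

HBM_CAPACITY_GB = 64  # assumed HBM pool size, not a product specification

def place(allocations_gb, hbm_free=HBM_CAPACITY_GB):
    """Greedy policy: arrays listed hottest-first go to HBM until it is full."""
    placement = {}
    for name, size in allocations_gb:
        if size <= hbm_free:
            placement[name] = "HBM"
            hbm_free -= size
        else:
            placement[name] = "DDR"
    return placement

# Hypothetical HPC arrays, hottest first, sizes in GB.
arrays = [("field_data", 48), ("halo_buffers", 8), ("checkpoint", 200)]
print(place(arrays))
# → {'field_data': 'HBM', 'halo_buffers': 'HBM', 'checkpoint': 'DDR'}
```

In practice this decision would be made with something like libnuma or the memkind library rather than an application-level table, but the point is the same: in flat mode, somebody has to write this policy.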

The HBM-only mode is for power savings and performance, not cost savings. HBM has lower power per bandwidth but higher cost per byte than DDR DRAM.
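The trade-off can be made concrete with rough numbers. Everything below is an illustrative assumption, not datasheet data; the point is only that the two technologies win on different axes:

```python
# Rough comparison of HBM and DDR on power per unit of bandwidth and
# cost per byte. Every number here is an invented illustration.

hbm = {"bw_gbs": 1000, "power_w": 20, "cost_per_gb": 15}
ddr = {"bw_gbs": 300, "power_w": 30, "cost_per_gb": 4}

hbm_w_per_gbs = hbm["power_w"] / hbm["bw_gbs"]  # 0.02 W per GB/s
ddr_w_per_gbs = ddr["power_w"] / ddr["bw_gbs"]  # 0.10 W per GB/s

print(f"HBM: {hbm_w_per_gbs:.2f} W/(GB/s), ${hbm['cost_per_gb']}/GB")
print(f"DDR: {ddr_w_per_gbs:.2f} W/(GB/s), ${ddr['cost_per_gb']}/GB")
# With these assumptions HBM wins on power per bandwidth,
# DDR wins on cost per byte.
```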

Unless most of the execution time is spent inside Intel’s Math Kernel Library, I doubt that existing workloads will be accelerated with no software changes when HBM is in cache mode. Since HBM is only on some versions, Intel has the challenge of getting software vendors to optimize their software for a feature that is only on some parts. One way Intel could encourage software support for HBM is by selling a part with 14 cores (or fewer), 1 GByte of HBM and 2 DDR5 DRAM channels.
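One way to see why cache mode alone may not help: if the hot working set exceeds the HBM capacity, most accesses miss and effective bandwidth falls back toward DDR, so software changes such as cache blocking may still be needed. A crude model with assumed capacity and bandwidth numbers:

```python
# Crude model of HBM in cache mode: HBM acts as a last-level cache in
# front of DDR. All numbers are assumptions, not product specifications.

HBM_GB, HBM_BW, DDR_BW = 64, 1000, 300  # capacity (GB), bandwidths (GB/s)

def effective_bw(working_set_gb):
    # Naive model: uniform reuse over the working set, so the hit rate
    # is just the fraction of the working set that fits in HBM.
    hit_rate = min(1.0, HBM_GB / working_set_gb)
    return hit_rate * HBM_BW + (1 - hit_rate) * DDR_BW

print(round(effective_bw(32)))   # fits entirely in HBM → 1000
print(round(effective_bw(256)))  # mostly misses → 475
```

Under this toy model, a workload with a 256 GB working set sees less than half the HBM bandwidth unless the code is restructured to shrink its hot footprint — which is exactly the kind of software change the “no changes needed” pitch glosses over.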

Intel’s market cap is less than half of Nvidia’s market cap ($229B vs $511B) even though Intel has more than 3x the earnings of Nvidia ($18B vs $5B). Understandably, Intel wants to narrow the performance gap between Xeon and Nvidia’s A100. HBM allows them to do that.

Intel also announced that Sapphire Rapids is delayed. They said “we now expect Sapphire Rapids to be in production in the first quarter of 2022, with ramp beginning in the second quarter of 2022”. No mention was made of the HBM version having a different date, but some websites have said the HBM version will launch in late 2022.

AMD’s server/workstation processors already use multi-die packaging, with the DRAM controllers on a separate I/O die. I think it is safe to assume AMD will also add on-package DRAM to their server/workstation processors.

The merits of on-package L4 cache were debated on this website 10 years ago.

The Intel speaker described the CXL 1.1 feature in Sapphire Rapids as “setting the stage for future coherent GPU attach”. That sounds like weasel words for “the first version doesn’t work” but we’ll see.

How much of a performance/price and performance/power advantage do people here think a GPU needs over a Xeon processor for software vendors to continue writing GPU-optimized software? For example, if a GPU has only 2x the performance/price ratio of a Xeon, do you think software vendors would continue writing GPU-optimized software? (In some cases, such as neural net training and software that spends most of its execution time inside Intel’s Math Kernel Library, software developers usually don’t have to be concerned about the underlying hardware. I’m not asking about that simpler situation.)
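To frame the question, here is a toy break-even calculation: given a performance/price advantage and a one-time porting cost, how many runs does it take before the GPU port pays for itself? All inputs are invented:

```python
# Hypothetical break-even for a GPU port. A perf/price ratio of 2.0 means
# the same job costs half as much in compute on the GPU as on the CPU.

def breakeven_runs(cpu_cost_per_run, perf_price_ratio, port_cost):
    gpu_cost_per_run = cpu_cost_per_run / perf_price_ratio
    savings_per_run = cpu_cost_per_run - gpu_cost_per_run
    return port_cost / savings_per_run

# Assume $100 of compute per CPU run, 2x perf/price, $50k of porting effort:
print(breakeven_runs(100, 2.0, 50_000))  # → 1000.0 runs before the port pays off
```

At 2x, a vendor needs a thousand such runs just to recoup a modest porting budget; at 10x the break-even drops to a few hundred. Where that threshold sits in practice is exactly the question above.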