By: Peter Lewis (peter.delete@this.notyahoo.com), June 5, 2022 4:20 pm
Room: Moderated Discussions
> in general, any accelerator which requires an API is going to be substantially less useful than one which can be accessed through the ISA
A huge advantage of an API for something like neural inference is that when more hardware is added in future generations, code written for a previous generation will automatically make use of it.
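As a rough illustration, here is a minimal sketch using ONNX Runtime as one example of such an API (the model file, input name, and provider list are placeholders; which providers are actually available depends on the platform and the runtime build):

    import numpy as np
    import onnxruntime as ort

    # The runtime picks the first provider available on this machine.
    # On newer hardware with an updated runtime, the same script
    # automatically runs on the newer accelerator; nothing is recompiled.
    session = ort.InferenceSession(
        "model.onnx",  # placeholder model
        providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
    )

    x = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = session.run(None, {"input": x})  # "input" is a placeholder name
    print(outputs[0].shape)

Code written directly against an ISA extension, by contrast, only uses the execution units those instructions expose, so new hardware generally needs new code.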
It will be interesting to see the measured power consumption difference between a dedicated hardware block like Apple’s Neural Engine and neural inference instructions added to the CPU core, like Intel’s Advanced Matrix Extensions (AMX). A dedicated hardware block has to be more power efficient, but I don’t know by how much.
The Gaussian and Neural Accelerator 2.0 (GNA 2.0) in Tiger Lake that you mentioned is intended for low power rather than high performance; it only provides 38 Gop/s. For comparison, the Neural Engine in Apple’s M1 Ultra provides 22 Top/s, and Nvidia’s H100 PCIe provides 3200 Top/s when 50% of the weights are zero (Nvidia’s 2:4 structured sparsity).
Intel’s GNA 2.0 in Tiger Lake uses 38 mW, which works out to about 1 Top/s per watt. Scaled up to the H100 PCIe performance level, that would be 3200 W, almost 10x the H100 PCIe’s 350 W TDP. If people are ever going to run something like Nvidia Maxine or Nvidia Riva on local hardware instead of in the cloud, they will need an enormous amount of power-efficient neural inference performance.
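For anyone who wants to check the arithmetic (the 350 W figure is the H100 PCIe board TDP):

    # Back-of-the-envelope check of the scaling above.
    gna_tops = 0.038    # GNA 2.0: 38 Gop/s, expressed in Top/s
    gna_watts = 0.038   # GNA 2.0: 38 mW, expressed in W
    h100_tops = 3200    # H100 PCIe, 2:4 sparse
    h100_watts = 350    # H100 PCIe TDP in W

    gna_eff = gna_tops / gna_watts     # ~1 Top/s per W
    h100_eff = h100_tops / h100_watts  # ~9.1 Top/s per W
    scaled = h100_tops / gna_eff       # power for a GNA-class design at H100 speed
    print(f"{scaled:.0f} W, {scaled / h100_watts:.1f}x the H100 PCIe TDP")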
youtube.com/watch?v=3GPNsPMqY8o
anandtech.com/show/15971/intels-11th-gen-core-tiger-lake-soc-detailed-superfin-willow-cove-and-xelp/5