By: Mark Roulo (nothanks.delete@this.xxx.com),
Room: Moderated Discussions
Mark Heath (none.delete@this.none.none) on June 4, 2024 6:32 am wrote:
> none (none.delete@this.none.com) on June 3, 2024 6:22 am wrote:
> > Mark Heath (none.delete@this.none.none) on June 3, 2024 6:16 am wrote:
> > [...]
> > > Why wouldn't HPC use this GPU instead of SME, especially since
> > > the GPU has more DRAM/SLC bandwidth than a P-core cluster?
> >
> > That wouldn't work for the HPC workloads that need FP64. That's why high-end GPU for HPC have
> > much more FP64 performance than high-end consumer GPU.
>
> You're right that some calculations need FP64. One example I know about is the final iterations
> of the self-consistent field calculation in quantum chemistry. The initial iterations
> can be done in FP32 but the last few iterations need to be in FP64. Admittedly, quantum
> chemistry calculations are not something a lot of Apple customers would do.
>
> I don't know how much faith to have in a website named CPU Monkey, but if this website is correct,
> Apple's M3 Max GPU has 3.55 TFLOPs of FP64. Two P-clusters of SME/AMX provide 1 TFLOPs of FP64.
> It therefore appears that Apple's GPU has significantly more FP64 performance than the SME/AMX
> units. The ratio of FP32 to FP64 performance in both the GPU and SME/AMX units is 4. This suggests
> Apple is using four FP32 multipliers to make one FP64 multiplier.
>
> https://www.cpu-monkey.com/en/igpu-apple_m3_max_40_core
>
Googling suggests that Apple GPUs have no hardware FP64 support at all. I see at least one GitHub project that provides FP64 emulation.
I can't find a definitive answer on any Apple site. There might be one somewhere here: https://rosenzweig.io/
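
For context on what "FP64 emulation" usually means on FP32-only hardware: a common approach is "float-float" arithmetic, where each value is stored as an unevaluated sum of two FP32 numbers and combined using error-free transformations (Knuth's TwoSum, Dekker's FastTwoSum). The sketch below only illustrates that general technique in Python. It is not taken from any particular GitHub project, the helper names (f32, two_sum, fast_two_sum, df_add, and so on) are made up for this example, and FP32 rounding is simulated with the struct module. Also note that a float-float pair gives roughly 48 significand bits and keeps FP32's exponent range, so it approximates FP64 rather than matching it.

```python
import struct


def f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a, b):
    """Knuth's TwoSum: return (s, err) with s = fl32(a + b) and a + b = s + err."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err


def fast_two_sum(a, b):
    """Dekker's FastTwoSum: same result as two_sum, but assumes |a| >= |b|."""
    s = f32(a + b)
    err = f32(b - f32(s - a))
    return s, err


def from_double(x):
    """Split an FP64 value into a (hi, lo) pair of FP32 values."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo


def to_double(v):
    """Recombine a (hi, lo) pair into a single Python float."""
    return v[0] + v[1]


def df_add(x, y):
    """Add two float-float values using only FP32-rounded operations."""
    s, e = two_sum(x[0], y[0])          # exact sum of the high parts
    e = f32(e + f32(x[1] + y[1]))       # fold in the low parts
    return fast_two_sum(s, e)           # renormalize into (hi, lo)


# Plain FP32 cannot represent 1 + 2**-30; the float-float pair can.
small = 2.0 ** -30
plain = f32(1.0 + small)
emulated = to_double(df_add(from_double(1.0), from_double(small)))
```

Each emulated addition here costs roughly ten FP32 operations, which is why emulated FP64 throughput ends up well below the native FP32 rate.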


