Real World Technologies - Forums - Thread: All aboard the SME AI hype train

By: Mark Roulo (nothanks.delete@this.xxx.com), 2024-06-04 14:41 UTC

Mark Heath (none.delete@this.none.none) on June 4, 2024 6:32 am wrote:
> none (none.delete@this.none.com) on June 3, 2024 6:22 am wrote:
> > Mark Heath (none.delete@this.none.none) on June 3, 2024 6:16 am wrote:
> > [...]
> > > Why wouldn't HPC use this GPU instead of SME, especially since
> > > the GPU has more DRAM/SLC bandwidth than a P-core cluster?
> >
> > That wouldn't work for the HPC workloads that need FP64. That's why high-end GPUs for HPC have
> > much more FP64 performance than high-end consumer GPUs.
>
> You're right that some calculations need FP64. One example I know about is the final iterations
> of the self-consistent field calculation in quantum chemistry. The initial iterations
> can be done in FP32 but the last few iterations need to be in FP64. Admittedly, quantum
> chemistry calculations are not something a lot of Apple customers would do.
>
> I don't know how much faith to have in a website named CPU Monkey, but if this website is correct,
> Apple's M3 Max GPU has 3.55 TFLOPS of FP64. Two P-clusters of SME/AMX provide 1 TFLOPS of FP64.
> It therefore appears that Apple's GPU has significantly more FP64 performance than the SME/AMX
> units. The ratio of FP32 to FP64 performance in both the GPU and SME/AMX units is 4. This suggests
> Apple is using four FP32 multipliers to make one FP64 multiplier.
>
> https://www.cpu-monkey.com/en/igpu-apple_m3_max_40_core
>
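
The FP32-then-FP64 pattern described above for SCF can be illustrated on a much smaller problem. The sketch below is a hypothetical toy stand-in, not an SCF implementation. It runs Newton's method for the square root of `a` and emulates FP32 arithmetic in Python by rounding every intermediate result through the standard `struct` module. The names `f32` and `newton_sqrt_mixed` and the default iteration counts are illustrative assumptions. Because each Newton step corrects the error of the previous one, the FP32 rounding error left by the cheap early iterations is removed by a couple of FP64 iterations at the end.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value, emulating FP32 math.

    For +, -, *, / this yields the correctly rounded FP32 result, because FP64
    carries more than 2 * 24 + 2 significand bits.
    """
    return struct.unpack("f", struct.pack("f", x))[0]


def newton_sqrt_mixed(a: float, fp32_iters: int = 6, fp64_iters: int = 2) -> float:
    """Toy mixed-precision iteration: Newton's method for sqrt(a).

    The early iterations run in emulated FP32; the last few run in native FP64
    to polish the answer beyond FP32's roughly 7 significant digits.
    """
    if not a > 0.0:
        raise ValueError("a must be positive")
    a32 = f32(a)
    x = a32  # crude starting guess
    for _ in range(fp32_iters):
        # Every intermediate result is rounded to FP32.
        x = f32(0.5 * f32(x + f32(a32 / x)))
    for _ in range(fp64_iters):
        # Python floats are FP64; use the full-precision input here.
        x = 0.5 * (x + a / x)
    return x


if __name__ == "__main__":
    exact = math.sqrt(2.0)
    fp32_only = newton_sqrt_mixed(2.0, fp64_iters=0)
    mixed = newton_sqrt_mixed(2.0)
    print(f"FP32 only : error {abs(fp32_only - exact):.1e}")
    print(f"FP32+FP64 : error {abs(mixed - exact):.1e}")
```

In a real code the switch from FP32 to FP64 would more likely be triggered by a convergence test (for example, the residual no longer shrinking in FP32) than by a fixed iteration count.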

Googling suggests that Apple GPUs don't have FP64 hardware support at all. I see at least one GitHub project that provides FP64 emulation.

I can't find any definitive answer from an Apple site. There might be an answer somewhere here: https://rosenzweig.io/
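
For readers wondering what FP64 emulation on FP32-only hardware can look like, one common technique is "float-float" (double-single) arithmetic: a value is stored as an unevaluated sum hi + lo of two FP32 numbers, and error-free transformations (Knuth's TwoSum and an FMA-based TwoProd) carry the rounding error of each FP32 operation into lo. The sketch below illustrates that general technique under stated assumptions; it is not a description of the GitHub project mentioned above or of anything Apple ships. FP32 hardware is emulated in Python by rounding every intermediate through `struct`, and all helper names (`f32`, `two_sum`, `df_add`, and so on) are hypothetical. Note that float-float gives roughly 48 significand bits with FP32's exponent range, so it is not a bit-exact IEEE FP64 replacement.

```python
import struct


# FP32 hardware is emulated by rounding every FP64 intermediate to FP32. For
# +, -, * this gives the correctly rounded FP32 result because FP64 carries
# more than 2 * 24 + 2 significand bits.
def f32(x: float) -> float:
    """Round a Python float (IEEE FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: s + err == a + b exactly, where s is the FP32 sum."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err


def quick_two_sum(a: float, b: float) -> tuple[float, float]:
    """Dekker's Fast2Sum: same result as two_sum but requires |a| >= |b|."""
    s = f32(a + b)
    return s, f32(b - f32(s - a))


def two_prod(a: float, b: float) -> tuple[float, float]:
    """p + err == a * b exactly. On a GPU: p = a * b; err = fma(a, b, -p)."""
    p = f32(a * b)
    # a * b is exact in FP64 (24 + 24 <= 53 bits), so this mimics an FP32 FMA.
    err = f32(a * b - p)
    return p, err


DF = tuple[float, float]  # float-float value (hi, lo), representing hi + lo


def from_f64(x: float) -> DF:
    hi = f32(x)
    return hi, f32(x - hi)


def to_f64(x: DF) -> float:
    return x[0] + x[1]


def df_add(x: DF, y: DF) -> DF:
    """Float-float addition: 20 FP32 operations."""
    s, e = two_sum(x[0], y[0])
    t, f = two_sum(x[1], y[1])
    s, e = quick_two_sum(s, f32(e + t))
    return quick_two_sum(s, f32(e + f))


def df_mul(x: DF, y: DF) -> DF:
    """Float-float multiplication: 9 FP32 operations, one of them an FMA."""
    p, e = two_prod(x[0], y[0])
    e = f32(e + f32(f32(x[0] * y[1]) + f32(x[1] * y[0])))
    return quick_two_sum(p, e)


if __name__ == "__main__":
    one, tiny = from_f64(1.0), from_f64(1e-8)
    print(f32(one[0] + tiny[0]))      # plain FP32 sum prints 1.0: the 1e-8 is lost
    print(to_f64(df_add(one, tiny)))  # float-float keeps it: about 1.00000001
    print(to_f64(df_mul(from_f64(1.0 / 3.0), from_f64(3.0))))  # close to 1.0
```

On real hardware `two_prod` would use the GPU's FP32 FMA rather than an FP64 multiply. The operation counts in the docstrings show why emulated FP64 is much slower than native FP32: each emulated multiply costs several FP32 operations and each emulated add about twenty.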

FP64 in Apple GPU
