Please don't use the MT graphs

By: Andrei F (andrei.delete@this.anandtech.com), November 10, 2021 3:13 am
Room: Moderated Discussions
--- (---.delete@this.redheron.com) on November 9, 2021 9:31 pm wrote:
> Ganon (anon.delete@this.gmail.com) on November 9, 2021 7:02 pm wrote:
> > --- (---.delete@this.redheron.com) on November 9, 2021 1:39 pm wrote:
> > > I have no idea how many people here read my (ongoing, the public version is only
> > > version 0.7) exegesis of M1 internals. But those who have read the entire thing
> > > (all 300+ pages!) will remember an on-going bafflement regarding the L1 cache.
> > >
> >
> >
> > Thoroughly enjoyed the read; looking forward to the next update. Regarding
> > m1 pro/max; seems some things have changed at least according to
> >
> > https://www.anandtech.com/show/17024/apple-m1-max-performance-review/2
> >
> > where a single core has >100GB/s all the way from L1 to DRAM; even better
> > than M1.
>
>
> That graph seems to be measuring something different from the first Anandtech graph, the graph for the M1.
> The M1 graph was, as far as I can tell, for pure *load* performance (at least that's my case that it matches
> most closely), and that's at least the most obvious case when you look at the Intel graphs I referenced.
>

The M1 and M1 Max graphs both come from the same test.

Please don't use my MT graphs for detailed analysis of the L1. It's a multi-threaded test, so it has overhead at small depths and you can't get detailed data there.

https://i.imgur.com/dL7s5vX.png

That would be a more accurate picture of the M1 Max, for example.

The test works in 64B chunks; that doesn't matter for pure LD or pure ST.

Flip = copy/alter from one region to another (flip the 8B elements around in memory)
CLflip = flip 8B elements around within 64B chunks, reading and writing the same cachelines
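For readers trying to picture the four access patterns, here is a minimal single-threaded sketch in C. It is not the actual test code. It assumes 8B (uint64_t) elements, so one 64B chunk is 8 elements, and it assumes that "flip" means reversing the element order inside each 64B chunk. The kernel names are hypothetical. A real bandwidth test would also need timing, repeated passes, aligned buffers, and protection against the compiler vectorizing these loops or turning the store loop into a memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 8-byte elements, so one 64B chunk (one cache line) holds 8. */
#define CHUNK_ELEMS 8

/* LD (hypothetical name): read every element; the sum keeps the loads live. */
uint64_t kernel_ld(const uint64_t *src, size_t n_elems) {
    assert(n_elems % CHUNK_ELEMS == 0);
    uint64_t acc = 0;
    for (size_t i = 0; i < n_elems; i += CHUNK_ELEMS)
        for (size_t j = 0; j < CHUNK_ELEMS; j++)
            acc += src[i + j];
    return acc;
}

/* ST (hypothetical name): write every element, with no reads at all. */
void kernel_st(uint64_t *dst, size_t n_elems, uint64_t value) {
    assert(n_elems % CHUNK_ELEMS == 0);
    for (size_t i = 0; i < n_elems; i += CHUNK_ELEMS)
        for (size_t j = 0; j < CHUNK_ELEMS; j++)
            dst[i + j] = value;
}

/* Flip (assumed reading): copy src into a separate dst region, reversing
 * the order of the 8B elements within each 64B chunk along the way. */
void kernel_flip(const uint64_t *src, uint64_t *dst, size_t n_elems) {
    assert(n_elems % CHUNK_ELEMS == 0);
    for (size_t i = 0; i < n_elems; i += CHUNK_ELEMS)
        for (size_t j = 0; j < CHUNK_ELEMS; j++)
            dst[i + j] = src[i + CHUNK_ELEMS - 1 - j];
}

/* CLflip (assumed reading): reverse the 8B elements in place, so every
 * chunk is both read and written within the same cache line. */
void kernel_clflip(uint64_t *buf, size_t n_elems) {
    assert(n_elems % CHUNK_ELEMS == 0);
    for (size_t i = 0; i < n_elems; i += CHUNK_ELEMS)
        for (size_t j = 0; j < CHUNK_ELEMS / 2; j++) {
            uint64_t tmp = buf[i + j];
            buf[i + j] = buf[i + CHUNK_ELEMS - 1 - j];
            buf[i + CHUNK_ELEMS - 1 - j] = tmp;
        }
}
```

The sketch shows why the chunk size only matters for the mixed patterns. LD and ST touch each element exactly once, whatever the grouping. Flip and CLflip, by contrast, depend on which elements share a 64B line: Flip streams from one region into another, while CLflip's reads and writes land on the same cachelines.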

As for store bandwidth, you can't measure it properly, because the fabric transforms writes into non-temporal ones. We've seen this before with A76 and later cores, where the store bandwidth reaches almost 100% of theoretical.