Virtually tagged L1-caches

By: --- (---.delete@this.redheron.com), September 8, 2021 1:28 pm
Room: Moderated Discussions
sr (nobody.delete@this.nowhere.com) on September 8, 2021 11:35 am wrote:
> Hugo Décharnes (hdecharn.delete@this.outlook.fr) on September 8, 2021 10:46 am wrote:
> > You just repeated what I said, that is, VIVT is not the way to go.
> >
> > BTW, L2 access time is critical, and TLB coverage (not accuracy) does not compensate, in that most
> > workload are bounded by the cache latency, not the TLB coverage. Yes, VIVPT enables to move TLB away
> > from the time-critical load-return path, enabling bigger TLBs. But you must reduce the L2 hit latency
> > as much as possible. We are talking about percents of performance for a typical big core.
>
> Point for VIVT L1 is that there's zero need to burn power for TLB translation for L1 hit.
> Data already in cache can accessed and verified without need to translate address. As CPU
> core can do many L1-operations per cycle at least for L1-data TLB needs to be multiported,
> where instead L2 can service only one request per clock so single-ported TLB will be fine.

The claim that a highly performant TLB must be multi-ported is a myth.
The theory is explained here: https://eprints.soton.ac.uk/347147/1/__userfiles.soton.ac.uk_Users_spd _mydesktop _MALEC.pdf

The essential insight is that
- most memory requests are strongly clustered to a single page or a few pages
- to the extent that requests are spread over many pages, that same code is probably not exactly latency essential to the single cycle level

----------------------

It will (of course) come as no surprise that this is exactly what Apple implements.
The Apple D TLB is single-ported, supporting 4 piggyback ports. Hence a single TLB request can service up to four memory requests (read and write mix) in a single cycle.

The TLB is (as best my experimental probing can tell) preceded by four queues, defined by the lowest two bits of the virtual page, and holding up to three or four entries.
Hence the flow is that memory requests (for M1 there can be up to 4 per cycle) are sorted into these queues and, as a practical matter, this pretty much gathers together multiple requests targeting the same page. Servicing queues appears to be either by oldest queue or round robin (my guess is round robin is easier and no practical difference in performance).

Clearly you can create a pathological stream that will behave badly given this design, running at 1 lookup per cycle (I did so as part of my tests). But with almost any realistic pattern, you get three or four lookups per cycle, at a worst case of one to three cycle latency as the four different queues are serviced, and with up to three elements waiting in a queue, all serviced together. I could definitely hit cases where the average number of items serviced was around 2.8 per cycle (ie a mix of pages each cycle, so three same-page references being sorted to the same queue and then being serviced together).

(The same sort of insights [requests cluster to individual cache lines] make naive cache banking work much less well than you might expect, because multiple requests in a given cycle land up routing to the same line.
Apple use the same sort of design for the D-cache, split into two banks by the lowest bit of the cache lineID, with piggyback ports allowing a single line lookup to service multiple requests, and with the two banks providing effectively a dual-porting.)

---------------

For the I-cache, if you care, the design is rather different, with both the physical page number and the cache linedID are preserved from one cycle to the next in temporary register storage, so that most cycles both their timing hit and energy hit can be avoided.
Once again, of course, it works because of strong locality (even more so with 16kiB pages).

Finally I cannot figure out how Apple bypasses D- way prediction, but they don't seem to need it -- at least even the most random stream of addresses (in terms of hitting different ways) that I could generate never seemed to cause even a hiccup in terms of invalid way prediction. I have a few hypotheses, but no strong leads, and no patents suggest an explanation.
On the I- side it's easier -- way info is stored in various appropriate places, like, for example, the Next Fetch Predictor -- and this is described in patents.
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
POWER10 SAP SD benchmarkanon22021/09/06 02:36 PM
  POWER10 SAP SD benchmarkDaniel B2021/09/07 01:31 AM
    "Cores" (and SPEC)Rayla2021/09/07 06:51 AM
      "Cores" (and SPEC)anon2021/09/07 02:56 PM
  POWER10 SAP SD benchmarkAnon2021/09/07 02:24 PM
    POWER10 SAP SD benchmarkAnon2021/09/07 02:27 PM
  Virtually tagged L1-cachessr2021/09/08 04:49 AM
    Virtually tagged L1-cachesdmcq2021/09/08 07:22 AM
      Virtually tagged L1-cachessr2021/09/08 07:56 AM
      Virtually tagged L1-cachesHugo Décharnes2021/09/08 07:58 AM
        Virtually tagged L1-cachessr2021/09/08 09:09 AM
          Virtually tagged L1-cachesHugo Décharnes2021/09/08 09:46 AM
            Virtually tagged L1-cachessr2021/09/08 10:35 AM
              Virtually tagged L1-cachesHugo Décharnes2021/09/08 11:23 AM
                Virtually tagged L1-cachessr2021/09/08 11:40 AM
                  Virtually tagged L1-cachesanon2021/09/09 02:16 AM
                    Virtually tagged L1-cachesKonrad Schwarz2021/09/10 04:19 AM
                      Virtually tagged L1-cachesHugo Décharnes2021/09/10 05:59 AM
                        Virtually tagged L1-cachesanon2021/09/14 02:17 AM
                          Virtually tagged L1-cachesdmcq2021/09/14 08:34 AM
                            Or use a PLB (NT)Paul A. Clayton2021/09/14 08:45 AM
                              Or use a PLBLinus Torvalds2021/09/14 02:27 PM
                                Or use a PLBanon2021/09/14 11:15 PM
                                  Or use a PLBMichael S2021/09/15 02:21 AM
                                    Or use a PLBdmcq2021/09/15 02:42 PM
                                      Or use a PLBKonrad Schwarz2021/09/16 03:24 AM
                                        Or use a PLBMichael S2021/09/16 09:13 AM
                                          Or use a PLB---2021/09/16 12:02 PM
                                  PLB referencePaul A. Clayton2021/09/18 01:35 PM
                                    PLB referenceMichael S2021/09/18 03:14 PM
                                      Demand paging/translation orthogonalPaul A. Clayton2021/09/19 06:33 AM
                                        Demand paging/translation orthogonalMichael S2021/09/19 08:10 AM
                                      PLB referenceCarson2021/09/20 09:19 PM
                                    PLB referencesr2021/09/20 05:02 AM
                                      PLB referenceMichael S2021/09/20 06:03 AM
                                        PLB referenceLinus Torvalds2021/09/20 11:10 AM
                                  Or use a PLBsr2021/09/20 03:32 AM
                              Or use a PLBsr2021/09/21 08:36 AM
                                Or use a PLBLinus Torvalds2021/09/21 09:04 AM
                                  Or use a PLBsr2021/09/21 09:48 AM
                                    Or use a PLBLinus Torvalds2021/09/21 12:55 PM
                                      Or use a PLBsr2021/09/22 05:55 AM
                                        Or use a PLBrwessel2021/09/22 06:09 AM
                                        Or use a PLBLinus Torvalds2021/09/22 10:50 AM
                                          Or use a PLBsr2021/09/22 12:00 PM
                                            Or use a PLBdmcq2021/09/22 03:07 PM
                                            Or use a PLBEtienne Lorrain2021/09/23 07:50 AM
                                          Or use a PLBanon22021/09/22 03:09 PM
                                            Or use a PLBdmcq2021/09/23 01:35 AM
                                          Or use a PLB2021/09/23 08:37 AM
                                            Or use a PLBLinus Torvalds2021/09/23 11:01 AM
                                              Or use a PLBgpd2021/09/24 02:59 AM
                                                Or use a PLBLinus Torvalds2021/09/24 09:45 AM
                                                  Or use a PLBdmcq2021/09/24 11:43 AM
                                                  Or use a PLBsr2021/09/25 09:19 AM
                                                    Or use a PLBLinus Torvalds2021/09/25 09:44 AM
                                                      Or use a PLBsr2021/09/25 10:11 AM
                                                        Or use a PLBLinus Torvalds2021/09/25 10:31 AM
                                                          Or use a PLBsr2021/09/25 10:52 AM
                                                            Or use a PLBLinus Torvalds2021/09/25 11:05 AM
                                                              Or use a PLBsr2021/09/25 11:23 AM
                                                                Or use a PLBrwessel2021/09/25 02:29 PM
                                                                  Or use a PLBsr2021/09/30 11:22 PM
                                                                    Or use a PLBrwessel2021/10/01 05:19 AM
                                                                      Or use a PLBDavid Hess2021/10/01 09:35 AM
                                                                        Or use a PLBrwessel2021/10/02 03:47 AM
                                                                      Or use a PLBsr2021/10/02 10:16 AM
                                                                        Or use a PLBrwessel2021/10/02 10:53 AM
                                                          Or use a PLBLinus Torvalds2021/09/25 10:57 AM
                                                            Or use a PLBsr2021/09/25 11:07 AM
                                                              Or use a PLBLinus Torvalds2021/09/25 11:21 AM
                                                                Or use a PLBsr2021/09/25 11:40 AM
                                                                  Or use a PLBnksingh2021/09/27 08:07 AM
                                                          Or use a PLB2021/09/27 08:02 AM
                                                            Or use a PLBLinus Torvalds2021/09/27 09:20 AM
                                                              Or use a PLBLinus Torvalds2021/09/27 11:58 AM
                                                                Or use a PLBdmcq2021/09/28 09:59 AM
                                              Or use a PLBsr2021/09/25 09:34 AM
                                                Or use a PLBrwessel2021/09/25 02:44 PM
                                                  Or use a PLBsr2021/10/01 12:04 AM
                                                    Or use a PLBrwessel2021/10/01 05:33 AM
                                                      I386 segmentation highlightssr2021/10/04 06:53 AM
                                                        I386 segmentation highlightsAdrian2021/10/04 08:53 AM
                                                          I386 segmentation highlightssr2021/10/04 09:19 AM
                                                        I386 segmentation highlightsrwessel2021/10/04 03:57 PM
                                                          I386 segmentation highlightssr2021/10/05 10:16 AM
                                                            I386 segmentation highlightsMichael S2021/10/05 11:27 AM
                                                            I386 segmentation highlightsrwessel2021/10/05 03:20 PM
                                                Or use a PLBJohnG2021/09/25 09:18 PM
                                              Or use a PLB2021/09/27 06:37 AM
                                                Or use a PLBHeikki Kultala2021/09/28 02:53 AM
                                                  Or use a PLBrwessel2021/09/28 06:29 AM
                                        Or use a PLBDavid Hess2021/09/23 05:00 PM
                                          Or use a PLBAdrian2021/09/24 12:21 AM
                                            Or use a PLBdmcq2021/09/25 11:41 AM
                                        Or use a PLBblaine2021/09/26 10:19 PM
                                          Or use a PLBDavid Hess2021/09/27 10:35 AM
                                            Or use a PLBblaine2021/09/27 04:19 PM
                                            Or use a PLBAdrian2021/09/27 09:40 PM
                                              Or use a PLBAdrian2021/09/27 09:59 PM
                                                Or use a PLBdmcq2021/09/28 06:45 AM
                                              Or use a PLBrwessel2021/09/28 06:45 AM
                                              Or use a PLBDavid Hess2021/09/28 11:50 AM
                                                Or use a PLBEtienne Lorrain2021/09/30 12:25 AM
                                                  Or use a PLBDavid Hess2021/10/01 09:40 AM
                                  MMU privilegessr2021/09/21 10:07 AM
                                    MMU privilegesLinus Torvalds2021/09/21 12:49 PM
                            Virtually tagged L1-cachesKonrad Schwarz2021/09/16 03:18 AM
                          Virtually tagged L1-cachesCarson2021/09/16 12:12 PM
                            Virtually tagged L1-cachesanon22021/09/16 04:16 PM
                              Virtually tagged L1-cachesrwessel2021/09/16 05:29 PM
                          Virtually tagged L1-cachessr2021/09/20 03:20 AM
              Virtually tagged L1-caches---2021/09/08 01:28 PM
                Virtually tagged L1-cachesanonymou52021/09/08 07:28 PM
                  Virtually tagged L1-cachesanonymou52021/09/08 07:34 PM
                  Virtually tagged L1-caches---2021/09/09 09:14 AM
                    Virtually tagged L1-cachesanonymou52021/09/09 09:44 PM
                Multi-threading?David Kanter2021/09/09 08:32 PM
                  Multi-threading?---2021/09/10 08:19 AM
                Virtually tagged L1-cachessr2021/09/11 12:19 AM
                Virtually tagged L1-cachessr2021/09/11 12:36 AM
                  Virtually tagged L1-caches---2021/09/11 08:53 AM
                    Virtually tagged L1-cachessr2021/09/11 11:43 PM
                      Virtually tagged L1-cachesLinus Torvalds2021/09/12 10:10 AM
                        Virtually tagged L1-cachessr2021/09/12 10:57 AM
                          Virtually tagged L1-cachesdmcq2021/09/13 07:31 AM
                            Virtually tagged L1-cachessr2021/09/20 03:11 AM
            Virtually tagged L1-cachessr2021/09/11 01:49 AM
      Virtually tagged L1-cachesLinus Torvalds2021/09/08 11:34 AM
        Virtually tagged L1-cachesdmcq2021/09/09 01:46 AM
          Virtually tagged L1-cachesdmcq2021/09/09 01:58 AM
          Virtually tagged L1-cachessr2021/09/11 12:29 AM
            Virtually tagged L1-cachesdmcq2021/09/11 07:59 AM
              Virtually tagged L1-cachessr2021/09/11 11:57 PM
                Virtually tagged L1-cachesdmcq2021/09/12 07:44 AM
                  Virtually tagged L1-cachessr2021/09/12 08:48 AM
                    Virtually tagged L1-cachesdmcq2021/09/12 12:22 PM
                      Virtually tagged L1-cachessr2021/09/20 03:40 AM
    Where do you see this information? (NT)anon22021/09/09 01:45 AM
      Where do you see this information?sr2021/09/11 12:40 AM
        Where do you see this information?anon22021/09/11 12:53 AM
          Where do you see this information?sr2021/09/11 01:08 AM
            Thank you (NT)anon22021/09/11 03:31 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell avocado?