New article: AMD's Jaguar Microarchitecture

Article: AMD's Jaguar Microarchitecture
By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), April 2, 2014 5:02 pm
Room: Moderated Discussions
Are any details available on the neural network branch predictor (aside from using 26 bits of history)?

The BTB design seems unusual. Providing only an offset with the same page seems unusual (it makes me wonder if there is some TLB energy use optimization involved; a hit in the ordinary BTBs [not the out-of-page target array] could use a 1-entry microTLB [which would be "tagged" by the page number of the current IP, so be effectively tagless]). A compiler might optimize loops to be within a page and most optimized forward branches are biased toward not taken (and less biased forward branches probably tend to be short and so likely to be within the same page; also, large-bodied loops which are more likely to jump outside of the page are less likely to be impacted by one branch's excess delay [four cycles] finding a target), so this power/area optimization probably does not have a significant performance impact.

I assume "Near calls and the associated returns are predicted by a 16 entry Return Address Stack (RAS)" actually meant "returns associated with near calls are ..."

Since called functions would usually start in a different page than the call instruction, it looks like at most only 32 more-distant direct function targets would be predicted. This seems a bit odd, but with the modest sized BTBs (only 2 Ki entries for direct branches/jumps) this may be more reasonable than it seems at first.

(By the way, no information on the L2 instruction TLBs is given. A unified L2 might not be unreasonable for a lower performance core especially given the size given for the L2 DTLB [512 4KiB, 256 2MiB]. [{Rant begin}I am still disappointed that no one seems to think it worthwhile to use the same entries to cache the 2MiB PTEs and PDEs. Admittedly, such wastes 10 bits—1 'is_page' bit and 9 physical address bits, though that waste might be reduced by cleverness—and pressures for higher associativity, but the utilization of a larger structure would be higher. Bit waste did not seem to keep designers from implementing a page-size-unified L2 TLB where an additional 9 tag bits are unused for 2MiB pages.{Rant end}])

Restricting Instruction Byte Buffer read to the oldest 16B chunk and 6B of the next chunk also seems likely to force a few hiccups (c. 6% of the time an 8B pair of instructions would allow only one to be decoded), but the performance loss is probably acceptable for the simplification/lower power. (I wonder if AMD considered supporting a template allowing a simple move or compare not to be counted as an instruction; combined with move elimination and fusing compares and branches such might provide a modest performance/power benefit. On the other hand, with a limited budget such small optimizations [if it would be an optimization] are probably not worth pursuing.)

The higher latency L2 cache also seems a bit disappointing. Sharing L2 across four cores does improve utilization at low thread count, potentially increases capacity (if there is sharing of cache lines), potentially speeds coherence (cache line ping-pong is limited to evicting from L1 caches when only cores in a cluster share a cache line), and would be more friendly to using different clock rates for different cores while allowing coherence to be fast (similar reasoning for Intel moving its L3 slices into a separate clock domain from the associated cores; Jaguar's L2 is last-level cache). A unified L2 may also be more friendly to size adjustment (1 MiB among 4 cores might be okay where 256 KiB per core might be more painful). Even so, I dislike the relatively high latency (but that is just a feeling from an enthusiast).
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
New article: AMD's Jaguar MicroarchitectureDavid Kanter2014/04/01 01:19 AM
  New article: AMD's Jaguar MicroarchitectureSHK2014/04/01 06:09 AM
    New article: AMD's Jaguar MicroarchitectureJeff Rupley2014/04/01 07:13 PM
      New article: AMD's Jaguar MicroarchitectureSHK2014/04/02 06:45 AM
        CMOV is 3 operand given register renamingPaul A. Clayton2014/04/02 09:11 AM
          CMOV is 3 operand given register renamingSHK2014/04/02 12:17 PM
            Limited operand tags in issue queue entriesPaul A. Clayton2014/04/02 01:32 PM
        New article: AMD's Jaguar MicroarchitectureLinus Torvalds2014/04/02 12:48 PM
          New article: AMD's Jaguar MicroarchitectureLinus Torvalds2014/04/02 02:32 PM
  New article: AMD's Jaguar MicroarchitectureGeorge2014/04/01 02:10 PM
  New article: AMD's Jaguar Microarchitecturewillmore2014/04/01 06:37 PM
    New article: AMD's Jaguar Microarchitecturewillmore2014/04/01 07:08 PM
    New article: AMD's Jaguar MicroarchitectureNaN2014/04/02 08:58 AM
      New article: AMD's Jaguar MicroarchitectureUnmaskedUnderflow2014/04/04 07:16 AM
        New article: AMD's Jaguar MicroarchitectureLinus Torvalds2014/04/04 08:54 AM
          New article: AMD's Jaguar MicroarchitectureUnmaskedUnderflow2014/04/04 11:45 AM
            New article: AMD's Jaguar MicroarchitectureLinus Torvalds2014/04/04 02:00 PM
              New article: AMD's Jaguar MicroarchitectureNoSpammer2014/04/04 03:15 PM
              New article: AMD's Jaguar MicroarchitectureTREZA2014/04/04 03:18 PM
                New article: AMD's Jaguar MicroarchitectureLinus Torvalds2014/04/04 04:56 PM
                  New article: AMD's Jaguar MicroarchitectureTREZA2014/04/04 05:34 PM
                  New article: AMD's Jaguar MicroarchitectureMichael S2014/04/05 11:02 AM
                  New article: AMD's Jaguar Microarchitecturecomputational_scientist2014/04/05 06:50 PM
                    New article: AMD's Jaguar MicroarchitectureMichael S2014/04/06 01:22 AM
                    New article: AMD's Jaguar MicroarchitectureWilco2014/04/06 05:29 AM
                      New article: AMD's Jaguar Microarchitecturecomputational_scientist2014/04/06 07:33 AM
                        New article: AMD's Jaguar MicroarchitectureWilco2014/04/07 03:12 AM
                          New article: AMD's Jaguar MicroarchitectureMichael S2014/04/07 06:58 AM
                        New article: AMD's Jaguar MicroarchitectureEduardoS2014/04/07 04:34 PM
                      New article: AMD's Jaguar Microarchitecturecomputational_scientist2014/04/06 07:53 AM
                      New article: AMD's Jaguar MicroarchitectureMegol2014/04/06 08:21 AM
                        New article: AMD's Jaguar Microarchitecturenone2014/04/06 09:07 AM
                          New article: AMD's Jaguar MicroarchitectureMichael S2014/04/06 09:23 AM
                        New article: AMD's Jaguar MicroarchitectureWilco2014/04/06 02:48 PM
                          New article: AMD's Jaguar MicroarchitectureTREZA2014/04/06 03:47 PM
                            New article: AMD's Jaguar MicroarchitectureMichael S2014/04/07 02:34 AM
                              New article: AMD's Jaguar MicroarchitectureWilco2014/04/07 03:27 AM
                                New article: AMD's Jaguar MicroarchitectureMichael S2014/04/07 05:39 AM
                                  New article: AMD's Jaguar MicroarchitectureUnmaskedUnderflow2014/04/07 12:26 PM
                                    New article: AMD's Jaguar MicroarchitectureMichael S2014/04/07 01:42 PM
                                    New article: AMD's Jaguar MicroarchitectureWilco2014/04/07 01:50 PM
                                      New article: AMD's Jaguar MicroarchitectureUnmaskedUnderflow2014/04/07 02:11 PM
                                        New article: AMD's Jaguar MicroarchitectureWilco2014/04/07 05:44 PM
                                      New article: AMD's Jaguar MicroarchitectureTREZA2014/04/07 03:38 PM
              denormal on IvyB and HaswellMichael S2014/04/05 10:45 AM
                Forum searchiz2014/04/05 12:54 PM
                denormal on IvyB and HaswellLinus Torvalds2014/04/06 09:55 AM
                  denormal on IvyB and HaswellMichael S2014/04/17 06:43 PM
            New article: AMD's Jaguar Microarchitecturedmcq2014/04/05 06:52 AM
            New article: AMD's Jaguar MicroarchitectureMaynard Handley2014/04/05 10:38 AM
              New article: AMD's Jaguar MicroarchitectureMichael S2014/04/05 10:59 AM
                New article: AMD's Jaguar MicroarchitectureBrett2014/04/05 12:12 PM
                  New article: AMD's Jaguar MicroarchitectureEduardoS2014/04/05 12:29 PM
                    New article: AMD's Jaguar MicroarchitectureBrett2014/04/05 01:00 PM
                      New article: AMD's Jaguar MicroarchitectureMichael S2014/04/06 02:18 AM
                        New article: AMD's Jaguar MicroarchitectureBrett2014/04/06 10:08 AM
                          New article: AMD's Jaguar MicroarchitectureBrett2014/04/06 10:11 AM
                New article: AMD's Jaguar MicroarchitectureMaynard Handley2014/04/05 06:01 PM
                  New article: AMD's Jaguar MicroarchitectureMichael S2014/04/06 01:50 AM
                    New article: AMD's Jaguar MicroarchitectureMaynard Handley2014/04/06 03:52 PM
                      New article: AMD's Jaguar MicroarchitectureMichael S2014/04/07 02:20 AM
                        New article: AMD's Jaguar MicroarchitectureMaynard Handley2014/04/07 10:38 AM
                          New article: AMD's Jaguar MicroarchitectureWilco2014/04/07 10:47 AM
                            New article: AMD's Jaguar MicroarchitectureMaynard Handley2014/04/07 02:52 PM
                              New article: AMD's Jaguar MicroarchitectureWilco2014/04/07 04:01 PM
                                New article: AMD's Jaguar MicroarchitectureSeni2014/04/08 02:03 PM
                                  New article: AMD's Jaguar MicroarchitectureWilco2014/04/08 02:56 PM
                                    New article: AMD's Jaguar MicroarchitectureMichael S2014/04/08 04:05 PM
                                      New article: AMD's Jaguar MicroarchitectureMaynard Handley2014/04/08 06:55 PM
                                        New article: AMD's Jaguar MicroarchitectureMichael S2014/04/09 01:12 AM
                  New article: AMD's Jaguar MicroarchitectureWilco2014/04/06 04:51 AM
  New article: AMD's Jaguar MicroarchitectureWaltC2014/04/02 01:52 PM
    New article: AMD's Jaguar MicroarchitectureLinus Torvalds2014/04/02 02:25 PM
      New article: AMD's Jaguar Microarchitectureitsmydamnation2014/04/03 12:19 AM
      New article: AMD's Jaguar MicroarchitectureLinus Torvalds2014/04/09 01:44 PM
        New article: AMD's Jaguar MicroarchitectureDavid Kanter2014/04/10 11:24 PM
          New article: AMD's Jaguar Microarchitecturenone2014/04/11 01:49 AM
          New article: AMD's Jaguar MicroarchitectureLinus Torvalds2014/04/11 09:14 AM
    New article: AMD's Jaguar MicroarchitectureRyan Dean2014/04/03 01:04 AM
  New article: AMD's Jaguar MicroarchitecturePaul A. Clayton2014/04/02 05:02 PM
  New article: AMD's Jaguar MicroarchitectureRicky Chan2014/04/03 07:50 AM
    New article: AMD's Jaguar Microarchitecturesomeone2014/04/04 07:18 AM
  New article: AMD's Jaguar Microarchitecturebakaneko2014/04/09 03:08 PM
    New article: AMD's Jaguar MicroarchitectureTREZA2014/04/09 05:34 PM
  Jaguar's detailsHugo Décharnes2014/06/07 04:08 AM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell green?