The (wrong) state of trace caches on modern CPUs

By: Michael S (already5chosen.delete@this.yahoo.com), August 25, 2016 9:36 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on August 25, 2016 9:50 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on August 25, 2016 3:28 am wrote:
> > Eric Bron (eric.bron.delete@this.zvisuel.privatefortest.com) on August 25, 2016 2:38 am wrote:
> > > > "
> > > > With the Intel Sandybridge [45] processor, Intel introduced
> > > > a μop cache instead of a loop buffer. μop caches
> > > > tradeoff some of the power efficiency of loop caches in exchange
> > > > for capturing more instructions and behaviors.
> > > > Thus codes which frequent and simple loops may be better
> > > > served by a traditional loop cache, however μop caches
> > > > are more robust and able to derive benefit more irregular
> > > > codes. Essentially, μop caches operate as traditional
> > > > caches which hold decoded instructions. However, they share
> > > > some characteristics with loop caches. [b]In current
> > > > commercial implementations, μop caches encode predicted branch
> > > > paths.[/b] If branch paths differ from previously
> > > > predicted paths, like loop caches the μop cache must be flushed and refilled.
> > > > "
> > >
> > > well, the above excerpt contains roughly one error per
> > > line, it looks like this PhD was not properly reviewed
> > >
> >
> > His linkedin profile rather impressive for a young man.
> > Including:
> >
> > Engineering Intern
> > Intel Corporation
> > May 2010 – December 2010 (8 months)
> >
> > • Worked on the Haswell performance modeling team.
> > • Performed early stage feature evaluation for a future generation processor..
> > • Extensive C++ coding on an out of order processor simulator.

> >
> > As a non-native English speaker I see a grammar of above quote as problematic, but may be for natives
> > it's o.k? Or, may be, there are typing mistakes, like "Thus codes which frequent" must be "Thus codes
> > with frequent" or "derive benefit more irregular" should be "derive benefit for more irregular"
> >
> > But yes, SB has *both* Decoded ICache and ability to utilize Micro-op Queue as small loop cache.
> > And yes, SB Decoded ICache stores uOps in program order rather than in predicted branch path order.
> > And yes, when "branch paths differ from previously predicted paths" Decoded ICache is *not* flushed.
> >
> >
> >
>
> I have no idea what you are saying here, and what you are agreeing or disagreeing with.
>
> ** SB has *both* Decoded ICache and ability to utilize Micro-op Queue as small loop cache.
> + Sure, I don't think any sane person was disagreeing with that.


Then Mitchell Hayenga is insane, because he wrote that "Intel introduced a μop cache instead of a loop buffer".
Personally, I don't think that he is insane, just very inaccurate in his writing.

>
> ** SB Decoded ICache stores uOps in program order rather than in predicted branch path order.
> + I have no idea what the difference between these two statements is.
> Think about the implementation.
> How is the decoded ops cache going to work? The obvious implementation is that it stores
> decoded instructions AS THEY ARE ENCOUNTERED. In other words, they are stored in
> predicted branch path order because that is what the decoder saw. Since predicted branch path order
> equals program order when the branch prediction is working, these are usually the same thing, but the
> decoded cache pretty much HAS to store them in the order that was generated by branch prediction.
>
> And a PREDICTED straightline ordering of a stream of instructions
> is a trace, no? So what is being argued here?
>
> ** When "branch paths differ from previously predicted paths" Decoded ICache is *not* flushed.
> OK. Do you know this for a fact? And if flushing does not occur, then why not? I have no basis on which
> to judge your credibility here, but what you're saying makes no sense. A large part of the point of the
> decoded ops cache is to avoid the expense of the branch prediction machinery. That means there isn't a
> SECOND layer of branch prediction machinery operating on the decoded ops, they're just run through straightline.
> And if that straightline run-through is no longer valid (because branch prediction suggests otherwise)
> then what is the point of keeping that, for lack of a better word, trace, in the cache.
>
> Your mental model seems to be something like that the decoders scan the STATIC program code,
> translate it into some easier-to-interpret instructions, and then the complete standard CPU pipeline
> (including all the fetch logic with the branch steering that implies) runs on those easier-to-interpret
> instructions. But that's clearly not what happens, the whole system operates as I described,
> on the partial stream of instructions that has been delivered to the decoders, and with no branch
> steering mechanism that I can imagine available to the decoded cache.
>

My mental model is based on the description of the Sandy Bridge microarchitecture in the Optimization Reference Manual. They don't state it with 100% certainty, but it appears that the decoder always processes fetched 16B chunks in their entirety, even when there is a predicted-taken conditional branch in the middle. Maybe even 32B chunks; I am not sure about that.

One thing that is absolutely certain: "All micro-ops in a Way* represent instructions which are statically contiguous in the code and have their EIPs within the same aligned 32-byte region".

* Way = 1/256th of the Decoded ICache; it can hold 1 to 6 micro-ops.
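
To make the distinction concrete, here is a toy sketch in Python of a cache that is indexed by static address window rather than by predicted path. It is only an illustration of the reading above, not a description of the real hardware: the 32 sets x 8 Ways and 6 micro-ops per Way figures are the ones behind the "1/256th" above, while the class `ToyDecodedICache`, its methods, and the instruction addresses are all made up for the example. The sketch ignores the per-Way capacity limit, replacement, and the real set-index function.

```python
from dataclasses import dataclass, field

WINDOW_BYTES = 32       # aligned region that one Way may cover
NUM_SETS = 32           # 32 sets x 8 Ways = 256 Ways in total
WAYS_PER_SET = 8
MAX_UOPS_PER_WAY = 6    # 256 Ways x 6 = 1536 micro-ops at most


def window_base(ip: int) -> int:
    """Base address of the aligned 32-byte region containing ip."""
    return ip & ~(WINDOW_BYTES - 1)


@dataclass
class ToyDecodedICache:
    # window base -> list of (ip, uop) kept in static address order
    lines: dict = field(default_factory=dict)

    def fill(self, decoded):
        """Store (ip, uop) pairs under the 32-byte window of each ip."""
        for ip, uop in decoded:
            entries = self.lines.setdefault(window_base(ip), [])
            if (ip, uop) not in entries:
                entries.append((ip, uop))
                entries.sort()  # statically contiguous order, not path order

    def lookup(self, ip: int):
        """Micro-ops at or after ip in its window, or None on a miss."""
        entries = self.lines.get(window_base(ip))
        if entries is None:
            return None
        return [uop for addr, uop in entries if addr >= ip]


cache = ToyDecodedICache()
# Hypothetical code: a conditional branch in the middle of one window.
cache.fill([(0x1000, "cmp"), (0x1003, "jne"), (0x1005, "add"), (0x1008, "ret")])

# Fall-through path: the micro-ops after the branch are served from the same line.
fall_through = cache.lookup(0x1005)
# Taken path to another window: a plain miss, and nothing gets flushed.
taken_target = cache.lookup(0x1040)
still_cached = cache.lookup(0x1000)
```

In this model a branch going the other way changes only which address is looked up next. The line filled for the window at 0x1000 stays valid either way, which is the opposite of what a path-encoding trace cache would do.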
