Is Cinebench a totally useless benchmark?

By: Maynard Handley (name99.delete@this.name99.org), January 19, 2014 10:26 pm
Room: Moderated Discussions
Alberto (git.delete@this.git.it) on January 19, 2014 2:08 pm wrote:

> Skylake could give us some nice surprises on IPC, looking at some leaked informations :).
> Moreover Intel is claiming that Core has a 2X IPC gain on track next years. Now Intel is more free
> in this attempt, the light mobile segment will be covered by Atom and i fully believe that the Haswell
> attempt to put down the flagship arc to absurd levels of low power consumption will become useless as
> far more powerful multicore Atom SOCs will born.

Hmm. You mean Intel, after years of eking out 10% annual IPC gains and hitting a pretty impressive 1.75 IPC, has some secret magic that will get them to 3.5 IPC next year? Seems, let's say, unlikely...
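To put that in numbers, here is a quick back-of-the-envelope sketch. It assumes the 10% annual gain simply compounds from today's 1.75 IPC; the function name is mine, purely for illustration.

```python
import math

def years_to_reach(current_ipc, target_ipc, annual_gain):
    """Years of compounding at annual_gain (0.10 means 10% per year)
    needed to grow from current_ipc to target_ipc."""
    return math.log(target_ipc / current_ipc) / math.log(1.0 + annual_gain)

# Figures from the post: 1.75 IPC today, 3.5 IPC claimed, 10% per year historically.
years = years_to_reach(1.75, 3.5, 0.10)
print(f"{years:.1f} years at the historical rate")  # about 7.3 years
```

So at the historical rate, a doubling is a seven-year project, not a one-year one.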

The limit literature tells us that the data flow allows an IPC of this magnitude. To actually get there we would need:
- more or less perfect I-prefetch. This is (fortunately) possible, but the literature as to how to do it was only published in 2011. Intel generally can't turn around fast enough to get a finding into production in three years.

- another ramp up in branch prediction, from the current about 98% accurate to about 99% accurate. This is probably doable with the newest branch predictors, like TAGE --- but again, Intel turnaround time.

- a ramp up in D-prefetch performance. As far as I know "we", the community outside Intel, know very little about how Intel currently structures their D-prefetch. Obviously they are using the basic tools like multiple strided prefetchers per core. Probably they have some of the smarts suggested ten or so years ago (how to arbitrate between prefetch streams from different cores at the memory controller, where in the LRU/MRU ways to place streams depending on degree of confidence and usage patterns). Do they have some of the newest ideas (like having prefetch streams aware of/scheduled against DRAM layout, or having prefetchers running at multiple cache levels, but co-ordinated with each other)?

- even with the three ideas above well implemented, they're probably not going to get more than maybe 15% over what they have today. To get a 2x boost, they have to get much closer to being data flow limited, which means much, much larger windows. That in turn means a very different OoO architecture, because the classic OoO architecture they are using today doesn't scale to those sizes. I've mentioned this sort of thing before. The easy version is Runahead processing, which gets you a lot of the memory latency benefit but is suboptimal. The more aggressive version is some form of kilo-instruction processing or continuous flow processing.
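As a rough illustration of why the branch prediction item above buys only a modest gain on its own, here is a toy model. Everything except the 98% and 99% accuracies is an assumption I picked for illustration (one instruction in five being a branch, a 15-cycle misprediction penalty, an IPC of 2.0 with perfect prediction); none of these are measured Intel figures, and the additive stall model ignores any overlap a real OoO core would get.

```python
def ipc_with_mispredicts(base_ipc, branch_fraction, accuracy, penalty_cycles):
    """IPC once average misprediction stall cycles are added per instruction.

    base_ipc        -- IPC with perfect branch prediction (assumed value)
    branch_fraction -- fraction of instructions that are branches (assumed value)
    accuracy        -- branch predictor accuracy, e.g. 0.98
    penalty_cycles  -- cycles lost per misprediction (assumed value)
    """
    base_cpi = 1.0 / base_ipc
    stall_cpi = branch_fraction * (1.0 - accuracy) * penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)

# Hypothetical machine parameters; only the two accuracies come from the post.
ipc_98 = ipc_with_mispredicts(2.0, 0.2, 0.98, 15)
ipc_99 = ipc_with_mispredicts(2.0, 0.2, 0.99, 15)
gain = ipc_99 / ipc_98 - 1.0
print(f"98%: {ipc_98:.2f} IPC, 99%: {ipc_99:.2f} IPC, gain: {gain:.1%}")
```

With these made-up parameters, halving the misprediction rate is worth roughly 6%. Different assumptions move that number around, but it stays a long way from 2x.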

It would be a marvelous thing for Intel to push KIP/CFP, and, as I have mentioned, the CFP work is published by people affiliated with Intel. But it seems unlikely that it will happen as rapidly as is claimed here.
And the larger implication (that this is the savior of Intel) is, I think, unrealistic. AMD may not be able to ship a KIP-class processor (god knows WTF they are thinking, but they seem unable to ship standard well-known improvements to an ancient OoO architecture). But Apple are aware of these techniques (as I've again mentioned, their "macro scalar" patents are a version of the KIP/CFP ideas) and I'd like to think at least one other ARM player is marginally competent [though that may be optimistic given Qualcomm's recent behavior and what I've seen of Denver]. The other interesting player (especially if these ideas are supposed to be embedded in Intel's "no longer constrained by absurd levels of low power consumption" CPUs) is IBM, who, I imagine, could fairly easily switch POWER, if they had reason to, from its current (not especially performant) SMT4 model to a rather more performant SMT2+KIP/CFP model.