Are you kidding?

By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), October 7, 2015 4:49 am
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on October 5, 2015 12:24 pm wrote:
>
> With Skylake, it's a little complex - I can see an argument for 6-wide (assuming
> hit in the uop cache) or 5-wide. Both of those are sustainable, although in the
> case of I$ hits, I'm not sure there is really enough fetch bandwidth.

I'd be very surprised if Skylake is "5-wide decode" in the sense that most people think of it, and in the sense it seems to be discussed here.

In fact, Intel doesn't even say it's 5-wide. Intel says it's "5 uops max". It's possible that that never means "five instructions" - it might be that you only get five uops when one or more of the decoders end up decoding an instruction into multiple uops.

For example, I'd love to see better decoding of read-modify-write instructions, but on Intel big-cores those are traditionally two uops, and they only get decoded by the first decoder, even though from a decoding standpoint the read-modify-write instructions are not at all any harder to decode than the normal load-op instructions. It's the exact same modrm format.

So it might well be that Skylake just extends the second decoder, which used to only be able to generate a single uop, so that it can also emit two uops, letting you decode two of those memory op instructions in the same cycle. Really, from the standpoint of just parsing the instruction bytes in memory, there is no difference between "add memory to register" and "add register to memory". The only difference is in the uops they result in.
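To make the "same modrm format" point concrete, here is a small C sketch that parses `add [rbx], eax` (bytes 01 03, the read-modify-write form) and `add eax, [rbx]` (bytes 03 03, the load-op form). It assumes 64-bit mode, no prefixes, and the standard ADD encodings 01 /r and 03 /r; the struct and function names are hypothetical, for illustration only. The ModRM byte is identical in both cases; only the direction bit of the opcode differs, and that affects the resulting uops, not the parse.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of a two-byte "opcode + ModRM" ALU instruction such as ADD. */
struct alu_insn {
    unsigned opcode;
    unsigned mod;        /* ModRM bits 7..6: 0 = memory operand [rm], no displacement */
    unsigned reg;        /* ModRM bits 5..3: register operand (0 = eax) */
    unsigned rm;         /* ModRM bits 2..0: base register of the memory operand (3 = rbx) */
    int      mem_is_dst; /* 1 if the r/m operand is the destination */
};

/* Parse opcode + ModRM. The parse is the same for both directions. */
struct alu_insn decode_alu(const uint8_t bytes[2])
{
    struct alu_insn in;
    in.opcode = bytes[0];
    in.mod = (bytes[1] >> 6) & 3u;
    in.reg = (bytes[1] >> 3) & 7u;
    in.rm  = bytes[1] & 7u;
    /* Bit 1 of the opcode is the "d" bit: 0 = reg -> r/m, 1 = r/m -> reg. */
    in.mem_is_dst = (bytes[0] & 0x02u) == 0;
    return in;
}

/* Memory operand that is also the destination: load, op, store. */
int is_read_modify_write(const struct alu_insn *in)
{
    return in->mod != 3u && in->mem_is_dst;
}
```

Decoding both byte sequences gives the same mod/reg/rm fields (0, 0, 3); only `is_read_modify_write` distinguishes them.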

So the "one more uop than Haswell" by no means needs to mean "one more instruction". It could just as easily (in fact, I think more easily) be about just making some of the decoders a bit more flexible.
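A toy model of the "one more uop, not one more instruction" argument, as a C sketch. Everything about the decoder here is an assumption for illustration: each decoder slot has a maximum number of uops it can emit, instructions fill slots in program order, and a cycle ends when the next instruction does not fit its slot or would exceed the per-cycle uop cap. Slot capacities {4,1,1,1} with a 4-uop cap stand in for a Haswell-like layout; {4,2,1,1} with a 5-uop cap is the hypothetical "second decoder can emit two uops" variant. Neither is a documented Intel configuration.

```c
#include <assert.h>

/* Cycles needed to decode n instructions with uop counts uops[0..n-1],
   given per-slot uop capacities slot_cap[0..nslots-1] and a per-cycle
   uop cap. Assumes slot 0 can take any single instruction. */
int decode_cycles(const int *uops, int n, const int *slot_cap, int nslots,
                  int max_uops)
{
    int cycles = 0, i = 0;
    while (i < n) {
        int slot = 0, used = 0;
        /* Fill slots in program order until an instruction does not fit. */
        while (i < n && slot < nslots && uops[i] <= slot_cap[slot] &&
               used + uops[i] <= max_uops) {
            used += uops[i];
            slot++;
            i++;
        }
        assert(slot > 0); /* otherwise the model would never make progress */
        cycles++;
    }
    return cycles;
}
```

With a stream of two-uop read-modify-write instructions, the first layout decodes one per cycle and the second two per cycle, while a stream of single-uop instructions decodes four per cycle under both. In this model the extra uop of width only turns into extra throughput for multi-uop instructions.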

There are other limits to the x86 decoders that tend to be more painful than "4 instructions". Intel used to have something like a 16-byte total size decode limit. I have some memory of that being extended to 32 bytes from the pure "fetch from L1 I$" standpoint, but there's the whole instruction re-alignment and predecode issue, and there were limits there on just how many bytes the ostensibly four instructions could occupy.

Depending on what the exact rules are, again it might be much more productive to increase those kinds of limits rather than try to go from four instructions to five. Since branch targets are often not aligned, it can be a big deal if you can fetch a full 64-byte cache line in one chunk and re-align it, because you're then more likely to get the full theoretical three or four instructions decoded after a mispredicted branch.

(And you don't need to do a full unaligned byte shifter for instruction decode - even if you fetch 64 bytes at a time, maybe you'll only align something like 16-24 bytes of them into the decode buffer in order to capture that "likely next four instructions" data)
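The branch-target alignment point can be sketched the same way. This C model assumes a fetch that delivers aligned `block`-byte chunks, so the first chunk after a taken branch only contains the bytes from the target up to the next block boundary, versus a hypothetical re-aligning fetch that hands the decoders `window` bytes starting exactly at the target. The function names, instruction lengths and target offset are made-up illustration values, not measured Intel behavior.

```c
#include <assert.h>

/* Instructions (up to max_insns) whose bytes fit entirely in avail_bytes. */
int insns_decodable(const int *len, int n, int avail_bytes, int max_insns)
{
    int count = 0, used = 0;
    while (count < n && count < max_insns && used + len[count] <= avail_bytes) {
        used += len[count];
        count++;
    }
    return count;
}

/* Aligned fetch: only the bytes from the target to the next block boundary. */
int first_cycle_aligned(const int *len, int n, int target, int block,
                        int max_insns)
{
    assert(block > 0 && target >= 0);
    return insns_decodable(len, n, block - (target % block), max_insns);
}

/* Re-aligning fetch: 'window' bytes starting exactly at the branch target. */
int first_cycle_realigned(const int *len, int n, int window, int max_insns)
{
    return insns_decodable(len, n, window, max_insns);
}
```

For instructions of 3, 4, 5 and 4 bytes starting at offset 43 (11 bytes into a 16-byte block), the aligned fetch leaves only 5 usable bytes and the first cycle decodes one instruction, while a 24-byte re-aligned window covers all four.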

Side note: I find it interesting that Skylake apparently fixes the "prefetch NULL" problem.

In list handling, you often want to blindly prefetch the next pointer (trying to make it conditional on being valid would just increase the overhead of prefetching to the point where it hurts more than it helps due to the inevitable branch misprediction at the end of the chain traversal), and almost every architecture I've seen gets this wrong, taking TLB or memory pipeline resources for the NULL case. Which is absolutely horrible. We've found prefetching to basically never be a win in real life because the cost of the prefetch is too high.
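The list-traversal pattern described above, as a C sketch using the GCC/Clang `__builtin_prefetch` builtin (a compiler extension, not standard C). The first function prefetches the next pointer blindly, so the final iteration prefetches NULL; the second adds the NULL check that the post argues costs more than it saves, since that branch mispredicts at the end of the chain. The `struct node` type and function names are hypothetical. A prefetch of NULL does not fault; the cost at issue is the TLB and memory-pipeline resources the hardware may still spend on it.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;
    long value;
};

/* Blind prefetch: on the last node p->next is NULL and is prefetched anyway. */
long sum_list_prefetch(const struct node *p)
{
    long sum = 0;
    for (; p != NULL; p = p->next) {
        __builtin_prefetch(p->next); /* hint only; no branch, may be NULL */
        sum += p->value;
    }
    return sum;
}

/* Conditional prefetch: avoids prefetching NULL at the cost of a branch
   that tends to mispredict when the chain ends. */
long sum_list_prefetch_checked(const struct node *p)
{
    long sum = 0;
    for (; p != NULL; p = p->next) {
        if (p->next != NULL)
            __builtin_prefetch(p->next);
        sum += p->value;
    }
    return sum;
}
```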

That may be something we want to look at in the kernel, if Skylake really has fixed prefetch.

Linus