Move elimination can be a µop fusion

Article: Intel's Haswell CPU Microarchitecture
By: Stubabe (nospam.delete@this.nospam.com), November 15, 2012 11:38 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on November 15, 2012 5:48 am wrote:
> Stubabe (nospam.delete@this.nospam.com) on November 15, 2012 5:14 am wrote:
> >
> > > But on paper that loop should only take 2 clk/loop on sandy due to co-issue of the fused branch,
> > > it takes 3 on Sandy (and 2 on Ivy) due to the loop buffer penalty of 1 clk/iteration
> >
> > Sorry I was thinking of MOVDQA it should be 3 on Sandy. But the loop buffer can issue max 4 uops/clk to the
> > renamer + 1 penalty clock so minimum loop time is 2 clocks irrespective of what the backend does with it.
> >
> >
>
> MOVAPS can only be issued to port 5 (maximum throughput is 1 / clk). Fused branch
> (dec+jnz) is also issued to port 5. What is the penalty you are talking about?
>
I saw somewhere (here?) that on Nahelam there was a 1 cycle penalty for each branch hit in the uop loop buffer. I guess that is gone now in Sandy.

> Another example:
>
> loop:
> xorps xmm0, xmm0
> xorps xmm0, xmm0
> xorps xmm0, xmm0
> dec ecx
> jnz loop
>
> This loop takes only 1 clk on both Sandy and Ivy. XORPS is also port5-only instruction, but
> due to the "zeroing idioms" feature, which definitely uses the renaming technique, they are
> executed without using the backend. If the MOV elimination in Ivy were done by the renaming
> technique, the MOVAPS example should take only 1 clk per loop like this XORPS example.

Or could it be a bookkeeping uop being issued due to the branch? After all a branch misprediction will need to recover state somehow. I'm sorry, but the uop fusion thing looks messy for the hardware to track to me, especially if multiple dependencies are present.

Perhaps the way to tell (sorry don't have an Ivy to test on) would be a longer chain (e.g. six MOVAPS) if it stays at 2 clocks we know its some kind of branch induced sync op being issued if it goes up above 2 then they really are using some uop fusion thing
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Haswell CPU article onlineDavid Kanter11/13/12 03:43 PM
  Haswell CPU article onlineEric11/13/12 04:10 PM
    Haswell CPU article onlinehobold11/13/12 05:13 PM
      Haswell CPU article onlineRicardo B11/13/12 06:09 PM
    Haswell CPU article onlineanonymou511/13/12 05:44 PM
      Haswell CPU article onlinenone11/14/12 03:40 AM
  Haswell CPU article onlinetarlinian11/13/12 04:56 PM
    Fixed (NT)David Kanter11/13/12 06:06 PM
      Haswell CPU article onlineJacob Marley11/14/12 02:18 AM
  Haswell CPU article onlinerandomshinichi11/14/12 02:53 AM
    LLC == Last Level Cache (usually L3) (NT)Paul A. Clayton11/14/12 05:50 AM
    Haswell CPU article onlineJoe11/14/12 10:38 AM
      LLC vs. L3 vs. L4David Kanter11/14/12 11:09 AM
        LLC vs. L3 vs. L4; LLC = Link Layer ControllerRay11/14/12 10:08 PM
          A pit there are only 17000 TLAs... (NT)EduardoS11/15/12 03:14 AM
  Haswell CPU article onlineanon11/14/12 05:10 AM
    Move elimination can be a µop fusionPaul A. Clayton11/14/12 06:41 AM
      That should be "mov R10 <- R9"! (NT)Paul A. Clayton11/14/12 06:43 AM
      Move elimination can be a µop fusionanon11/14/12 07:25 AM
        It does avoid the scheduler (NT)Paul A. Clayton11/14/12 08:47 AM
      Move elimination can be a µop fusionStubabe11/14/12 01:43 PM
        Move elimination can be a µop fusionanon11/14/12 09:33 PM
          Move elimination can be a µop fusionFelid11/15/12 12:49 AM
            Move elimination can be a µop fusionanon11/15/12 01:23 AM
              Move elimination can be a µop fusionStuart11/15/12 05:04 AM
                Move elimination can be a µop fusionStubabe11/15/12 05:14 AM
                  Move elimination can be a µop fusionanon11/15/12 05:48 AM
                    Move elimination can be a µop fusionEduardoS11/15/12 06:00 AM
                      Move elimination can be a µop fusionanon11/15/12 06:14 AM
                        Move elimination can be a µop fusionEduardoS11/15/12 06:21 AM
                          Move elimination can be a µop fusionanon11/15/12 06:31 AM
                    Move elimination can be a µop fusionStubabe11/15/12 11:38 AM
                      There can be only one dependencePaul A. Clayton11/15/12 12:50 PM
                    Move elimination can be a µop fusionFelid11/15/12 03:19 PM
                      Move elimination can be a µop fusionanon11/16/12 04:07 AM
                        Move elimination can be a µop fusionFelid11/16/12 07:43 PM
                  Move elimination can be a µop fusionFelid11/15/12 02:50 PM
                    Move elimination can be a µop fusionFelid11/15/12 03:03 PM
                      Correction!Felid11/19/12 01:23 AM
                    Thanks, I wasn't aware of the change in SB. Good to know... (NT)Stubabe11/15/12 03:43 PM
            Move fusion assumes adjacencyPaul A. Clayton11/15/12 07:15 AM
              Move fusion assumes adjacencyFelid11/15/12 02:40 PM
        Move elimination can be a µop fusionPatrick Chase11/21/12 11:52 AM
          Move elimination can be a µop fusionPatrick Chase11/21/12 12:12 PM
    Haswell CPU article onlineRicardo B11/14/12 09:12 AM
  Haswell CPU article onlinegmb11/14/12 08:28 AM
  Haswell CPU article onlineFelid11/14/12 11:58 PM
    Haswell CPU article onlineDavid Kanter11/15/12 09:59 AM
      Haswell CPU article onlineFelid11/15/12 02:15 PM
        Instruction queueDavid Kanter11/16/12 12:23 PM
          Instruction queueFelid11/16/12 01:05 PM
  128-bit division unit?Eric Bron11/16/12 04:57 AM
    128-bit division unit?David Kanter11/16/12 08:59 AM
      128-bit division unit?Eric Bron11/16/12 09:47 AM
        128-bit division unit?Felid11/16/12 12:46 PM
          128-bit division unit?Eric Bron11/16/12 01:24 PM
            128-bit division unit?Felid11/16/12 07:19 PM
              128-bit division unit?Eric Bron11/18/12 08:41 AM
            128-bit division unit?Michael S11/17/12 12:50 PM
              128-bit division unit?Felid11/17/12 01:44 PM
                128-bit division unit?Michael S11/17/12 02:45 PM
                  128-bit division unit?Felid11/17/12 05:49 PM
                    128-bit division unit?Michael S11/17/12 06:56 PM
              128-bit division unit?Eric Bron11/18/12 08:35 AM
  Haswell CPU article onlineJim F11/18/12 09:45 AM
    Haswell CPU article onlineGabriele Svelto11/18/12 12:52 PM
  Probable bottleneckLaurent Birtz11/23/12 01:45 PM
    Probable bottleneckEduardoS11/23/12 01:58 PM
      Probable bottleneckLaurent Birtz11/24/12 10:10 AM
    Probable bottleneckStubabe11/25/12 03:08 AM
      Probable bottleneckEduardoS11/25/12 08:15 AM
        Probable bottleneckStubabe11/28/12 04:36 PM
          Urgh. Post got mangled by LESS THAN signStubabe11/28/12 04:41 PM
          Probable bottleneckLaurent Birtz11/29/12 08:34 AM
  Haswell CPU article onlineMr. Camel11/28/12 03:47 PM
    Haswell CPU article onlineEduardoS11/28/12 04:06 PM
      Haswell CPU article onlineMr. Camel11/28/12 07:23 PM
        Haswell CPU article onlineEduardoS11/28/12 07:27 PM
          Haswell CPU article onlineMr. Camel12/12/12 01:39 PM
            Much faster iGPU clock ...Mark Roulo12/12/12 03:53 PM
              Much faster iGPU clock ...Exophase12/12/12 11:46 PM
                Much faster iGPU clock ... or not :-)Mark Roulo12/13/12 09:11 AM
                  Much faster iGPU clock ... or not :-)EduardoS12/13/12 10:38 PM
                    Much faster iGPU clock ... or not :-)Michael S12/14/12 05:33 AM
                      Much faster iGPU clock ... or not :-)EduardoS12/14/12 07:06 AM
                        Much faster iGPU clock ... or not :-)Doug S12/14/12 12:13 PM
                          Much faster iGPU clock ... or not :-)EduardoS12/14/12 12:43 PM
                  Much faster iGPU clock ... or not :-)Mr. Camel12/14/12 10:50 AM
              Much faster iGPU clock ...Michael S12/13/12 02:44 AM
                Much faster iGPU clock ...Mark Roulo12/13/12 09:09 AM
  Haswell CPU article onlineYang12/09/12 08:28 PM
    possible spam bot? (NT)I.S.T.12/10/12 03:40 PM
  CPU Crystal Well behavior w/ eGPU?Robert Williams04/17/13 02:16 PM
    CPU Crystal Well behavior w/ eGPU?Nicolas Capens04/17/13 03:30 PM
      CPU Crystal Well behavior w/ eGPU?RecessionCone04/17/13 04:20 PM
        CPU Crystal Well behavior w/ eGPU?Robert Williams04/17/13 07:37 PM
    CPU Crystal Well behavior w/ eGPU?Eric Bron04/17/13 09:10 PM
  Haswell CPU article onlineSireesh09/01/14 02:48 PM
    Haswell CPU article onlineMaynard Handley09/01/14 03:51 PM
      Great postDavid Kanter09/01/14 07:12 PM
      Thanks :)Alberto09/02/14 01:42 AM
      Thanks (NT)Poindexter09/02/14 09:31 AM
    Haswell CPU article onlineEduardoS09/01/14 04:21 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell blue?