Move elimination can be a µop fusion

Article: Intel's Haswell CPU Microarchitecture
By: anon (anon.delete@this.anon.com), November 15, 2012 1:23 am
Room: Moderated Discussions
Felid (Felid.delete@this.mailinator.com) on November 15, 2012 12:49 am wrote:
> > Bulldozer actually eliminates MOVs (for SIMD only) using the register renaming technique
> > you described. But on Ivy Bridge, as far as I've measured on actual hardware, it appears
> > to fuse a MOV instruction with a subsequent dependent instruction to eliminate the MOV
> > (when there is no instruction dependent on the MOV, the MOV is not eliminated at all).
> >
> > Fusion seems to be done in the µop domain, because non-adjacent instructions can be fused.
>
> It doesn't make sense. There can be many reads of the mov's destination, so every one of these
> mops should get its source register replaced with (a link to) the original. This can't be done
> with fusion (2 instructions -> 1 mop), but it fits renaming logic perfectly.

I don't mean it is macro-fusion.

For example,

loop:
movaps xmm1, xmm0
movaps xmm0, xmm1
dec ecx
jnz loop

This loop takes 3 clk/loop on Sandy Bridge and 2 clk/loop on Ivy Bridge. If MOV elimination were done entirely by the renaming logic, this loop should take only 1 cycle on Ivy Bridge (only dec+jnz would be issued to port 5). But it actually takes 2 cycles, which means at least one movaps is issued to port 5 per iteration.
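To spell out the arithmetic, here is a rough sketch of the port-5 counting argument. It is only a model, not a measurement: it assumes port 5 is the sole bottleneck, that it accepts one µop per cycle, and that dec+jnz costs a single port-5 µop. The function names are made up for illustration.

```python
def port5_cycles(movaps_executed, branch_uops=1):
    """Cycles per iteration if port 5 is the only bottleneck.

    Assumes one µop per cycle on port 5, with dec+jnz counted as a
    single port-5 µop (branch_uops). Both are simplifying assumptions.
    """
    return movaps_executed + branch_uops


def movaps_on_port5(measured_cycles, branch_uops=1):
    """Invert the model: movaps µops per iteration implied by a measured cycle count."""
    return measured_cycles - branch_uops


# The three cases discussed for the two-movaps loop.
cases = {
    "no elimination": port5_cycles(2),          # both movaps execute
    "one movaps eliminated": port5_cycles(1),   # one movaps still executes
    "both movaps eliminated": port5_cycles(0),  # only dec+jnz executes
}

for name, cycles in cases.items():
    print(f"{name}: {cycles} clk/loop")
```

Under these assumptions, the measured 3 clk/loop (Sandy Bridge) corresponds to both movaps executing, and the measured 2 clk/loop (Ivy Bridge) corresponds to exactly one movaps still reaching port 5, i.e. `movaps_on_port5(2)` gives 1.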