Haswell CPU article online

Article: Intel's Haswell CPU Microarchitecture
By: Ricardo B (ricardo.b.delete@this.xxxxx.xx), November 13, 2012 6:09 pm
Room: Moderated Discussions
hobold (hobold.delete@this.vectorizer.org) on November 13, 2012 5:13 pm wrote:
>
> I would guess that in the context of a fully cache coherent manycore machine, when you
> try to optimize the scatter operation to store more than a single vector element at a
> time, you run into problems. Memory transactions probably don't make that any easier.
>
> For example, if a scatter operation has to be aborted and restarted for some reason, does it (semantically)
> execute all or nothing? Or can it be in a partly completed state? Larrabee made partly completed
> state information architecturally visible (in a mandatory boolean mask register), but did not support
> transactional memory. As far as I know, AVX* does not expose such internal state.
>
> Or when two "simultaneous" scatter operations from two different cores fight over overlapping
> memory addresses, and then one or both operations have to be undone and later rerun ... is it even
> possible to decide on a consistent specification of what the memory contents ought to be?
>
> Does the coherency protocol support groups of in-flight memory accesses that are semantically
> related to one another? With respect to one or more memory transactions?
>
>
> I could be blowing this issue out of proportion due to personal
> cluelessness. But it does seem rather complicated to me.

AVX2 does expose internal state.
«This instruction can be suspended by an exception if at least one element is already
gathered (i.e., if the exception is triggered by an element other than the rightmost
one with its mask bit set). When this happens, the destination register and the mask
operand are partially updated; those elements that have been gathered are placed
into the destination register and have their mask bits set to zero. If any traps or
interrupts are pending from already gathered elements, they will be delivered in lieu
of the exception; in this case, EFLAG.RF is set to one so an instruction breakpoint is
not re-triggered when the instruction is continued.»

This, of course, makes it simple to implement gather: it's actually a µOP sequence, with an individual µOP for each load. All that's required is to keep the proper order.
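To make the restart behaviour concrete, here is a rough software model of the semantics in the quoted text. It is only a sketch and not how the hardware is built: `Fault`, `gather`, the dict standing in for memory, and the list index 0 standing in for the "rightmost" element are all my own assumptions for illustration.

```python
class Fault(Exception):
    """Stands in for an exception (e.g. a page fault) raised by one element."""


def gather(memory, base, indices, mask, dest):
    """Model of a masked gather done as one load per element, in order.

    `dest` and `mask` are updated in place, so a fault leaves both
    partially updated: finished elements are in `dest` and have their
    mask entry cleared. Index 0 plays the role of the rightmost element.
    """
    for i, index in enumerate(indices):
        if not mask[i]:
            continue  # already gathered earlier, or never requested
        addr = base + index
        if addr not in memory:
            raise Fault(addr)  # elements before i stay completed
        dest[i] = memory[addr]
        mask[i] = False  # this element will be skipped on restart


# Address 102 is "unmapped" at first, so the third element faults.
memory = {100: 10, 101: 11, 103: 13}
mask = [True, True, True, True]
dest = [None, None, None, None]

try:
    gather(memory, 100, [0, 1, 2, 3], mask, dest)
except Fault as fault:
    partial_dest, partial_mask = list(dest), list(mask)
    memory[fault.args[0]] = 12  # the "handler" fixes the cause of the fault

# Re-executing the same instruction only touches the remaining elements.
gather(memory, 100, [0, 1, 2, 3], mask, dest)
```

The point is that the mask doubles as the progress record: no hidden state is needed to resume, which is why issuing the loads in a fixed order is enough.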

Using a similar method for stores should not be complicated either.
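The same idea applied to stores could look roughly like the sketch below. This is a hypothetical scatter, not an existing AVX2 instruction; `Fault`, `scatter`, and the `writable` set of mapped addresses are names I made up for the example.

```python
class Fault(Exception):
    """Stands in for an exception raised by one element's store."""


def scatter(memory, writable, base, indices, mask, src):
    """Model of a masked scatter done as one store per element, in order.

    `mask` is updated in place; a fault leaves the earlier stores
    performed and their mask entries cleared. Because stores go in a
    fixed order, a later element wins when two indices overlap.
    """
    for i, index in enumerate(indices):
        if not mask[i]:
            continue  # already stored earlier, or never requested
        addr = base + index
        if addr not in writable:
            raise Fault(addr)  # stores before i remain visible
        memory[addr] = src[i]
        mask[i] = False  # this element will be skipped on restart


memory = {}
writable = {10, 11}          # address 12 is not writable yet
indices = [0, 1, 2, 0]       # first and last element overlap on address 10
src = [1, 2, 3, 4]
mask = [True, True, True, True]

try:
    scatter(memory, writable, 10, indices, mask, src)
except Fault as fault:
    partial_memory, partial_mask = dict(memory), list(mask)
    writable.add(fault.args[0])  # the "handler" makes the page writable

# Restart: only the elements whose mask entry is still set are stored.
scatter(memory, writable, 10, indices, mask, src)
```

This only covers restart within a single core; it says nothing about how two cores scattering to overlapping addresses would be ordered relative to each other.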

I'd say Intel simply decided scatter wasn't worth the effort.