another try

Article: Knights Landing Details
By: Michael S (already5chosen.delete@this.yahoo.com), January 9, 2014 3:20 pm
Room: Moderated Discussions
Eric Bron (eric.bron.delete@this.zvisuel.privatefortest.com) on January 9, 2014 3:33 pm wrote:
> 1st line changed to:
>

> void Demo (const double * srcX, const double * srcY, double * __restrict dst, int size)
>
>
>
> AVX-512:
>
> .B1.4:: ; Preds .B1.4 .B1.3
> vmovups zmm3, ZMMWORD PTR [r14+rdx*8] ;173.16
> vmovups zmm5, ZMMWORD PTR [64+r14+rdx*8] ;173.16
> vmovups zmm17, ZMMWORD PTR [128+r14+rdx*8] ;173.16
> vmovups zmm19, ZMMWORD PTR [192+r14+rdx*8] ;173.16
> vsubpd zmm4, zmm3, ZMMWORD PTR [r12+rdx*8] ;177.20
> vsubpd zmm16, zmm5, ZMMWORD PTR [64+r12+rdx*8] ;177.20
> vsubpd zmm18, zmm17, ZMMWORD PTR [128+r12+rdx*8] ;177.20
> vsubpd zmm20, zmm19, ZMMWORD PTR [192+r12+rdx*8] ;177.20
> vfmadd132pd zmm4, zmm0, ZMMWORD PTR [r14+rdx*8] ;177.25
> vfmadd132pd zmm16, zmm0, ZMMWORD PTR [64+r14+rdx*8] ;177.25
> vfmadd132pd zmm18, zmm0, ZMMWORD PTR [128+r14+rdx*8] ;177.25
> vfmadd132pd zmm20, zmm0, ZMMWORD PTR [192+r14+rdx*8] ;177.25
> vfmadd132pd zmm4, zmm1, ZMMWORD PTR [r14+rdx*8] ;177.32
> vfmadd132pd zmm16, zmm1, ZMMWORD PTR [64+r14+rdx*8] ;177.32
> vfmadd132pd zmm18, zmm1, ZMMWORD PTR [128+r14+rdx*8] ;177.32
> vfmadd132pd zmm20, zmm1, ZMMWORD PTR [192+r14+rdx*8] ;177.32
> vfmadd132pd zmm4, zmm2, ZMMWORD PTR [r14+rdx*8] ;177.38
> vmovups ZMMWORD PTR [r11+rdx*8], zmm4 ;177.5
> vfmadd132pd zmm16, zmm2, ZMMWORD PTR [64+r14+rdx*8] ;177.38
> vmovups ZMMWORD PTR [64+r11+rdx*8], zmm16 ;177.5
> vfmadd132pd zmm18, zmm2, ZMMWORD PTR [128+r14+rdx*8] ;177.38
> vmovups ZMMWORD PTR [128+r11+rdx*8], zmm18 ;177.5
> vfmadd132pd zmm20, zmm2, ZMMWORD PTR [192+r14+rdx*8] ;177.38
> vmovups ZMMWORD PTR [192+r11+rdx*8], zmm20 ;177.5
> add rdx, 32 ;171.3
> cmp rdx, rax ;171.3
> jb .B1.4 ; Prob 82% ;171.3
>

O.k. Now this looks like decisive proof that icc really prefers memory operands to register operands.
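For readers following the listing without the C source (only the function signature is quoted), here is a minimal scalar sketch of what the loop body appears to compute, reconstructed from the instruction sequence. The function name `demo_scalar` and the coefficient names `c0`, `c1`, `c2` are assumptions; they stand in for whatever loop-invariant values sit in zmm0, zmm1 and zmm2.

```c
#include <assert.h>

/* Hypothetical scalar equivalent of the vectorized loop body.
   c0, c1, c2 are placeholders for the loop-invariant values held in
   zmm0..zmm2; their real names and values are not given in the thread. */
void demo_scalar(const double *srcX, const double *srcY,
                 double *restrict dst, int size,
                 double c0, double c1, double c2)
{
    assert(size >= 0);
    for (int i = 0; i < size; i++) {
        const double x = srcX[i];   /* the ;173.16 load of srcX */
        double t = x - srcY[i];     /* vsubpd */
        t = t * x + c0;             /* 1st vfmadd132pd: t * x + zmm0 */
        t = t * x + c1;             /* 2nd vfmadd132pd: t * x + zmm1 */
        t = t * x + c2;             /* 3rd vfmadd132pd: t * x + zmm2 */
        dst[i] = t;                 /* vmovups store */
    }
}
```

In the sketch x is read once and reused. In the AVX-512 listing x is also already in a register (zmm3 and friends), yet every vfmadd132pd takes it again as a memory operand from [r14+rdx*8].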

>
> AVX2:
>
> .B1.4:: ; Preds .B1.4 .B1.3
> vmovupd ymm5, YMMWORD PTR [rcx+rax*8] ;173.16
> vsubpd ymm3, ymm5, YMMWORD PTR [rdx+rax*8] ;177.20
> vfmadd213pd ymm3, ymm5, ymm0 ;177.25
> vmovapd ymm4, ymm1 ;177.32
> vfmadd231pd ymm4, ymm3, ymm5 ;177.32
> vfmadd213pd ymm5, ymm4, ymm2 ;177.38
> vmovupd YMMWORD PTR [r8+rax*8], ymm5 ;177.5


I don't understand this code generation. Move elimination or not, the variant below can never be slower, and sometimes (e.g. for a short loop that is not running from L1I) it will be faster.

vmovupd ymm5, YMMWORD PTR [rcx+rax*8]
vsubpd ymm3, ymm5, YMMWORD PTR [rdx+rax*8]
vfmadd213pd ymm3, ymm5, ymm0
vfmadd213pd ymm3, ymm5, ymm1
vfmadd213pd ymm3, ymm5, ymm2
vmovupd YMMWORD PTR [r8+rax*8], ymm3
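To check that the two sequences really produce the same value, a one-lane scalar model is enough. This is only a sketch: the helper names are made up for illustration, plain `a * b + c` stands in for the fused operation (so it models data flow, not rounding), and the variables are named after the registers in the listings.

```c
#include <assert.h>

/* Scalar stand-ins for two FMA operand orders (Intel syntax, the first
   operand is the destination). Hypothetical helpers, not real intrinsics. */
static double fmadd213(double a, double b, double c) { return b * a + c; }
static double fmadd231(double a, double b, double c) { return b * c + a; }

/* One lane of the sequence icc generated for AVX2. */
double icc_avx2_lane(double x, double y, double c0, double c1, double c2)
{
    double ymm5 = x;                    /* vmovupd ymm5, [x] */
    double ymm3 = ymm5 - y;             /* vsubpd ymm3, ymm5, [y] */
    ymm3 = fmadd213(ymm3, ymm5, c0);    /* ymm3 = ymm5*ymm3 + ymm0 */
    double ymm4 = c1;                   /* vmovapd ymm4, ymm1 */
    ymm4 = fmadd231(ymm4, ymm3, ymm5);  /* ymm4 = ymm3*ymm5 + ymm4 */
    ymm5 = fmadd213(ymm5, ymm4, c2);    /* ymm5 = ymm4*ymm5 + ymm2 */
    return ymm5;                        /* value that gets stored */
}

/* One lane of the shorter variant: no register copy, three 213 forms. */
double proposed_avx2_lane(double x, double y, double c0, double c1, double c2)
{
    double ymm5 = x;                    /* vmovupd ymm5, [x] */
    double ymm3 = ymm5 - y;             /* vsubpd ymm3, ymm5, [y] */
    ymm3 = fmadd213(ymm3, ymm5, c0);    /* ymm3 = ymm5*ymm3 + ymm0 */
    ymm3 = fmadd213(ymm3, ymm5, c1);    /* ymm3 = ymm5*ymm3 + ymm1 */
    ymm3 = fmadd213(ymm3, ymm5, c2);    /* ymm3 = ymm5*ymm3 + ymm2 */
    return ymm3;                        /* value that gets stored */
}
```

Both functions multiply the running value by x and add the next constant three times in the same order; the only difference is which register carries the running value, which is why the vmovapd copy is not needed.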

> vmovupd ymm5, YMMWORD PTR [32+rcx+rax*8] ;173.16
> vsubpd ymm3, ymm5, YMMWORD PTR [32+rdx+rax*8] ;177.20
> vfmadd213pd ymm3, ymm5, ymm0 ;177.25
> vmovapd ymm4, ymm1 ;177.32
> vfmadd231pd ymm4, ymm3, ymm5 ;177.32
> vfmadd213pd ymm5, ymm4, ymm2 ;177.38
> vmovupd YMMWORD PTR [32+r8+rax*8], ymm5 ;177.5
> vmovupd ymm5, YMMWORD PTR [64+rcx+rax*8] ;173.16
> vsubpd ymm3, ymm5, YMMWORD PTR [64+rdx+rax*8] ;177.20
> vfmadd213pd ymm3, ymm5, ymm0 ;177.25
> vmovapd ymm4, ymm1 ;177.32
> vfmadd231pd ymm4, ymm3, ymm5 ;177.32
> vfmadd213pd ymm5, ymm4, ymm2 ;177.38
> vmovupd YMMWORD PTR [64+r8+rax*8], ymm5 ;177.5
> vmovupd ymm5, YMMWORD PTR [96+rcx+rax*8] ;173.16
> vsubpd ymm3, ymm5, YMMWORD PTR [96+rdx+rax*8] ;177.20
> vfmadd213pd ymm3, ymm5, ymm0 ;177.25
> vmovapd ymm4, ymm1 ;177.32
> vfmadd231pd ymm4, ymm3, ymm5 ;177.32
> vfmadd213pd ymm5, ymm4, ymm2 ;177.38
> vmovupd YMMWORD PTR [96+r8+rax*8], ymm5 ;177.5
> add rax, 16 ;171.3
> cmp rax, r9 ;171.3
> jb .B1.4 ; Prob 82% ;171.3
>
>
> AVX:
>
> .B1.4:: ; Preds .B1.4 .B1.3
> vmovupd ymm3, YMMWORD PTR [rcx+rax*8] ;173.16
> vsubpd ymm4, ymm3, YMMWORD PTR [rdx+rax*8] ;177.20
> vmulpd ymm5, ymm4, ymm3 ;177.23
> vaddpd ymm4, ymm0, ymm5 ;177.25
> vmulpd ymm5, ymm3, ymm4 ;177.29
> vaddpd ymm4, ymm2, ymm5 ;177.32
> vmovupd ymm5, YMMWORD PTR [32+rcx+rax*8] ;173.16
> vmulpd ymm3, ymm3, ymm4 ;177.36
> vsubpd ymm4, ymm5, YMMWORD PTR [32+rdx+rax*8] ;177.20
> vaddpd ymm3, ymm1, ymm3 ;177.38
> vmovupd YMMWORD PTR [r8+rax*8], ymm3 ;177.5
> vmulpd ymm3, ymm4, ymm5 ;177.23
> vaddpd ymm4, ymm0, ymm3 ;177.25
> vmulpd ymm3, ymm5, ymm4 ;177.29
> vaddpd ymm4, ymm2, ymm3 ;177.32
> vmulpd ymm5, ymm5, ymm4 ;177.36
> vmovupd ymm4, YMMWORD PTR [64+rcx+rax*8] ;173.16
> vaddpd ymm3, ymm1, ymm5 ;177.38
> vsubpd ymm5, ymm4, YMMWORD PTR [64+rdx+rax*8] ;177.20
> vmovupd YMMWORD PTR [32+r8+rax*8], ymm3 ;177.5
> vmulpd ymm3, ymm5, ymm4 ;177.23
> vaddpd ymm5, ymm0, ymm3 ;177.25
> vmulpd ymm3, ymm4, ymm5 ;177.29
> vaddpd ymm5, ymm2, ymm3 ;177.32
> vmulpd ymm4, ymm4, ymm5 ;177.36
> vaddpd ymm3, ymm1, ymm4 ;177.38
> vmovupd YMMWORD PTR [64+r8+rax*8], ymm3 ;177.5
> vmovupd ymm3, YMMWORD PTR [96+rcx+rax*8] ;173.16
> vsubpd ymm4, ymm3, YMMWORD PTR [96+rdx+rax*8] ;177.20
> vmulpd ymm5, ymm4, ymm3 ;177.23
> vaddpd ymm4, ymm0, ymm5 ;177.25
> vmulpd ymm5, ymm3, ymm4 ;177.29
> vaddpd ymm4, ymm2, ymm5 ;177.32
> vmulpd ymm3, ymm3, ymm4 ;177.36
> vaddpd ymm3, ymm1, ymm3 ;177.38
> vmovupd YMMWORD PTR [96+r8+rax*8], ymm3 ;177.5
> add rax, 16 ;171.3
> cmp rax, r9 ;171.3
> jb .B1.4 ; Prob 82% ;171.3
>

The AVX variant looks exactly as expected.

>
> SSE2:
>
> .B1.4:: ; Preds .B1.4 .B1.3
> movaps xmm4, XMMWORD PTR [rcx+rax*8] ;173.16
> movaps xmm3, xmm4 ;177.20
> subpd xmm3, XMMWORD PTR [rdx+rax*8] ;177.20
> mulpd xmm3, xmm4 ;177.23
> addpd xmm3, xmm0 ;177.25

Here it is possible to do better:

movaps xmm4, XMMWORD PTR [rcx+rax*8]
movaps xmm3, XMMWORD PTR [rdx+rax*8]
subpd xmm3, xmm4
mulpd xmm3, xmm4
subpd xmm0, xmm3

The number of x86 instructions is the same, but measured in uOps my variant is shorter.
It probably does not matter on AMD K8/K10 or on 4-wide Intel cores, but my variant will almost certainly run faster on 3-wide Intel cores, especially Pentium 4, and on AMD Bulldozer.
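The algebra behind the SSE2 variant (compute y - x instead of x - y, then subtract from the constant instead of adding) can be checked with a one-lane scalar model. This is a sketch only; the function names are invented for illustration and the variables are named after the registers in the two listings.

```c
#include <assert.h>

/* One lane of the compiler's sequence. Returns the value left in xmm3. */
double icc_sse2_lane(double x, double y, double c0)
{
    double xmm0 = c0;       /* loop-invariant constant */
    double xmm4 = x;        /* movaps xmm4, [x] */
    double xmm3 = xmm4;     /* movaps xmm3, xmm4 (register copy) */
    xmm3 -= y;              /* subpd xmm3, [y]   -> x - y */
    xmm3 *= xmm4;           /* mulpd xmm3, xmm4  -> (x - y) * x */
    xmm3 += xmm0;           /* addpd xmm3, xmm0  -> (x - y) * x + c0 */
    return xmm3;
}

/* One lane of the proposed sequence. Returns the value left in xmm0. */
double proposed_sse2_lane(double x, double y, double c0)
{
    double xmm0 = c0;       /* loop-invariant constant */
    double xmm4 = x;        /* movaps xmm4, [x] */
    double xmm3 = y;        /* movaps xmm3, [y] */
    xmm3 -= xmm4;           /* subpd xmm3, xmm4  -> y - x */
    xmm3 *= xmm4;           /* mulpd xmm3, xmm4  -> (y - x) * x */
    xmm0 -= xmm3;           /* subpd xmm0, xmm3  -> c0 - (y - x) * x */
    return xmm0;
}
```

The two values agree, since c0 - (y - x) * x equals (x - y) * x + c0. One detail the model makes visible: the proposed sequence leaves its result in xmm0 rather than xmm3, so inside the loop as listed it would overwrite the register that holds the constant. Dropping it in unchanged would therefore need the constant kept elsewhere or restored each iteration, which is not covered here.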


> mulpd xmm3, xmm4 ;177.29
> addpd xmm3, xmm1 ;177.32
> mulpd xmm4, xmm3 ;177.36
> movaps xmm3, XMMWORD PTR [16+rcx+rax*8] ;173.16
> movaps xmm5, xmm3 ;177.20
> subpd xmm5, XMMWORD PTR [16+rdx+rax*8] ;177.20
> addpd xmm4, xmm2 ;177.38
> mulpd xmm5, xmm3 ;177.23
> addpd xmm5, xmm0 ;177.25
> mulpd xmm5, xmm3 ;177.29
> addpd xmm5, xmm1 ;177.32
> mulpd xmm3, xmm5 ;177.36
> addpd xmm3, xmm2 ;177.38
> movaps XMMWORD PTR [16+r8+rax*8], xmm3 ;177.5
> movaps xmm3, XMMWORD PTR [32+rcx+rax*8] ;173.16
> movaps XMMWORD PTR [r8+rax*8], xmm4 ;177.5
> movaps xmm4, xmm3 ;177.20
> subpd xmm4, XMMWORD PTR [32+rdx+rax*8] ;177.20
> mulpd xmm4, xmm3 ;177.23
> addpd xmm4, xmm0 ;177.25
> mulpd xmm4, xmm3 ;177.29
> addpd xmm4, xmm1 ;177.32
> mulpd xmm3, xmm4 ;177.36
> addpd xmm3, xmm2 ;177.38
> movaps XMMWORD PTR [32+r8+rax*8], xmm3 ;177.5
> movaps xmm3, XMMWORD PTR [48+rcx+rax*8] ;173.16
> movaps xmm5, xmm3 ;177.20
> subpd xmm5, XMMWORD PTR [48+rdx+rax*8] ;177.20
> mulpd xmm5, xmm3 ;177.23
> addpd xmm5, xmm0 ;177.25
> mulpd xmm5, xmm3 ;177.29
> addpd xmm5, xmm1 ;177.32
> mulpd xmm3, xmm5 ;177.36
> addpd xmm3, xmm2 ;177.38
> movaps XMMWORD PTR [48+r8+rax*8], xmm3 ;177.5
> add rax, 8 ;171.3
> cmp rax, r9 ;171.3
> jb .B1.4 ; Prob 82% ;171.3
>
>

                                            evil questionhobold2014/01/08 09:22 AM
                                              evil questionEric Bron2014/01/08 09:27 AM
                                                evil questionhobold2014/01/08 01:33 PM
                                                  evil questionMichael S2014/01/08 01:37 PM
                                                    stupid question (was: evil question)hobold2014/01/09 04:41 AM
                                                      stupid question (was: evil question)Eric Bron2014/01/09 04:52 AM
                                                        stupid question (was: evil question)Michael S2014/01/09 07:00 AM
                                                          stupid question (was: evil question)Michael S2014/01/09 07:12 AM
                                                            stupid question (was: evil question)Eric Bron2014/01/09 09:47 AM
                                                              stupid question (was: evil question)Michael S2014/01/09 10:48 AM
                                                                more decisive (hopefully) test caseMichael S2014/01/09 11:01 AM
                                                                  more decisive (hopefully) test caseEric Bron2014/01/09 11:08 AM
                                                                    more decisive (hopefully) test caseMichael S2014/01/09 11:24 AM
                                                                      more decisive (hopefully) test caseEric Bron2014/01/09 11:27 AM
                                                                        more decisive (hopefully) test caseMichael S2014/01/09 11:33 AM
                                                                  AVX2Eric Bron2014/01/09 11:14 AM
                                                                    AVX2Michael S2014/01/09 11:30 AM
                                                                      AVX2Eric Bron2014/01/09 11:40 AM
                                                                  another tryMichael S2014/01/09 02:02 PM
                                                                    another tryEric Bron2014/01/09 02:33 PM
                                                                      another tryMichael S2014/01/09 03:20 PM
                                                                      another try - ignore misformated mess aboveMichael S2014/01/09 03:24 PM
                                                                        another try - ignore misformated mess aboveGabriele Svelto2014/01/10 12:01 AM
                                                                          another try - ignore misformated mess aboveEric Bron2014/01/10 02:05 AM
                                                                            another try - ignore misformated mess aboveMichael S2014/01/11 09:23 AM
                                                                              another try - ignore misformated mess aboveEric Bron2014/01/11 10:08 AM
                                                                                another try - ignore misformated mess aboveMichael S2014/01/11 11:09 AM
                                                                                  another try - ignore misformated mess aboveMichael S2014/01/11 11:12 AM
                                                                                    another try - ignore misformated mess aboveEric Bron2014/01/11 11:24 AM
                                                                                      another try - ignore misformated mess aboveMichael S2014/01/11 12:24 PM
                                                                                        another try - ignore misformated mess aboveEric Bron2014/01/11 01:11 PM
                                                                                          another try - ignore misformated mess aboveMichael S2014/01/11 01:18 PM
                                                                                            another try - ignore misformated mess aboveEric Bron2014/01/11 01:27 PM
                                                                                              another try - ignore misformated mess aboveMichael S2014/01/11 01:29 PM
                                                                                                another try - ignore misformated mess aboveEric Bron2014/01/11 01:46 PM
                                                                                                  another try - ignore misformated mess aboveEric Bron2014/01/11 01:46 PM
                                                                                                  another try - ignore misformated mess aboveMichael S2014/01/11 02:28 PM
                                                                                        another try - ignore misformated mess aboveEric Bron2014/01/11 01:17 PM
                                                                                          another try - ignore misformated mess aboveMichael S2014/01/11 01:24 PM
                                                                    KNC versionMichael S2014/01/11 04:19 PM
                                                                      KNC versionEric Bron nli2014/01/12 01:59 AM
                                                                        KNC versionGabriele Svelto2014/01/12 08:06 AM
                                                  evil questionEric Bron2014/01/08 01:41 PM
              Knights Landing L/S bandwidthPatrick Chase2014/01/05 10:20 PM
                Knights Landing L/S bandwidthEric Bron2014/01/06 01:45 AM
                  Knights Landing L/S bandwidthanon2014/01/06 03:12 AM
                    Knights Landing L/S bandwidthMichael S2014/01/06 03:17 AM
                      Knights Landing L/S bandwidthanon2014/01/06 04:20 AM
          Knights Landing L/S bandwidthNicolas Capens2014/01/04 04:34 PM
            Knights Landing L/S bandwidthEric Bron2014/01/04 04:44 PM
              Knights Landing L/S bandwidthNicolas Capens2014/01/05 11:25 AM
                Knights Landing L/S bandwidthEric Bron2014/01/05 12:50 PM
                  Knights Landing L/S bandwidthNicolas Capens2014/01/05 02:34 PM
                    Might even help with gatherNicolas Capens2014/01/05 02:40 PM
                      What is an L0 cache?David Kanter2014/01/05 09:44 PM
                        What is an L0 cache?anon2014/01/06 04:57 AM
                          What is an L0 cache?Nicolas Capens2014/01/06 11:57 AM
                            What is an L0 cache?anon2014/01/06 01:18 PM
    Knights Landing L/S bandwidthDavid Kanter2014/01/04 09:58 AM
      Knights Landing L/S bandwidthNicolas Capens2014/01/04 03:24 PM
        Knights Landing L/S bandwidthEric Bron2014/01/04 03:46 PM
          Knights Landing L/S bandwidthKonrad Schwarz2014/01/07 11:48 PM
            Knights Landing L/S bandwidthMichael S2014/01/08 01:45 AM
        Knights Landing L/S bandwidthDavid Kanter2014/01/05 12:44 AM
          Knights Landing L/S bandwidthEric Bron2014/01/05 02:55 AM
          Knights Landing L/S bandwidthNicolas Capens2014/01/05 11:18 AM
            Knights Landing L/S bandwidthMaynard Handley2014/01/05 10:33 PM
              Knights Landing L/S bandwidthEric Bron2014/01/06 03:02 AM
                Knights Landing L/S bandwidthMichael S2014/01/06 03:23 AM
                  Knights Landing L/S bandwidthEric Bron2014/01/06 03:35 AM
                    Knights Landing L/S bandwidthMichael S2014/01/06 04:20 AM
                      Knights Landing L/S bandwidthMichael S2014/01/06 04:32 AM
                      Knights Landing L/S bandwidthEric Bron2014/01/06 04:36 AM
                        Knights Landing L/S bandwidthMichael S2014/01/06 05:00 AM
                          Knights Landing L/S bandwidthEric Bron2014/01/06 05:07 AM
                          Knights Landing L/S bandwidthEric Bron2014/01/06 05:14 AM
                            editsEric Bron2014/01/06 05:22 AM
                              optimized versionEric Bron2014/01/06 05:35 AM
                                yet more optimized versionEric Bron2014/01/06 05:42 AM
                                  latest version for todayEric Bron2014/01/06 05:51 AM
                                    Probably just L2 bandwith limitedNicolas Capens2014/01/06 10:48 AM
                                  yet more optimized versionMaynard Handley2014/01/06 05:54 PM
                                optimized versionMaynard Handley2014/01/06 05:52 PM
                                  optimized versionMichael S2014/01/07 09:42 AM
                                    optimized versionNicolas Capens2014/01/07 11:36 AM
                                      optimized versionMichael S2014/01/07 02:41 PM
                                        optimized versionNicolas Capens2014/01/07 09:52 PM
                                          optimized versionMichael S2014/01/08 01:10 AM
                                    optimized versionEric Bron2014/01/07 01:34 PM
                                      optimized versionMichael S2014/01/07 02:18 PM
                                        optimized versionEric Bron2014/01/07 02:30 PM
                                          optimized versionEric Bron2014/01/07 02:33 PM
                                            optimized versionMichael S2014/01/07 02:57 PM
                                    optimized versionMaynard Handley2014/01/07 05:50 PM
                                      optimized versionMichael S2014/01/08 01:39 AM
                Knights Landing L/S bandwidthMaynard Handley2014/01/06 05:47 PM
              Knights Landing L/S bandwidthNicolas Capens2014/01/06 08:18 AM
                Knights Landing L/S bandwidthMaynard Handley2014/01/06 05:56 PM
                  Knights Landing L/S bandwidthNicolas Capens2014/01/07 11:18 AM
        Knights Landing L/S bandwidthNoSpammer2014/01/05 12:15 PM
          Knights Landing L/S bandwidthNicolas Capens2014/01/05 02:06 PM
            Knights Landing L/S bandwidthNoSpammer2014/01/06 03:20 AM
              Knights Landing L/S bandwidthNicolas Capens2014/01/06 10:54 AM
                Knights Landing L/S bandwidthNoSpammer2014/01/06 12:24 PM
                  Knights Landing L/S bandwidthNicolas Capens2014/01/06 08:15 PM
                    Knights Landing L/S bandwidthNoSpammer2014/01/07 02:58 AM
                      Knights Landing L/S bandwidthNicolas Capens2014/01/07 02:18 PM
                        Knights Landing L/S bandwidthNoSpammer2014/01/08 12:38 PM
                          Knights Landing L/S bandwidthNicolas Capens2014/01/08 10:14 PM
  AVX512F questionMichael S2014/01/06 09:18 AM
    AVX512F questionNicolas Capens2014/01/06 11:01 AM