low bar

By: Michael S (already5chosen.delete@this.yahoo.com), April 9, 2013 4:38 pm
Room: Moderated Discussions
Symmetry (someone.delete@this.somewhere.com) on April 9, 2013 10:41 am wrote:
> There's been an interesting paper on some recent improvements being made in the Glasgow
> Haskell Compiler making the rounds on the internet, and I thought some parts might be of
> interest here. I'm also curious if letting the compiler take care of prefetching is something
> that might already exist in, say, Intel's C compiler even if it doesn't in gcc.
>
> http://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf


They had chosen rather low bar for comparison. gcc auto-vectorizer is pretty stupid.
It is smart enough to generate SSE2 vector, but here it ends. It does not understand that addpd has best-case latency of 3 clocks, which means that for best performance inner dot loop should have at lest 3 independent accumulators, or for if you want performance compatibility with K10.

Ignoring what they call dribling for the sake of brevity, on single core of my i5-3450 for L1D resident case I easily achieved 11.134 GFLOP/s with SSE2 intrinsics.
For comparison, gcc 4.7.0 vectorizer only did 4.577 GFLOP/s, similarly to their report.
To be compatible with their results I intentionally disabled AVX code generation.

Code:

double dot4_x2(const double* a, const double* b, int dotBuflen)
{
const __m128d* a2 = reinterpret_cast(a);
const __m128d* b2 = reinterpret_cast(b);
__m128d sum0 = _mm_setzero_pd();
__m128d sum1 = _mm_setzero_pd();
__m128d sum2 = _mm_setzero_pd();
__m128d sum3 = _mm_setzero_pd();
int dotBuflen2 = dotBuflen/2;
for (int i = 0; i < dotBuflen2; i += 4)
{
sum0 += a2[i+0] * b2[i+0];
sum1 += a2[i+1] * b2[i+1];
sum2 += a2[i+2] * b2[i+2];
sum3 += a2[i+3] * b2[i+3];
}
__m128d sum01 = sum0 + sum1;
__m128d sum23 = sum2 + sum3;
__m128d sum = sum01 + sum23;
sum = _mm_hadd_pd(sum, sum);
return *(double*)(&sum);
}


Compiler flags: -march=corei7 -Ofast -msse -save-temps

Longer dots appear to be totally limited by cache bandwidth. So some prefetch could help, although not by much.
Luckily, in practice long dot products are rarely unavoidable. Typically, you can split your dataset into tiles and get some level of reuse of L1D data.
< Previous Post in Thread 
TopicPosted ByDate
Haskell Compilation ImprovementSymmetry2013/04/09 10:41 AM
  Haskell Compilation ImprovementEric Bron2013/04/09 11:56 AM
  Haskell Compilation ImprovementLinus Torvalds2013/04/09 12:03 PM
    Haskell Compilation ImprovementEduardoS2013/04/09 12:20 PM
      Haskell Compilation ImprovementLinus Torvalds2013/04/09 12:31 PM
        Haskell Compilation ImprovementEduardoS2013/04/09 12:49 PM
    Haskell Compilation Improvement2013/04/11 01:36 AM
      Haskell Compilation ImprovementEric Bron2013/04/11 03:58 AM
        Haskell Compilation ImprovementBrendan2013/04/11 07:06 AM
          Haskell Compilation ImprovementSymmetry2013/04/11 07:45 AM
            Haskell Compilation ImprovementBrendan2013/04/11 11:31 AM
          Haskell Compilation ImprovementEric Bron2013/04/11 08:57 AM
            Haskell Compilation ImprovementBrendan2013/04/11 11:26 AM
              Haskell Compilation ImprovementEric Bron2013/04/11 11:36 AM
                Haskell Compilation ImprovementBrendan2013/04/11 05:00 PM
                  Haskell Compilation ImprovementDavid Kanter2013/04/11 08:50 PM
                    Software prefetching in JVMsGabriele Svelto2013/04/12 03:31 PM
                  Haskell Compilation ImprovementEric Bron2013/04/12 09:12 AM
                    Haskell Compilation ImprovementBrendan2013/04/12 11:40 AM
                      Haskell Compilation ImprovementEric Bron2013/04/12 12:15 PM
                        Haskell Compilation ImprovementBrendan2013/04/12 03:34 PM
                          Haskell Compilation ImprovementEric Bron2013/04/12 10:44 PM
                            Haskell Compilation ImprovementBrendan2013/04/13 02:20 AM
                              Haskell Compilation ImprovementEric Bron2013/04/13 02:32 AM
                                Haskell Compilation ImprovementBrendan2013/04/13 10:18 AM
                                  Haskell Compilation ImprovementEric Bron2013/04/14 01:04 AM
                          Haskell Compilation ImprovementEric Bron2013/04/15 08:34 AM
                            Haskell Compilation ImprovementBrendan2013/04/16 03:26 PM
                              Prefetch compilation testsEric Bron2013/04/21 12:52 AM
        Haskell Compilation Improvementanon2013/04/11 07:14 AM
          Haskell Compilation ImprovementMichael S2013/04/11 07:27 AM
            Haskell Compilation Improvementanon2013/04/11 08:25 AM
              Haskell Compilation ImprovementMichael S2013/04/11 08:37 AM
                Haskell Compilation Improvementbakaneko2013/04/11 09:39 AM
                  Haskell Compilation ImprovementEric Bron2013/04/11 10:08 AM
                    Haskell Compilation Improvementbakaneko2013/04/11 10:36 AM
                    Haskell Compilation Improvementanon2013/04/11 10:54 AM
                      Haskell Compilation ImprovementEric Bron2013/04/11 11:10 AM
                        Haskell Compilation Improvementanon2013/04/11 11:18 AM
                          Haskell Compilation ImprovementEric Bron2013/04/11 11:27 AM
                            Haskell Compilation Improvementanon2013/04/11 12:02 PM
                              Haskell Compilation ImprovementEric Bron2013/04/11 12:09 PM
                                Haskell Compilation ImprovementEric Bron2013/04/11 12:12 PM
                                Haskell Compilation Improvementanon2013/04/11 12:14 PM
                                  Haskell Compilation ImprovementEric Bron2013/04/11 12:30 PM
                                    Haskell Compilation Improvementanon2013/04/11 11:30 PM
                                      Haskell Compilation ImprovementEric Bron2013/04/12 09:25 AM
                                        Haskell Compilation Improvementanon2013/04/12 07:12 PM
                                          Haskell Compilation ImprovementEric Bron2013/04/12 10:51 PM
                                  Prefetch *hints*Konrad Schwarz2013/04/12 08:24 AM
                        Haskell Compilation ImprovementLinus Torvalds2013/04/11 12:56 PM
                          Inherent advantage of software prefetchJouni Osmala2013/04/11 09:41 PM
                            Inherent advantage of software prefetchSeni2013/04/13 03:40 AM
                            Another example: software scatter gather (NT)Megol2013/04/14 02:39 AM
                          Haskell Compilation ImprovementMaynard Handley2013/12/31 05:26 PM
                            Haskell Compilation ImprovementTREZA2013/12/31 05:44 PM
                              Haskell Compilation ImprovementMaynard Handley2013/12/31 07:49 PM
                                Haskell Compilation Improvementanon2013/12/31 10:39 PM
                                  Haskell Compilation ImprovementMaynard Handley2014/01/01 02:04 AM
                                  Haskell Compilation Improvementbakaneko2014/01/01 05:31 AM
                                Haskell Compilation ImprovementGabriele Svelto2014/01/02 07:57 AM
                                  Haskell Compilation ImprovementMichael S2014/01/02 08:37 AM
                                    Haskell Compilation ImprovementGabriele Svelto2014/01/02 10:09 AM
                                    Haskell Compilation ImprovementTREZA2014/01/02 12:43 PM
                            Haskell Compilation ImprovementMaynard Handley2013/12/31 06:07 PM
                            Future core architectures. (Was Haskell Compilation Improvement)Maynard Handley2014/01/03 12:06 AM
                              Speculative multi-threadingDavid Kanter2014/01/03 02:12 AM
                                Speculative multi-threadingMaynard Handley2014/01/03 05:01 AM
                              Future core architectures. (Was Haskell Compilation Improvement)Seni2014/01/03 01:09 PM
                              Future core architectures. (Was Haskell Compilation Improvement)Linus Torvalds2014/01/03 01:27 PM
                            Haskell Compilation ImprovementKonrad Schwarz2014/01/04 09:38 AM
              Haskell Compilation ImprovementEric Bron2013/04/11 09:23 AM
          Haskell Compilation ImprovementEric Bron2013/04/11 08:50 AM
            Haskell Compilation ImprovementEugene Nalimov2013/04/11 09:20 AM
              Haskell Compilation ImprovementEric Bron2013/04/11 09:28 AM
                Haskell Compilation ImprovementEduardoS2013/04/11 07:30 PM
            Haskell Compilation Improvementanon2013/04/11 10:19 AM
              Haskell Compilation ImprovementEric Bron2013/04/11 10:30 AM
                Haskell Compilation Improvementanon2013/04/11 10:50 AM
                  Haskell Compilation ImprovementEric Bron2013/04/11 11:03 AM
                    Haskell Compilation Improvementanon2013/04/11 11:16 AM
                      Haskell Compilation ImprovementEric Bron2013/04/11 11:24 AM
                        Haskell Compilation Improvementanon2013/04/11 12:09 PM
                          Haskell Compilation ImprovementEric Bron2013/04/11 12:43 PM
                            Haskell Compilation Improvementanon2013/04/11 11:27 PM
                              Haskell Compilation ImprovementEric Bron2013/04/12 12:15 AM
                                Haskell Compilation Improvementanon2013/04/12 07:14 PM
                                  Haskell Compilation ImprovementEric Bron2013/04/12 11:01 PM
                      Haskell Compilation ImprovementLinus Torvalds2013/04/11 01:05 PM
                        Haskell Compilation Improvementanon2013/04/11 10:42 PM
                        Haskell Compilation ImprovementRobert David Graham2013/04/12 02:12 PM
        Software prefetch architecturePaul A. Clayton2013/04/11 08:54 AM
          Software prefetch architectureEric Bron2013/04/11 09:06 AM
            Software prefetch architectureMegol2013/04/15 11:03 AM
              Software prefetch architectureEric Bron2013/04/15 11:30 AM
  low barMichael S2013/04/09 04:38 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell avocado?