"manual memcpy" and modern compilers

By: Travis (travis.downs.delete@this.gmail.com), June 2, 2017 2:05 pm
Room: Moderated Discussions
octoploid (octoploid.delete@this.yandex.com) on June 2, 2017 1:26 am wrote:

> On Ryzen the result is much worse.

The absolute numbers are being thrown off by frequency scaling. In particular, note the first line:
Median CPU speed: 2.194 GHz


If I had to guess, I'd say your true effective (i.e., including turbo) CPU frequency is 2.194 GHz / 0.56 ≈ 3.92 GHz. Is that about right for your chip?

What happens is that when the benchmark starts up, it runs a calibration loop of dependent add instructions (something like a BogoMIPS-type thing) to determine the frequency. The CPU hasn't ramped up to full speed yet, so the calibration captures a half-way frequency, but by the time the benchmarks are running the CPU has ramped up to full turbo. That's why you see all those 0.56-cycle measurements (they should really be 1.00).

On my system I have turned off scaling, so the numbers are better, but I need to add that to the README and implement better detection and a warning when scaling is on.

I'm writing a wrapper script for now that turns off turbo and switches to the performance governor to avoid scaling; I'll check it in shortly.
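For the detection/warning part, here is a minimal sketch in C, assuming a Linux system with a cpufreq driver. It reads cpu0's governor from the standard sysfs file `/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor` and warns if it isn't `performance`. The function names are hypothetical (not from the benchmark's repo), only cpu0 is checked, and turbo is a separate knob that this does not detect.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the governor name read from sysfs is
   exactly "performance" (ignoring a trailing newline), 0 otherwise. */
int governor_is_performance(const char *text)
{
    assert(text != NULL);
    size_t n = strcspn(text, "\n");  /* length up to the newline, if any */
    return n == strlen("performance") && strncmp(text, "performance", n) == 0;
}

/* Reads cpu0's cpufreq governor from sysfs.
   Returns 1 = performance, 0 = some other governor,
   -1 = could not read it (no cpufreq driver, not Linux, permissions...). */
int cpu0_governor_is_performance(void)
{
    char buf[64];
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    if (ok == NULL)
        return -1;
    return governor_is_performance(buf);
}

/* Print a warning to stderr before benchmarking if scaling may be active. */
void warn_if_scaling(void)
{
    int r = cpu0_governor_is_performance();
    if (r == 0)
        fprintf(stderr, "warning: cpufreq governor is not 'performance'; "
                        "cycle counts may be off\n");
    else if (r < 0)
        fprintf(stderr, "warning: could not read cpufreq governor\n");
}
```

Calling `warn_if_scaling()` once before the calibration loop is enough to flag the situation seen above; it does not change any settings.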

That aside, the relative results should still be totally valid (just mentally multiply the cycles by 1/0.56).
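The correction is just arithmetic. A sketch, assuming the harness converts elapsed time to cycles using the (too low) calibrated frequency, and that the 0.56 figures belong to operations that really take 1 cycle; the function names are hypothetical:

```c
#include <assert.h>

/* Estimate the real running frequency from the apparent cycle count of an
   operation known to take exactly one cycle (e.g. one dependent add).
   calibrated_ghz: frequency found by the (too early) calibration loop.
   apparent_cycles: cycles the harness reports for that 1-cycle operation. */
double estimate_true_ghz(double calibrated_ghz, double apparent_cycles)
{
    assert(apparent_cycles > 0.0);
    return calibrated_ghz / apparent_cycles;
}

/* Rescale a cycle count that was computed with the wrong frequency.
   Elapsed time = apparent_cycles / calibrated_ghz, so the true cycle count
   is that time multiplied by the true frequency. */
double corrected_cycles(double apparent_cycles, double calibrated_ghz,
                        double true_ghz)
{
    assert(calibrated_ghz > 0.0);
    return apparent_cycles * (true_ghz / calibrated_ghz);
}
```

With the numbers above, `estimate_true_ghz(2.194, 0.56)` gives about 3.918 GHz, and `corrected_cycles(0.56, 2.194, 3.918)` comes out at about 1.0, which is the same as multiplying by 1/0.56.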

What I see for Ryzen is:

16-, 32- or 64-bit stores that cross a 16-byte boundary take 5 cycles (!), except for one odd case: a 64-bit store that evenly straddles a 16-byte boundary (i.e., 4 bytes on each side) takes only 2 cycles. All other stores take 1 cycle. There is no additional penalty for crossing a 64B boundary.
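To make the rule concrete, here's a sketch that encodes the observed 16/32/64-bit store timings as a lookup keyed on the store address, assuming the only thing that matters is the offset within a 16B chunk (this also covers 64B crossings, since there is no extra penalty there). It restates the measurements; it is not a model of the hardware, and the names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Does an access of `size` bytes starting at `addr` cross a multiple of
   `boundary` bytes?  boundary must be a power of two. */
int crosses_boundary(uintptr_t addr, unsigned size, unsigned boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    assert(size > 0 && size <= boundary);
    return (addr & (boundary - 1)) + size > boundary;
}

/* Observed Ryzen cycles per store for 2-, 4- and 8-byte stores. */
int ryzen_scalar_store_cycles(uintptr_t addr, unsigned size)
{
    assert(size == 2 || size == 4 || size == 8);
    if (!crosses_boundary(addr, size, 16))
        return 1;                 /* no 16B crossing: 1 cycle */
    if (size == 8 && (addr & 15) == 12)
        return 2;                 /* 8-byte store split 4 + 4: 2 cycles */
    return 5;                     /* any other 16B crossing: 5 cycles */
}
```

For example, an 8-byte store at offset 12 within a 16B chunk costs 2 cycles, while the same store at offset 13 costs 5.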

For 128-bit AVX stores, there is a penalty for all misaligned stores: usually 5 cycles, but 2 cycles for stores that are 4-byte aligned.

For 256-bit AVX stores, the behavior is similar to the 128-bit stores, with most timings doubled: the aligned case (anything on a 16B boundary) now takes 2 cycles (as expected, since there is only a 128-bit data path), most misaligned cases take 7 cycles, and stores on a 4-byte boundary take 4 cycles.
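The 128-bit and 256-bit store timings can be tabulated the same way. A sketch, with the caveat that the 1-cycle figure for an aligned 128-bit store is inferred rather than measured above (it follows from the aligned 256-bit case taking 2 cycles on a 128-bit data path); the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Observed Ryzen cycles per 16-byte (128-bit) or 32-byte (256-bit) vector
   store, keyed on the alignment of the store address. */
int ryzen_vector_store_cycles(uintptr_t addr, unsigned size)
{
    assert(size == 16 || size == 32);
    int is256 = (size == 32);
    if ((addr & 15) == 0)
        return is256 ? 2 : 1;   /* 16B-aligned (128-bit value inferred) */
    if ((addr & 3) == 0)
        return is256 ? 4 : 2;   /* misaligned but on a 4-byte boundary */
    return is256 ? 7 : 5;       /* any other misalignment */
}
```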

So it looks like the "cache access size" that Linus talks about is 16B for stores on Ryzen, and the penalty for crossing it is fairly large: generally 5 cycles, not the 2 cycles you'd expect if the hardware were otherwise fully optimized for it. Some special cases related to 4-byte alignment take only 2 cycles, so in those cases the hardware can evidently avoid the general byte-wise shifting and combining of the two halves and use a more direct method that works on 4-byte granules.

The bad part for AMD, I think, is the generally terrible performance of misaligned vector stores. 5 or 7 cycles is a huge penalty, and since Intel has been providing progressively more awesome unaligned performance over time, a lot of the conventional wisdom about alignment has gradually been replaced with "just do it unaligned", including for all the memcpy-type stuff we have been talking about (e.g., gcc uses unaligned AVX copies by default). Well, that stuff is going to suck on AMD.
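If misaligned stores are the expensive part, one common mitigation (not something measured in this post) is to align the destination and let only the loads be unaligned. A portable C sketch, assuming non-overlapping buffers; `copy_dst_aligned` is a hypothetical name, and whether the fixed-size `memcpy` becomes a single 16-byte vector load/store depends on the compiler and flags:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, aligning the destination to 16 bytes so the bulk stores
   never cross a 16-byte boundary; loads from src may stay unaligned.
   The buffers must not overlap. */
void *copy_dst_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: bytes needed to bring d up to the next 16-byte boundary. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > n)
        head = n;
    for (size_t i = 0; i < head; i++)
        *d++ = *s++;
    n -= head;

    /* Body: d is now 16-byte aligned (or n is 0), so each 16-byte chunk
       store is aligned; the matching load from s may not be. */
    while (n >= 16) {
        memcpy(d, s, 16);
        d += 16;
        s += 16;
        n -= 16;
    }

    /* Tail: fewer than 16 bytes remain. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Single-byte head and tail stores can never cross a 16B boundary, so every store is either one byte or an aligned 16-byte chunk; whether this actually beats a plain unaligned copy on Ryzen would need measuring.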

We've seen a bit of speculation about memory performance holding Ryzen back, but also plenty of benchmarks showing that, e.g., memory bandwidth is fine and latency is reasonable. Maybe alignment is the hidden variable here: those microbenchmarks are almost invariably aligned, but much real-world code isn't, so misaligned accesses may be what's hurting Ryzen in practice.

The situation for loads is much better: everything from 32 to 128 bits runs at 2 loads per cycle, except loads that cross a 32B boundary. That's the same behavior as Skylake, except that Skylake's boundary is 64B, so there are half as many bad cases.
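The "half as many" claim is easy to check by counting start offsets. A sketch that counts, over the 64 offsets in a cache line, how many make a load of a given size cross a 32B (Ryzen) or 64B (Skylake) boundary; the function name is hypothetical:

```c
#include <assert.h>

/* Count how many of the 64 possible start offsets within a cache line make
   a load of `size` bytes cross a multiple of `boundary` bytes. */
int crossing_offsets(unsigned size, unsigned boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    assert(size > 0 && size <= boundary);
    int count = 0;
    for (unsigned off = 0; off < 64; off++)
        if ((off & (boundary - 1)) + size > boundary)
            count++;
    return count;
}
```

For 16-byte loads, `crossing_offsets(16, 32)` is 30 and `crossing_offsets(16, 64)` is 15; for 8-byte loads it's 14 versus 7.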

256-bit loads are basically like two 128-bit loads. They issue at 1 per cycle (the theoretical max given the 128-bit data path) only if aligned to a 16B boundary, since otherwise exactly one of the two halves will cross a 32B boundary. At all other alignments they take 1.5 cycles. So using 256-bit loads on Ryzen basically ties using 128-bit loads: the timings are pretty much the same as if you had split each 256-bit load into two 128-bit loads (and that's probably exactly what happens internally).
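The claim that a non-16B-aligned 256-bit load always has a half that crosses a 32B boundary can be checked directly. A sketch, assuming (as suspected above) that the load is executed as two 128-bit loads at `addr` and `addr + 16`; the function name is hypothetical, and it counts crossings only, without predicting the 1 vs 1.5 cycle timings:

```c
#include <stdint.h>

/* Number of 16-byte halves of a 32-byte load at `addr` that cross a
   32-byte boundary, treating the load as two 128-bit loads. */
int crossing_halves_256(uintptr_t addr)
{
    int lo = (addr & 31) + 16 > 32;          /* half at addr      */
    int hi = ((addr + 16) & 31) + 16 > 32;   /* half at addr + 16 */
    return lo + hi;
}
```

It returns 0 for 16B-aligned addresses and exactly 1 for every other offset: the two halves start 16 bytes apart, so unless the address is 16B aligned, exactly one of them starts past offset 16 within its 32B chunk.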
                                                        "manual memcpy" and modern compilersWilco2017/06/01 02:06 AM
                                                          "manual memcpy" and modern compilersTravis2017/06/01 12:32 PM
                                                            "manual memcpy" and modern compilersWilco2017/06/01 01:51 PM
                                    It's all about the length of the memcpy.Travis2017/05/25 05:19 PM
                                      It's all about the length of the memcpy.Michael S2017/05/26 03:07 AM
                                        It's all about the length of the memcpy.Linus Torvalds2017/05/26 02:01 PM
                                      It's all about the length of the memcpy.Linus Torvalds2017/05/26 12:34 PM
                                        It's all about the length of the memcpy.Travis2017/05/26 05:13 PM
                                          It's all about the length of the memcpy.Travis2017/05/26 05:16 PM
                                          It's all about the length of the memcpy.Brett2017/05/26 08:25 PM
                                            It's all about the length of the memcpy.Travis2017/05/27 02:56 PM
                                          It's all about the length of the memcpy.Linus Torvalds2017/05/27 08:50 AM
                                            big.LITTLE ???Michael S2017/05/27 11:09 AM
                                              big.LITTLE ???Linus Torvalds2017/05/27 11:56 AM
                                                may be, Mongoose core ?Michael S2017/05/27 12:43 PM
                                                big.LITTLE ???Travis2017/05/27 03:18 PM
                                                  big.LITTLE ???Linus Torvalds2017/05/28 05:18 PM
                                                    big.LITTLE ???Travis2017/05/28 09:31 PM
                                                    In *theory* this is fixable with better benchmarks ...Mark Roulo2017/05/30 10:22 AM
                                                      In *theory* this is fixable with better benchmarks ...Linus Torvalds2017/05/30 11:12 AM
                                            It's all about the length of the memcpy.Travis2017/05/27 02:49 PM
                                              NT stores are an issueHeikki Kultala2017/05/27 11:25 PM
                                                NT stores are an issueTravis2017/05/28 12:38 AM
                                                  NT stores are an issue (Ryzen result)octoploid2017/05/28 12:57 AM
                                                    NT stores are an issue (Ryzen result)octoploid2017/05/28 12:59 AM
                                                      Bogus extra newline when using code,preoctoploid2017/05/28 01:03 AM
                                                        Bogus extra newline when using code,preMichael S2017/05/28 01:35 AM
                                                    NT stores are an issue (Ryzen result)Travis2017/05/28 01:30 AM
                                                      NT stores are an issue (Ryzen result)Travis2017/05/28 01:35 AM
                                                      NT stores are an issue (Ryzen result)Michael S2017/05/28 01:45 AM
                                                        NT stores are an issue (Ryzen result)Travis2017/05/28 02:20 AM
                                                    NT stores are an issue (Ryzen result)Travis2017/05/28 02:22 AM
                                                      NT stores are an issue (Ryzen result)octoploid2017/05/28 02:30 AM
                                                        NT stores are an issue (Ryzen result)Travis2017/05/28 01:10 PM
                                              It's all about the length of the memcpy.Doug S2017/05/28 08:55 AM
                                      It's all about the length of the memcpy.Gabriele Svelto2017/05/26 03:33 PM
                                        It's all about the length of the memcpy.Travis2017/05/26 06:51 PM
                                          It's all about the length of the memcpy.Seni2017/05/28 03:14 PM
                                            It's all about the length of the memcpy.Travis2017/05/28 03:26 PM
                                              It's all about the length of the memcpy.Gabriele Svelto2017/05/29 05:53 AM
                                                It's all about the length of the memcpy.Travis2017/05/29 02:04 PM
                                                  It's all about the length of the memcpy.Seni2017/05/29 05:06 PM
                                                    It's all about the length of the memcpy.Travis2017/05/29 07:45 PM
                                                      It's all about the length of the memcpy.Brett2017/05/29 09:36 PM
                                                  Real code, real data from a real workloadGabriele Svelto2017/05/30 03:59 PM
                                                    Real code, real data from a real workloadTravis2017/05/30 08:01 PM
                                                      Real code, real data from a real workloadGabriele Svelto2017/05/31 09:31 AM
                                                        Real code, real data from a real workloadgallier22017/05/31 10:02 AM
                                                        Real code, real data from a real workloadSymmetry2017/05/31 10:17 AM
                                                          Real code, real data from a real workloadTravis2017/05/31 06:49 PM
                                                        Real code, real data from a real workloadTravis2017/05/31 06:27 PM
                                                          Real code, real data from a real workloadMichael S2017/06/01 02:38 AM
                                                            Real code, real data from a real workloadWilco2017/06/01 11:06 AM
                                                              fixed indeedMichael S2017/06/01 12:23 PM
                                                          Real code, real data from a real workloadGabriele Svelto2017/06/01 09:44 PM
                                                            Real code, real data from a real workloadTravis2017/06/02 02:38 PM
                                                              Real code, real data from a real workloadmeh2017/06/03 06:22 AM
                                                                Real code, real data from a real workloadTravis2017/06/03 11:50 AM
                                                            Real code, real data from a real workloadSeni2017/06/02 04:34 PM
                                                              Real code, real data from a real workloadBrendan2017/06/02 11:09 PM
                                                                Real code, real data from a real workloadSeni2017/06/03 03:49 AM
                                                                Real code, real data from a real workloadrwessel2017/06/03 11:40 AM
                                                                  Real code, real data from a real workloadTravis2017/06/03 01:40 PM
                                                                Real code, real data from a real workloadTravis2017/06/03 01:20 PM
                                                          Real code, real data from a real workloadRicardo B2017/06/04 02:47 PM
                                                            Real code, real data from a real workloadTravis2017/06/04 05:15 PM
                                                              correctionTravis2017/06/04 05:17 PM
                                                              Real code, real data from a real workloadRicardo B2017/06/04 07:03 PM
                                                                Real code, real data from a real workloadTravis2017/06/06 12:33 PM
                                                            Real code, real data from a real workloadEtienne2017/06/05 03:40 AM
                                It's all about the length of the memcpy.Megol2017/05/25 08:08 AM
                              rep movsb is still slowWilco2017/05/25 03:43 PM
                                4K is not small... (NT)iz2017/05/26 01:10 PM
                                  Random copies are < 256 bytes (NT)Wilco2017/05/26 02:38 PM
                                rep movsb is still slowBrendan2017/05/27 07:50 PM
                                  rep movsb is still slowTravis2017/05/27 09:27 PM
                            Then why use even AVX2 for memcpy?Eric Bron2017/05/24 12:22 AM
                    Is K12 still alive?Ronald Maas2017/05/23 09:27 PM
                      Is K12 still alive?dmcq2017/05/24 03:37 AM
                    Wide registersLaurent2017/05/24 08:53 AM
                      It's called Amdahl's law (NT)Gabriele Svelto2017/05/25 04:09 PM
                      Wide registersMichael S2017/05/26 03:24 AM
                        Wide registersEric Bron2017/05/26 05:47 AM
                          Ivan Godard (NT)Michael S2017/05/27 11:11 AM
                        Wide registersLaurent2017/05/26 08:44 AM
            Is K12 still alive?dmcq2017/05/23 04:47 AM
              Is K12 still alive?juanrga2017/05/23 05:29 AM
              the whole post makes no sense at all (NT)Michael S2017/05/23 06:03 AM
                did you expect different?blue2017/05/23 08:07 AM
                  did you expect different?dmcq2017/05/24 03:35 AM
                    did you expect juanrga post to make sense? (NT) (clarified?)blue2017/05/27 03:44 AM
                      did you follow the discussion?Michael S2017/05/28 01:30 AM
                        did you follow the discussion?dmcq2017/05/28 03:05 AM
                          did you follow the discussion?juanrga2017/05/28 12:24 PM
                          did you follow the discussion?anon.12017/05/28 01:57 PM
                            did you follow the discussion?dmcq2017/05/28 03:18 PM