By: Linus B Torvalds (torvalds.delete@this.linux-foundation.org), May 23, 2017 2:07 pm
Room: Moderated Discussions
Doug S (foo.delete@this.bar.bar) on May 23, 2017 2:35 pm wrote:
>
> Given how large of a footprint memory copies/zeroing/setting has on many profiles, perhaps we need
> special instructions designed for the task. With multiple threads of execution becoming common,
> perhaps having a long (in terms of time for instruction to complete) memory copy/set instruction
> occupying a thread wouldn't be a bad thing, so long as it is interruptible and restartable.
I'm a huge fan of "rep movsb".
It's actually a great interface, in that it's exactly that: interruptible and restartable, with minimal register state (literally just source, destination and count). No stupid big data registers to hold the data. No stupid alignment issues or worries about the cacheline size.
Yes, that silly DF bit is annoying and useless, and writing optimized memory copy code in microcode is not trivial when you have to take all the possible aliasing issues into account. And yes, you technically have stricter semantic guarantees than "memcpy()", since you do have to act as if you are copying one byte at a time in a particular direction, so that does limit you a tiny bit.
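The "one byte at a time in a particular direction" guarantee is easiest to see on overlapping buffers. Below is a portable C model of what `rep movsb` must appear to do when the direction flag is clear. `rep_movsb_model` is a hypothetical name used only for illustration. It is not a real library function, and the sketch ignores performance entirely.

```c
#include <stddef.h>
#include <string.h> /* memmove(), used for contrast in the usage note */

/* Hypothetical reference model of "rep movsb" with DF = 0: copy one byte
   at a time, lowest address first. The parameters are deliberately not
   restrict-qualified, because the point is what happens when dst and src
   overlap. */
static void rep_movsb_model(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i]; /* later iterations may re-read bytes just written */
}
```

Suppose `buf` holds "abxxxxxx". Then `rep_movsb_model(buf + 2, buf, 6)` leaves "abababab", because each byte re-reads one that was just written. By contrast, `memmove(buf + 2, buf, 6)` gives "ababxxxx", and `memcpy()` on overlapping buffers is undefined. This is the extra constraint that microcode has to honour.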
So it's not a perfect interface, but it really does get fairly close. Fixed registers might be something that raises people's hackles, but think of it as just a calling convention, and everybody is ok with that.
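To make the "just a calling convention" point concrete, here is a minimal wrapper sketch using GCC/Clang extended asm on x86. `copy_rep_movsb` is a hypothetical name. The constraints `D`, `S` and `c` pin the three operands to RDI, RSI and RCX. The sketch assumes the direction flag is clear, which the System V x86 ABIs require at function entry. Non-x86 builds fall back to `memcpy()`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copy n bytes with "rep movsb".
   The instruction's entire state is RDI = destination, RSI = source and
   RCX = remaining count. If an interrupt arrives mid-copy, those partially
   advanced registers are all the state there is, so the copy simply
   resumes afterwards. Assumes DF = 0 (forward direction). */
static void *copy_rep_movsb(void *dst, const void *src, size_t n) {
    void *ret = dst;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n) /* all three are updated */
                     :
                     : "memory");
#else
    memcpy(dst, src, n); /* non-x86 fallback */
#endif
    return ret;
}
```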
I will take a better implementation of "rep movsb" and "rep stosb" any day over some stupid wide vector unit thing. It avoids all the edge cases, and it avoids the whole "save/restore pointless register state" crap too.
And Intel actually does do fairly well. "rep movs" is often - but not always - the best memory copy implementation (with "rep stosb" often being the best way to clear memory).
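The memory-clearing counterpart has the same shape. `rep stosb` takes a destination in RDI, a count in RCX and the fill byte in AL. As before, this is a hypothetical GCC/Clang sketch with a portable fallback, and it assumes DF is clear.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: fill n bytes at dst with the low byte of c using
   "rep stosb" (RDI = destination, RCX = count, AL = fill byte). */
static void *set_rep_stosb(void *dst, int c, size_t n) {
    void *ret = dst;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(c) /* only AL is used */
                     : "memory");
#else
    memset(dst, c, n); /* non-x86 fallback */
#endif
    return ret;
}
```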
Obviously, if you have a compile-time size constant where the size is small enough, you are much better off implementing memcpy() as a fixed set of loads and stores. That goes without saying. But if you have a variable-sized copy, the overhead of doing some microcode for it shouldn't be any worse than the size tests a hand-tuned routine has to do anyway.
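Here is a sketch of that trade-off, assuming GCC or Clang. With a compile-time length, `memcpy()` is normally expanded into plain loads and stores. A hand-tuned routine uses a few size tests to pick such a fixed sequence at run time. `copy_small_or_bulk` is hypothetical. Its only special case handles 8 to 16 bytes with two possibly overlapping 8-byte moves, and its generic path stands in for `rep movsb`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variable-size copy: one size test routes 8..16-byte copies
   to a fixed load/store sequence, and everything else takes a generic path. */
static void *copy_small_or_bulk(void *dstv, const void *srcv, size_t n) {
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;
    if (n >= 8 && n <= 16) {
        /* Constant-size memcpy() calls: GCC and Clang normally lower these
           to single 8-byte loads and stores, with no call and no loop.
           Two possibly overlapping 8-byte moves cover any length in [8, 16]. */
        uint64_t head, tail;
        memcpy(&head, src, 8);
        memcpy(&tail, src + n - 8, 8);
        memcpy(dst, &head, 8);
        memcpy(dst + n - 8, &tail, 8);
        return dstv;
    }
    /* Generic path: stand-in for "rep movsb" or the C library routine. */
    return memcpy(dstv, srcv, n);
}
```

Each extra size class adds another branch like the one above. On real inputs those branches have to be predicted, and they also cost instruction-cache space.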
Of course, a lot of benchmarks are bad: (a) they assume that I$ (the instruction cache) doesn't matter, so the benchmark just does a memcpy over and over again and even insanely complex routines are basically "free", and (b) they often test particular sizes over and over again, so the size tests look free because they predict well.
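One way to stop a benchmark from flattering the size tests (point (b)) is to drive the copy with a reproducible but irregular sequence of lengths. This hypothetical harness uses a xorshift32 generator and `clock()`. It does nothing about point (a), because modelling I$ pressure needs surrounding code that competes for the instruction cache.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* xorshift32: a cheap PRNG, so the size sequence is reproducible but does
   not repeat a short pattern that a branch predictor could learn. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill sizes[] with lengths in [1, max_len]. */
static void make_sizes(size_t *sizes, size_t count, size_t max_len, uint32_t seed) {
    uint32_t state = seed ? seed : 1u; /* xorshift must not start at 0 */
    for (size_t i = 0; i < count; i++)
        sizes[i] = 1 + (size_t)(xorshift32(&state) % max_len);
}

/* Run one copy per entry in sizes[] and return the elapsed CPU seconds.
   buf must hold at least 2 * max_len bytes. */
static double time_mixed_copies(unsigned char *buf, size_t max_len,
                                const size_t *sizes, size_t count,
                                void *(*copy)(void *, const void *, size_t)) {
    clock_t start = clock();
    for (size_t i = 0; i < count; i++)
        copy(buf + max_len, buf, sizes[i]);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```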
So "rep movsb" seldom beats hand-tuned things when that happens. But it actually does fairly well in general, and often is the best choice. But there have historically been a few cases where it falls down badly, so...
Most of the rest of the "rep" instructions aren't nearly as useful. "strnlen()" and friends do show up in profiles and you could imagine a good "rep scasb" too, but it is nowhere near as important as memcpy/memset so it's probably not worth it.
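For comparison, a `strnlen()` built on `repne scasb` would look roughly like this. The REPNE prefix stops at the first byte equal to AL, or when RCX reaches zero. This is a hypothetical GCC/Clang sketch with a portable loop as fallback, not a recommended implementation. It only shows how little register state the string instructions need.

```c
#include <stddef.h>

/* Hypothetical strnlen() on "repne scasb": scan at most maxlen bytes for a
   NUL. RDI = pointer, RCX = limit, AL = byte to look for. Assumes DF = 0. */
static size_t strnlen_scasb(const char *s, size_t maxlen) {
    if (maxlen == 0)
        return 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    const char *p = s;
    size_t left = maxlen;
    __asm__ volatile("repne scasb"
                     : "+D"(p), "+c"(left)
                     : "a"(0)
                     : "memory", "cc");
    /* RDI now points one past the last byte examined. */
    size_t scanned = (size_t)(p - s);
    /* If the last examined byte is the NUL, the length excludes it;
       otherwise the limit was reached without finding one. */
    return s[scanned - 1] == '\0' ? scanned - 1 : maxlen;
#else
    size_t n = 0;
    while (n < maxlen && s[n] != '\0')
        n++;
    return n;
#endif
}
```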
Linus