Interesting comment about rep instructions & code size

By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), January 16, 2020 10:12 am
Room: Moderated Discussions
Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on January 16, 2020 2:56 am wrote:
>
> In Firefox we've been using rep movs for memory copies of unknown length for ages - way before
> they became fast - the code size, I-cache and branch-predictor pollution wins are too large to
> ignore.

So just out of curiosity, exactly how do you use it?

Because if you use -Os with gcc, one of the problems we had in the kernel was that gcc would do even fixed-size memcpy() calls with "rep movsb", and that was too horrendous for words.

In particular, it would cause things like struct assignments - if the structs were just the right size - to be done that way too, and it was a very noticeable problem in a couple of hot paths. It was particularly noticeable when in that situation you also often read (or modify) one of the words at the same time anyway, which works very naturally with the "move by hand" approach, but just means extra work if you turned it into an actual blind copy.
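To make the struct-assignment case concrete, here is a minimal sketch in C. The struct and function names are made up for illustration and are not from the kernel. The first function is the "blind copy plus fix-up" shape that a compiler may lower to a block copy (rep movs, per the above, with gcc -Os). The second is the "move by hand" shape, where updating one word falls out of the copy for free.

```c
#include <stdint.h>

/* Hypothetical structure: a few pointers plus a counter (40 bytes on x86-64). */
struct req {
    void *a, *b, *c, *d;
    uint64_t seq;
};

/* Blind copy, then touch one field again afterwards. The struct assignment
 * is the part a compiler is free to turn into a block copy. */
void copy_then_bump(struct req *dst, const struct req *src)
{
    *dst = *src;
    dst->seq++;
}

/* Move by hand: each word is loaded and stored individually, so the
 * modified field is simply computed on its way through a register. */
void copy_by_hand(struct req *dst, const struct req *src)
{
    dst->a = src->a;
    dst->b = src->b;
    dst->c = src->c;
    dst->d = src->d;
    dst->seq = src->seq + 1;
}
```

Both functions leave the same bytes in the destination; what differs is how much freedom the compiler has in picking the instructions. Comparing the output of gcc -S at -Os and -O2 for your own compiler version is the only way to see what it actually picks.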

The cut-off for gcc was something like "five direct move instructions" (this is from memory, so the details might be wrong), which meant that on smallish - but not tiny - structure copies with a few pointers or whatnot, you ended up with gcc using rep movs.

Without -Os, I think the cut-off was more like 128 bytes or something. Which is a lot better (although then gcc starts generating some really odd code for __builtin_memcpy() with a mix of direct stores and 'rep movsq' - but once you go past the "a couple of cachelines" size, instruction choice really doesn't tend to matter as much any more).

The -Os behavior was definitely a huge loss. Yes, the direct copies were larger, but because the size was constant, there were no mispredict issues or "jump out-of-line" I$ behavior issues with the direct copies. So rep movs lost by a mile, and it really stuck out like a sore thumb on profiles.

We do use "rep movs" for the actual memcpy() routine, and that has never been much of a problem. But at that point we've already done the out-of-line call, and gcc considers all the other registers clobbered etc, so it doesn't get the inlining advantage of using rep movs directly. But we couldn't do that anyway, since we only do it for the "known to not suck horribly" case (i.e. ERMS is set).
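For readers who want to see what "only when ERMS is set" could look like in user space, here is a rough sketch. It is not the kernel's implementation, and the function names are invented. It assumes gcc or clang (for <cpuid.h> and inline asm) and falls back to plain memcpy() on non-x86 targets. ERMS is reported in CPUID leaf 7, subleaf 0, EBX bit 9.

```c
#include <stddef.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if the CPU advertises ERMS (CPUID.(EAX=7,ECX=0):EBX bit 9). */
static int cpu_has_erms(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx >> 9) & 1;
#else
    return 0;
#endif
}

/* Byte copy via "rep movsb". The instruction uses rdi/rsi/rcx (edi/esi/ecx
 * on 32-bit) and advances all three, hence the read-write constraints. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    void *ret = dst;
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return ret;
#else
    return memcpy(dst, src, n);
#endif
}

/* Hypothetical wrapper: use rep movsb only when ERMS is advertised. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    return cpu_has_erms() ? copy_rep_movsb(dst, src, n)
                          : memcpy(dst, src, n);
}
```

Running CPUID on every call is far too slow for real use; a real implementation would make the decision once at startup. The sketch only shows the feature test and the register constraints.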

(We ditched -Os for other reasons too - it would make gcc use divide instructions instead of the multiply-by-reciprocal tricks etc, so it just became too much of a liability. Small code is good, but dense code is not really all that bad, so making those kinds of size optimizations ended up being quite painful.)
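As an illustration of the multiply-by-reciprocal trick mentioned above, here is a small C sketch for unsigned 32-bit division by 10. The function names are made up. The constant 0xCCCCCCCD is ceil(2^35 / 10), and multiplying by it and shifting right by 35 gives the same quotient as x / 10 for every 32-bit x. Whether a given compiler emits the divide or the multiply for the first function depends on its version and flags.

```c
#include <stdint.h>

/* What you write in the source; the compiler picks div or the multiply. */
uint32_t div10_plain(uint32_t x)
{
    return x / 10;
}

/* The same quotient spelled out by hand: a 64-bit multiply by the scaled
 * reciprocal ceil(2^35 / 10), followed by a right shift of 35. */
uint32_t div10_reciprocal(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

The hand-written form takes more instruction bytes than a single div, which is presumably why a size-optimizing compiler prefers the divide, but a multiply and shift is usually much cheaper to execute.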

I assume you might use the profile-directed feedback that you use for linking for this too?

Linus