Interesting comment about rep instructions & code size

By: Travis Downs (travis.downs.delete@this.gmail.com), January 15, 2020 5:50 pm
Room: Moderated Discussions
Chester (lamchester.delete@this.gmail.com) on January 15, 2020 4:16 pm wrote:
> Travis Downs (travis.downs.delete@this.gmail.com) on January 15, 2020 2:40 pm wrote:
> > Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on January 15, 2020 6:12 am wrote:
> > > Rather than tune for individual microarchitecture variations, we would prefer to leverage on fast string
> > > operations provided by the ISA (for example “rep movsb” on x86). This allows us to leverage wider
> > > data paths over time, without having to build custom dispatch logic which carries its own overheads.
> > >
> > > And also
> > >
> > > Small Code Footprint: Our implementations consist of concise patterns for working with chunks
> > > of data. While further specialization can produce better results on microbenchmarks, we did
> > > not see these wins materialize on macrobenchmarks measuring application productivity.
> > >

> >
> > Well they are going to be disappointed by the performance of rep movsb even
> > on recent hardware, for the very small copies they say are important.
>
> Maybe the cost of saving icache misses and branch mispredicts would be worth it?

It could be, for some stuff (especially at Google where they are famous for jumping through hoops to reduce icache misses due to relatively large code sizes).

It will not always be a good tradeoff though, i.e., regardless of your icache pressure, because the small-size behavior is too poor.

Note also that there are some really ridiculous memcpy and (especially) memcmp implementations, like thousands of instructions, so if you're comparing it to that, then yeah. But if you compare it to a good size-and-performance-optimized implementation, which would be maybe one or two dozen instructions for the small cases, then maybe not.
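To give a rough idea of what a compact small-size path can look like, here is a minimal sketch using the usual overlapping head/tail trick. The name copy_small and the 16-byte cutoff are my own assumptions for illustration, not anything from the LLVM proposal or any particular libc, and I'm assuming the compiler turns the fixed-size memcpy calls into plain loads and stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies n bytes for 0 <= n <= 16 only.
 * Each size class does one load/store pair for the head and one for the
 * tail; the two may overlap, which is harmless because all loads happen
 * before any store. Callers must route n > 16 elsewhere. */
static void copy_small(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= 8) {                 /* 8..16 bytes */
        uint64_t head, tail;
        memcpy(&head, s, 8);
        memcpy(&tail, s + n - 8, 8);
        memcpy(d, &head, 8);
        memcpy(d + n - 8, &tail, 8);
    } else if (n >= 4) {          /* 4..7 bytes */
        uint32_t head, tail;
        memcpy(&head, s, 4);
        memcpy(&tail, s + n - 4, 4);
        memcpy(d, &head, 4);
        memcpy(d + n - 4, &tail, 4);
    } else if (n >= 2) {          /* 2..3 bytes */
        uint16_t head, tail;
        memcpy(&head, s, 2);
        memcpy(&tail, s + n - 2, 2);
        memcpy(d, &head, 2);
        memcpy(d + n - 2, &tail, 2);
    } else if (n == 1) {          /* single byte */
        d[0] = s[0];
    }                             /* n == 0: nothing to do */
}
```

The point is only that the small cases need a handful of branches and a few moves each, with no loop and no per-byte work.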

>
> >
> > I find this quote a bit enigmatic: so are they leveraging the rep movs instructions,
> > in which case the implementation is very simple, or are they doing what they said elsewhere,
> > which is using compile-time selected instructions to implement a compact memcpy?
> >
> > I am surprised they said PLT-based dispatch isn't efficient: as far as I can tell, it's basically zero
> > cost if you were calling the memcpy function: either way you are making a call through the PLT, so what
> > downside is there to having the machine-appropriate entry be selected at dynamic load time?
>
> What's PLT-based dispatch? I googled and couldn't find anything on it.

This.

Basically, dynamically linked symbol resolution happens at runtime anyway, through a layer of indirection, so if you make the symbol lookup arch-aware you basically get arch-aware dispatch for free (again, under the assumption that you were going to make the function call through the PLT in the first place).
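Here is a portable toy model of that idea, in case it helps. It is only a sketch: copy_slot stands in for the table entry the call goes through, bind_and_copy for the first-call binding step, and cpu_has_wide_vectors is a hypothetical stub that I hard-code to 0. On GNU/ELF toolchains the real mechanism is exposed as the ifunc function attribute, where the dynamic loader runs the resolver for you; none of the names below come from glibc.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Two stand-in implementations (both hypothetical). */
static void *copy_generic(void *d, const void *s, size_t n)
{
    unsigned char *dp = d;
    const unsigned char *sp = s;
    for (size_t i = 0; i < n; i++)
        dp[i] = sp[i];
    return d;
}

static void *copy_wide(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);   /* placeholder for a wide-vector version */
}

/* Hypothetical CPU feature check; a real one would query the hardware. */
static int cpu_has_wide_vectors(void)
{
    return 0;
}

/* Resolver: picks the machine-appropriate entry once. */
static copy_fn resolve_copy(void)
{
    return cpu_has_wide_vectors() ? copy_wide : copy_generic;
}

static void *bind_and_copy(void *d, const void *s, size_t n);

/* Plays the role of the table slot the indirect call goes through.
 * It starts out pointing at the binder, like a lazily bound symbol. */
static copy_fn copy_slot = bind_and_copy;

/* First call only: resolve, patch the slot, then forward the call. */
static void *bind_and_copy(void *d, const void *s, size_t n)
{
    copy_slot = resolve_copy();
    return copy_slot(d, s, n);
}

/* What callers see: one indirect call through the slot, every time. */
void *my_copy(void *d, const void *s, size_t n)
{
    return copy_slot(d, s, n);
}
```

After the first call, my_copy costs exactly one indirect call whichever implementation was chosen, which is the same cost as any other call that goes through the indirection layer.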

>
> > Maybe they mean they plan to inline in more cases versus making a call.
> >
> > This will pessimize big copies, so it might be worth having a separate memcpy_big
> > for when you know it's big, that uses the widest instructions, etc.
> >
> > They should give their size distribution stats also in a "byte weighted" or
> > "time weighted" format: maybe 99% of your memcpy *calls* are small, but if
> > the big ones are big enough you could spend a lot of total time in them.
>
>
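To make the call-weighted versus byte-weighted distinction concrete, here is a small sketch that computes both views from a list of copy sizes. The struct and function names are hypothetical, and any numbers fed to it here are made up for illustration, not measurements from anyone's workload.

```c
#include <stddef.h>

/* Share of "small" copies, counted two ways. */
struct small_share {
    double by_calls;   /* fraction of calls with size <= threshold */
    double by_bytes;   /* fraction of all bytes moved by those calls */
};

/* Hypothetical helper: sizes[] holds one entry per memcpy call. */
static struct small_share small_share_of(const size_t *sizes, size_t count,
                                         size_t threshold)
{
    size_t small_calls = 0;
    unsigned long long small_bytes = 0, total_bytes = 0;

    for (size_t i = 0; i < count; i++) {
        total_bytes += sizes[i];
        if (sizes[i] <= threshold) {
            small_calls++;
            small_bytes += sizes[i];
        }
    }

    struct small_share r;
    r.by_calls = count ? (double)small_calls / (double)count : 0.0;
    r.by_bytes = total_bytes ? (double)small_bytes / (double)total_bytes : 0.0;
    return r;
}
```

With an invented trace of 99 copies of 16 bytes plus a single 1 MiB copy and a 16-byte threshold, 99% of the calls are small, but they account for only about 0.15% of the bytes moved.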
