Interesting comment about rep instructions & code size

By: Chester (lamchester.delete@this.gmail.com), January 15, 2020 7:24 pm
Room: Moderated Discussions
> > Maybe the cost of saving icache misses and branch mispredicts would be worth it?
>
> It could be, for some stuff (especially at Google where they are famous for jumping
> through hoops to reduce icache misses due to relatively large code sizes).
>
> It will never be a good tradeoff in every case, i.e., regardless of your
> icache pressure, because the small size behavior is too poor.
>
> Note also that there are some really ridiculous memcpy and memcmp (especially) implementations,
> like thousands of instructions, so if you're comparing it to that, then yeah - but if
> you compare it to a good size-and-performance-optimized implementation, which would
> be maybe one or two dozen instructions for the small cases, then maybe not.

Small size behavior's gonna be bad anyway if a call to memcpy incurs an icache miss and maybe a couple branch mispredicts afterward.

Would you happen to know the rep movsb startup time on modern architectures like Skylake?

> > > I find this quote a bit enigmatic: so are they leveraging the rep move instructions,
> > > in which case the implementation is very simple, or are they doing what they said elsewhere,
> > > which is using compile-time selected instructions to implement a compact memcopy.
> > >
> > > I am surprised they said PLT-based dispatch isn't efficient: as far as I can tell, it's basically zero
> > > cost if you were calling the memcpy function: either way you are making a call through the PLT, so what
> > > downside is there to having the machine-appropriate entry be selected at dynamic load time?
> >
> > What's PLT-based dispatch? I googled and couldn't find anything on it.
>
> This.
>
> Basically dynamically linked symbol loading happens at runtime anyways, through a layer of indirection,
> so if you make the symbol lookup arch-aware you basically get arch-aware dispatch for free (again under
> the assumption you were going to make the function call through the PLT in the first place).
>

Oh, interesting. Still, it wouldn't handle optimizing for small vs large memcpys, because the same call site could alternate between tiny and large copies.
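
To make that distinction concrete, here's a rough C sketch. It is not how glibc or any real libc does it: a plain function pointer stands in for the PLT/ifunc binding, and every name in it (`resolve_copy`, `cpu_has_wide_vectors`, `copy_wide`, `SMALL_LIMIT`, `my_copy`) is made up for illustration. The point is only that the arch choice is made once, while the small-vs-large choice depends on `n` and so still has to be a branch on every call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical per-arch variants. Both just forward to memcpy here;
 * a real library would have distinct hand-tuned versions. */
static void *copy_generic(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *copy_wide(void *d, const void *s, size_t n)    { return memcpy(d, s, n); }

/* Stand-in for CPU feature detection (assumption: always "no"). */
static int cpu_has_wide_vectors(void) { return 0; }

/* Plays the role of the load-time resolver: runs once, picks a variant. */
static copy_fn resolve_copy(void) {
    return cpu_has_wide_vectors() ? copy_wide : copy_generic;
}

static copy_fn copy_impl; /* bound on first use, then never changes */

/* Tiny byte loop for short copies. */
static void *small_copy(void *d, const void *s, size_t n) {
    unsigned char *dp = d;
    const unsigned char *sp = s;
    while (n--) *dp++ = *sp++;
    return d;
}

#define SMALL_LIMIT 16 /* arbitrary threshold, not a measured crossover */

void *my_copy(void *d, const void *s, size_t n) {
    if (!copy_impl) {
        copy_impl = resolve_copy(); /* one-time arch dispatch */
        assert(copy_impl != NULL);
    }
    /* Size dispatch cannot be resolved ahead of time: n varies per call,
     * so this branch is paid (and possibly mispredicted) every time. */
    if (n <= SMALL_LIMIT)
        return small_copy(d, s, n);
    return copy_impl(d, s, n);
}
```

So a call site that alternates between, say, 4-byte and 4 KB copies goes through the same bound pointer both times, and only the runtime check on `n` separates the two paths.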