By: Mark Roulo (nothanks.delete@this.xxx.com), June 7, 2022 6:40 am
Room: Moderated Discussions
rwessel (rwessel.delete@this.yahoo.com) on June 6, 2022 2:59 pm wrote:
> Mark Roulo (nothanks.delete@this.xxx.com) on June 6, 2022 2:10 pm wrote:
> > rwessel (rwessel.delete@this.yahoo.com) on June 6, 2022 11:57 am wrote:
> > > Doug S (foo.delete@this.bar.bar) on June 6, 2022 10:44 am wrote:
> > > > rwessel (rwessel.delete@this.yahoo.com) on June 6, 2022 5:26 am wrote:
> > > > > Peter Lewis (peter.delete@this.notyahoo.com) on June 6, 2022 3:55 am wrote:
> > > > > > > There's PGO of course, but that really only works for interpreted or jitted languages.
> > > > > >
> > > > > > The Intel C/C++ Compiler has Profile-Guided Optimization (PGO).
> > > > > > Have you had some bad experience with PGO for compiled languages?
> > > > >
> > > > >
> > > > > The fundamental problem with PGO is that it's (probably fundamentally) too hard to use the
> > > > > vast majority of the time, at least with languages compiled in the traditional way.
> > > > >
> > > > > Unless you have simple to generate (and maintain!*) training datasets, have a very small area
> > > > > where you can separately apply PGO (and can make the training sets small enough to make them
> > > > > maintainable), or you can afford to invest in a quite large infrastructure to use and maintain
> > > > > PGO (because, say, you have an absurd number of machines on which you're going to run this code
> > > > > - consider Google), PGO is just too hard to use, and so is useless 99% of the time.
> > > > >
> > > > > As Peter pointed out, JIT'd languages can take advantage of PGO as well.
> > > > >
> > > > > *Code with limited lifespan, or code you somehow know isn't going to
> > > > > change in the future, can reduce the maintenance requirements here.
> > > >
> > > >
> > > > Has anyone ever tried having the CPU's branch predictor collecting info
> > > > the OS can use to 'update' binaries with branch prediction info?
> > > >
> > > > Having data that is generated by the end user tweak their binaries' branch prediction seems like a
> > > > better solution than having the developer try to come up with training data for the PGO phase. I know
> > > > PGO does more than just predict branches, but in this case I'm just limiting it to the topic at hand.
> > > >
> > > > I'm not sure how it would be implemented - you probably wouldn't actually update the binary
> > > > itself (it may be on read only storage or shared by others) so there would need to be some
> > > > sort of auxiliary data file (maybe stored somewhere like /var/lib or in the user's home directory
> > > > / profile?) that would be used to tweak things when the executable is loaded.
> > > >
> > > > Just an idle thought here, I haven't really considered it for more than the few minutes that
> > > > it took to write this post so I could be missing some really big gotchas with this idea!
> > >
> > >
> > > Not quite the same thing, but efforts have been made to save JIT'd code for use the next time.
> >
> > Example here: https://docs.oracle.com/cd/E13188_01/jrockit/docs142/userguide/codecach.html
> >
> > Java 1.4 is quite old and I don't know if they still do this. I remember
> > reading at the time that the overhead of *managing* the cache was slower than
> > just re-JITing. But maybe I remember wrong or maybe things got better.
>
>
> I could certainly see that being the case if the JIT-ter limited itself to relatively small
> chunks of code at a time. OTOH, I can imagine a system where rather large pieces of code
> are profiled and rebuilt, and where the compilation overhead would be substantial.
Going from memory, I think the problem was that verifying that the JIT'd object code was okay to use for the current run took about as long as (or maybe longer than) just re-JITing.
This might have been an issue specific to Java, where each class gets its own file. Or it might have been because there weren't any "-O3" type optimizations performed, so the JITing itself was fairly fast.
I can imagine a scheme where the unit of verification was the JAR file and the verification was idiot simple:
"This entire block of JIT'd object code is valid only if this set of JARs match these hash values"
In this case, normal use might just turn into a handful of checksum calculations and verifications followed by running object code. Which was probably NOT what JRockit did.
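A minimal sketch of that hash check, in Java. Everything here is hypothetical: the class name JitCacheCheck, the choice of SHA-256, and the idea of a recorded map from JAR path to hash are my assumptions for illustration, not anything JRockit or any other JVM is known to do.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

// Hypothetical sketch: none of these names come from JRockit or any real JVM.
class JitCacheCheck {

    // SHA-256 of a file's bytes, as lowercase hex.
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(Files.readAllBytes(file))) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // The cached object code is usable only if every JAR it was built from
    // still exists and still hashes to the value recorded at cache time.
    static boolean cacheIsValid(Map<Path, String> recordedHashes)
            throws IOException, NoSuchAlgorithmException {
        for (Map.Entry<Path, String> e : recordedHashes.entrySet()) {
            if (!Files.isRegularFile(e.getKey())) return false;
            if (!sha256(e.getKey()).equals(e.getValue())) return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Path jar = Files.createTempFile("demo", ".jar");
        Files.write(jar, new byte[] {1, 2, 3});
        Map<Path, String> recorded = Map.of(jar, sha256(jar));
        System.out.println(cacheIsValid(recorded)); // true: JAR unchanged
        Files.write(jar, new byte[] {1, 2, 3, 4});
        System.out.println(cacheIsValid(recorded)); // false: JAR modified after recording
        Files.delete(jar);
    }
}
```

The point of keying on whole JARs is that the check costs one pass over a handful of files, independent of how many classes or methods were compiled.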
The *other* issue that Java could run into is that good JITs would do some code-path-specific optimizations that would be backed out if things changed. A good example of this is that JVMs could (and would) inline virtual functions if there was only one implementation of that virtual function. If a second implementation got loaded (with a new class) then the JIT would undo the inlining. This could chain, of course ...
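To make the chaining concrete, here is a toy model in Java. All names (InlineDeps, recordInline, classLoaded) are invented, and real JVMs track these dependencies inside the runtime in ways I'm not trying to reproduce. It only shows the bookkeeping: code compiled under a "single implementation" assumption is thrown away when a second implementation loads, and so is anything that inlined that code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only; not how any particular JVM stores this information.
class InlineDeps {
    // virtual method -> names of loaded classes that implement it
    private final Map<String, Set<String>> implementors = new HashMap<>();
    // callee (virtual method or compiled code) -> compiled code that inlined it
    private final Map<String, List<String>> inlinedInto = new HashMap<>();
    // compiled methods whose object code is currently usable
    final Set<String> validCode = new HashSet<>();

    // Record that `caller` was compiled with `callee` inlined into it.
    void recordInline(String caller, String callee) {
        inlinedInto.computeIfAbsent(callee, k -> new ArrayList<>()).add(caller);
        validCode.add(caller);
    }

    // Called when a class providing an implementation of `virtualMethod` loads.
    void classLoaded(String className, String virtualMethod) {
        Set<String> impls = implementors.computeIfAbsent(virtualMethod, k -> new HashSet<>());
        impls.add(className);
        if (impls.size() > 1) invalidate(virtualMethod); // assumption no longer holds
    }

    // Discard everything that inlined `target`, then everything that inlined that, and so on.
    private void invalidate(String target) {
        List<String> callers = inlinedInto.remove(target);
        if (callers == null) return;
        for (String caller : callers) {
            if (validCode.remove(caller)) invalidate(caller);
        }
    }

    public static void main(String[] args) {
        InlineDeps jit = new InlineDeps();
        jit.classLoaded("Circle", "Shape.area");
        jit.recordInline("render", "Shape.area");  // one implementation, so inline it
        jit.recordInline("drawAll", "render");     // render is in turn inlined into drawAll
        System.out.println(jit.validCode.size());  // 2
        jit.classLoaded("Square", "Shape.area");   // second implementation appears
        System.out.println(jit.validCode.size());  // 0: both compiled methods discarded
    }
}
```

A saved code cache would have to persist and re-check this whole dependency graph, not just the object code.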
Depending on how your Java was written, the Java classes might be loaded based on names in an 'ini' file so some of this verification could be tricky.
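As a rough illustration of why that gets tricky: in the sketch below the class that ends up loaded is named only in ini-style text, so nothing in the code itself says which classes a cached blob of object code would depend on. PluginLoader and the "handler" key are made up for the example; Properties and Class.forName are the standard library calls.

```java
import java.io.StringReader;
import java.util.Properties;

// Hypothetical loader: the class to instantiate is chosen by configuration text.
class PluginLoader {

    // Reads "handler=<class name>" from ini-style text and instantiates that class
    // through its public no-argument constructor.
    static Object load(String iniText) throws Exception {
        Properties p = new Properties();
        p.load(new StringReader(iniText));
        String className = p.getProperty("handler");
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Object h = load("handler=java.util.ArrayList");
        System.out.println(h.getClass().getName()); // java.util.ArrayList
    }
}
```

Edit one line of the ini file and a different class loads on the next run, so a validity check would have to cover the configuration as well as the JARs.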
Similar things could happen with global constants: a branch on a global constant could be eliminated because the JVM knew the value of the constant. Ahead-of-time compilation does this optimization, too, and the same 'issue' can arise: if the constant changes, then LOTS of object code needs to be regenerated. Same for the JIT here.
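For the ahead-of-time side, plain javac gives a small-scale version of the same effect. The names Worker, Config and TRACE are invented for the sketch. TRACE is a compile-time constant, so javac copies its value into the classes that read it and typically drops the guarded branch. If Config is later recompiled with TRACE = true and Worker is not, Worker keeps behaving as if it were false.

```java
// Hypothetical example classes; only the constant-folding behavior is the point.
class Worker {
    static int step(int x) {
        if (Config.TRACE) {   // value is known at compile time, so this branch can be removed
            System.out.println("step " + x);
        }
        return x + 1;
    }

    public static void main(String[] args) {
        System.out.println(step(41)); // 42
    }
}

class Config {
    // Compile-time constant: readers get the value baked into their own class files.
    static final boolean TRACE = false;
}
```

A JIT that folds a constant has the same obligation, just at run time: everything that baked the value in has to be regenerated when it changes.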