By: Jukka Larja (roskakori2006.delete@this.gmail.com), May 25, 2022 8:50 pm
Room: Moderated Discussions
Brendan (btrotter.delete@this.gmail.com) on May 25, 2022 6:16 pm wrote:
> Hi,
>
> Jukka Larja (roskakori2006.delete@this.gmail.com) on May 25, 2022 6:24 am wrote:
> > Brendan (btrotter.delete@this.gmail.com) on May 24, 2022 5:09 pm wrote:
> >
> > > a) even if ISAs are exactly the same there could be up to 10% performance/efficiency improvement because
> > > lots of optimizations (instruction selection and scheduling, which instructions are fused or not, prefetch
> > > scheduling distance, whether branch prediction has aliasing issues with "too many branches too close",
> > > which cache size for cache blocking optimizations, ...) depend on micro-arch (and P cores and E cores use
> > > very different micro-arch, and ARM's "big" cores and "little" cores use very different micro-arch)
> >
> > How much CPU model optimized code do you think is running on PCs? I tried to find such parameters
> > for Visual Studio, but failed. Doesn't seem to be something developers often do.
>
> Honestly; with the growing number of software developers sacrificing the quality of their end product to reduce
> development time I'd guess that the amount of optimized code running on PCs is about 5% of what it should be.
>
> A large part of that is due to the majority of software not being performance critical anyway.
>
> Another part of it is that the tools we use are shit. Unless you're using an "install from source" distro (e.g.
> Gentoo) or writing embedded software (where you know the exact target in advance), you're stuck with native
> binaries where the only option is run-time dispatch; and run-time dispatch isn't supported well by any compiler,
> so you end up with a painful pile of hacky nonsense to achieve something your tools don't support.
>
> Ironically; this is also half of the reason why JIT (which seems like it should suck badly) is able
> to get within 90% of the performance of native code - native code simply sucks so badly that JIT
> (which can optimize for the actual target CPU a little) doesn't seem awful in comparison.
>
> Ideally people would install (sanity checked and pre-optimized) portable byte-code; and the OS would
> compile it into native code to suit the computer (not just CPU - things like RAM speed can matter
> too), including "whole program optimization" (where shared libraries are statically linked); and
> the OS would automatically re-compile the (cached) native code from the original byte-code when
> necessary (including when shared libraries are updated, or byte-code compiler is updated).
>
> Of course this is mostly orthogonal to (and a distraction
> from) the "for or against homogenous CPU support" debate.
I don't think it's a distraction, in the sense that it's lower-hanging fruit that isn't being picked. You are advocating for the OS to support something that people don't need, and which they could get more easily if they did.
-JLarja