By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), July 17, 2013 3:46 pm
Room: Moderated Discussions
bakaneko (nyan.delete@this.hyan.wan) on July 16, 2013 4:55 am wrote:
> ⚛ (0xe2.0x9a.0x9b.delete@this.gmail.com) on July 16, 2013 3:16 am wrote:
> > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 15, 2013 8:45 pm wrote:
> > > bakaneko (nyan.delete@this.hyan.wan) on July 15, 2013 7:47 pm wrote:
> > > >
> > > > As someone who works for ARM as compiler writer,
> > > > why don't you tell us in more detail how Intel
> > > > cheated?
> > >
> > > Exophase's post to anandtech was quoted here earlier, I think. It has the relevant details:
> > >
> > > http://forums.anandtech.com/showthread.php?t=2330027
> > >
> > > and quite frankly, while optimizing multiple bit operations into a word is a very
> > > valid optimization, the code icc generates there seems a fair bit past that.
> > >
> > > Sure, it could in theory happen with a really smart compiler and lots of generic optimizations.
> > > In practice? It really smells like the compiler actively targeting a very particular code-sequence.
> > > IOW, compiler cheating. The timing that Exophase points out makes it look worse.
> > >
> > > And Wilco is right that it smells pretty bad when AnTuTu seems to be so close to
> > > intel, and seem to have bent over backwards using recent versions of icc etc.
> > >
> > > It's all "explainable". But it doesn't pass the smell test.
> > >
> > > Linus
> >
> > The optimization is clearly doable by a machine. In my opinion this means there is no reason to criticize
> > the compiler or the compiler team for adding the optimization to the compiler. The blame should go towards
> > people who published the benchmark results without making it clear that the results are a mixture of
> > raw CPU performance and compiler optimizations in the context of a particular benchmark.
> >
> > It isn't compiler cheating. It is misattribution of benchmark results. The improved benchmark numbers
> > should have been attributed to both the CPU and the compiler, rather than just to the CPU alone.
> >
> > I think it would be best for benchmarks to take into account
> > the number of executed instructions. Seeing the
> > numbers of executed instructions when comparing benchmarks
> > would make it easier to distinguish CPU performance
> > from compiler performance (and from other stuff). It would
> > be nice for this to become the standard way of reporting
> > benchmarks by benchmarking sites. It would be interesting to see similar numbers in GPU benchmarks.
>
> The question is how this benchmark got ever
> published like this. Some people assume it
> was done by malice.
>
> If it was only about the benchmark, then the
> benchmark is clearly at fault because it uses
>
> typedef unsigned long farulong;
>
> but never prefixes it with volatile (I think
> this was enough in this case) in ToggleBitRun,
> so the compiler can run wild with merging
> memory accesses.
Remember this is 20-year-old code - compilers were pretty dumb, and volatile was new and practically unused/unknown. ByteMark never gained much popularity; it disappeared as quickly as it emerged - I never imagined that someone would use it as a mobile benchmark. By far the best solution is to never use this code, especially not as a RAM benchmark...
Wilco