By: none (none.delete@this.none.com), July 20, 2013 2:50 am
Room: Moderated Discussions
Klimax (danklima.delete@this.gmail.com) on July 20, 2013 12:05 am wrote:
> anon (anon.delete@this.anon.com) on July 18, 2013 12:05 pm wrote:
> > Klimax (danklima.delete@this.gmail.com) on July 18, 2013 11:39 am wrote:
> > > anon (anon.delete@this.anon.com) on July 18, 2013 2:10 am wrote:
> > > > Klimax (danklima.delete@this.gmail.com) on July 18, 2013 1:41 am wrote:
> > > > > none (none.delete@this.none.com) on July 17, 2013 11:35 pm wrote:
> > > > > > Klimax (danklima.delete@this.gmail.com) on July 17, 2013 11:29 pm wrote:
> > > > > > > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on July 17, 2013 3:33 pm wrote:
> > > > > > > > bakaneko (nyan.delete@this.hyan.wan) on July 16, 2013 5:50 am wrote:
> > > > > > > > > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on July 16, 2013 3:47 am wrote:
> > > > > > > > > > bakaneko (nyan.delete@this.hyan.wan) on July 16, 2013 2:42 am wrote:
> > > > > > > > > > > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 15, 2013 8:45 pm wrote:
> > > > > > > > > > > > bakaneko (nyan.delete@this.hyan.wan) on July 15, 2013 7:47 pm wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > As someone who works for ARM as compiler writer,
> > > > > > > > > > > > > why don't you tell us in more detail how Intel
> > > > > > > > > > > > > cheated?
> > > > > > > > > > > >
> > > > > > > > > > > > Exophase's post to anandtech was quoted here earlier, I think. It has the relevant details:
> > > > > > > > > > > >
> > > > > > > > > > > > http://forums.anandtech.com/showthread.php?t=2330027
> > > > > > > > > > > >
> > > > > > > > > > > > and quite frankly, while optimizing multiple bit operations into a word is a very
> > > > > > > > > > > > valid optimization, the code icc generates there seems a fair bit past that.
> > > > > > > > > > > >
> > > > > > > > > > > > Sure, it could in theory happen with a really smart compiler and lots of generic optimizations.
> > > > > > > > > > > > In practice? It really smells like the compiler actively targeting a very particular code-sequence.
> > > > > > > > > > > > IOW, compiler cheating. The timing that Exophase points out makes it look worse.
> > > > > > > > > > > >
> > > > > > > > > > > > And Wilco is right that it smells pretty bad when AnTuTu seems to be so close to
> > > > > > > > > > > > intel, and seem to have bent over backwards using recent versions of icc etc.
> > > > > > > > > > > >
> > > > > > > > > > > > It's all "explainable". But it doesn't pass the smell test.
> > > > > > > > > > >
> > > > > > > > > > > It's also all besides the point. I would expect a better
> > > > > > > > > > > explanation from someone who claims to know their shit about
> > > > > > > > > > > compilers than a logical fallacy ("Intel improved their
> > > > > > > > > > > compiler recently", "Intel is cheating because it hits one
> > > > > > > > > > > function in a certain benchmark"). Instead he nags like his
> > > > > > > > > > > marriage has gone bad.
> > > > > > > > > >
> > > > > > > > > > If you had actually read the whole thread including the links to various articles I posted
> > > > > > > > > > then you would have found the detailed explanation that makes it obvious that Intel has been
> > > > > > > > > > cheating AnTuTu. Why should I explain all the details again in every post I make?
> > > > > > > > >
> > > > > > > > > I read the whole thread, but for some reason don't remember
> > > > > > > > > anything noteworthy.
> > > > > > > > >
> > > > > > > > > > > And no, the ICC results aren't that far fetched. Intel
> > > > > > > > > > > actually recommends -O3 -xSSSE3_ATOM* with the NDK. Which
> > > > > > > > > > > could also explain why Exophase saw optimizations which
> > > > > > > > > > > would be counterproductive for bigger programs. (If the
> > > > > > > > > > > flags have similar meaning to gcc, where inlining and loop
> > > > > > > > > > > unrolling have similar problems.)
> > > > > > > > > >
> > > > > > > > > > Given the ICC results already dropped 20% after minor changes in AnTuTu it seems this
> > > > > > > > > > optimization is no longer effective. And that proves it was very specific to the actual
> > > > > > > > > > source code rather than a generic optimization that any compiler does.
> > > > > > > > >
> > > > > > > > > How is setting/clearing a range of bits not a useful
> > > > > > > > > optimization?
> > > > > > > >
> > > > > > > > Nobody sets multiple adjacent bits in memory one at a time. Even if one did so it would
> > > > > > > > typically be a few bits, certainly not hundreds or thousands. Having seen and written
> > > > > > > > various implementations of bit-sets I know how the typical ones look like.
> > > > > > > >
> > > > > > > > > And how does this - just because this is not an optimization
> > > > > > > > > any compiler does (Or at least gcc with -Os) - suddenly make
> > > > > > > > > this an benchmark busting trick? Oh, yes because the time
> > > > > > > > > frame fits.
> > > > > > > >
> > > > > > > > Nobody writes code like this, so no compiler implements this optimization. So yes, ICC gaining
> > > > > > > > this optimization just before AnTuTu switched to ICC is proof that the optimization was added
> > > > > > > > to break AnTuTu. If ICC had implemented this particular optimization for many years then it
> > > > > > > > would be a different matter of course (although that still doesn't explain exactly why AnTuTu
> > > > > > > > switched to a non-standard compiler with options tuned for AnTuTu just for x86).
> > > > > > > So far no proof yet. Do you have any? Any evidence at all?
> > > > > >
> > > > > > Show us code that's from real code that benefits from this optimization.
> > > > >
> > > > > No one at hand, but Flash files have a lot of variable bitfields. Also same kind of code
> > > > > prompting std::vector... (So why such unrealistic bit manipulation in a benchmark?)
> > > > >
> > > > > Anyway, claimants that Intel cheats MUST prove it, no shifting
> > > > > of burden of proof. So any evidence of cheating yet?
> > > >
> > > > This isn't a criminal court case. It's speculation, and yes there is circumstantial evidence, if you
> > > > had followed the thread. From the evidence, I would say there is a pretty good chance that Intel made
> > > > this optimization for this particular benchmark. There is also some non-zero chance it was a coincidence.
> > > So far there is no such thing as evidence of such cheat. Only baseless
> > > speculation and looking for timings (without any facts for that).
> > >
> > > Like nobody posted loop similar to that used by benchmark (like
> > > reconstructed from disassembly) being left intact by ICC.
> > >
> > > Also, when there are accusations of cheating you should post proper evidence.
> >
> > Evidence was posted. Look up what circumstantial means. Several people who know more
> > than you about the matter have agreed that it is likely to be an ICC special case.
>
> Only speculation and unproven timings, that's all. And they(ref people)
> don't have anything but speculations. What they need is evidence.
> (Also I'd like to see basis for their status as experts on compiler optimizations,
> specifically their knowledge of ICC output and capabilities)
>
> What your side needs to do, is to show that such optimization is not done for similar loops.
>
> > >
> > > > Clearly you'll never get 100% indisputable proof (without seeing ICC source code).
> > > >
> > > > What we do know is that it was not a good benchmark of hardware capabilities,
> > > > and it was clearly rigged in favor of the Intel chip.
> > > >
> > > > I think we have to just leave it at that.
> > >
> > > No. It was broken benchmark. Nothing more. (So far)
> > >
> > > That's that.
> >
> > No, when I say it was rigged in favor of Intel, I mean they used non supported
> > compiler and tuned options. They claim "it was to get the best performance from
> > the chip", but conveniently failed to do the same for the ARM compilation.
> >
> > I'm talking about 2 issues: one with ICC optimizer, the other with benchmark setup.
> >
>
> ICC unproven. Broken benchmark proven beyond doubt. (If you noticed, I don't challenge second issue)
And what about this: http://forums.anandtech.com/showthread.php?t=2330288
Look at the bottom: the offending optimization is only applied when targeting 32-bit x86. I guess that won't be enough for you, and you'll keep denying that icc is cheating.
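
For anyone who hasn't followed the linked threads, here is a rough sketch of the kind of transformation being argued about: a loop that sets adjacent bits one at a time, and the word-at-a-time form a compiler could in principle turn it into. This is NOT the AnTuTu source and NOT icc's actual output. The function names, the 32-bit word size and the half-open [start, end) range are all my own assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive form: set bits [start, end) one bit per iteration.
 * Hypothetical stand-in for the style of loop under discussion. */
static void set_range_bitwise(uint32_t *words, size_t start, size_t end)
{
    assert(start <= end);
    for (size_t i = start; i < end; i++)
        words[i / 32] |= (uint32_t)1 << (i % 32);
}

/* Word-at-a-time form: same result, but partial masks for the first
 * and last word and whole-word stores for everything in between. */
static void set_range_wordwise(uint32_t *words, size_t start, size_t end)
{
    assert(start <= end);
    if (start == end)
        return;                      /* empty range, nothing to set */

    size_t first = start / 32;       /* word holding the first bit */
    size_t last = (end - 1) / 32;    /* word holding the last bit */
    uint32_t head = ~(uint32_t)0 << (start % 32);
    uint32_t tail = ~(uint32_t)0 >> (31 - (end - 1) % 32);

    if (first == last) {
        words[first] |= head & tail; /* range fits in a single word */
        return;
    }
    words[first] |= head;
    for (size_t w = first + 1; w < last; w++)
        words[w] = ~(uint32_t)0;     /* fully covered words */
    words[last] |= tail;
}
```

Both functions leave the array in the same state. The second one touches each word once instead of once per bit, which is where a large speedup on a loop over hundreds or thousands of bits would come from.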