By: MS (ms.delete@this.lostcircuits.com), January 18, 2011 11:59 am
Room: Moderated Discussions
Eric Bron (eric.bron@zvisuelREMOVE.com) on 1/18/11 wrote:
---------------------------
>>Could that post from TimP have a simple "typo"? I mean, we are talking about a
>>single reference here. It certainly is possible, though.
>
>He clearly states that they reached a good speedup after optimizing for the
>L1 DCache size. I suppose he is right on this, though maybe his "16B" is a typo; I'll ask on the AVX forum.
>
Looking further down, it seems to make sense: the instructions can be transferred in two cycles, but then both parts are executed in a single cycle. Since they are SIMD, a single-cycle initial delay may not make much of a difference.
>Anyway, the decoded icache looks fine for AVX, and you can update your article on this point IMHO.
Done, thanks again!
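To put a rough number on the "single-cycle initial delay" argument, here is a toy cycle-count model. It is only a sketch under stated assumptions, not a model of any real core: it assumes a fully pipelined unit that accepts one independent SIMD operation per cycle, an arbitrary 3-cycle latency, and treats the two-cycle transfer as one extra cycle paid once at the start of a stream. The function names `total_cycles` and `relative_overhead` are hypothetical.

```python
# Toy model (assumption): a fully pipelined unit accepts one independent
# SIMD operation per cycle and needs `latency` cycles per result.
# `extra_delay` models a one-time startup penalty, e.g. the second half of
# a split transfer arriving one cycle later. Not a model of real hardware.


def total_cycles(n_ops: int, latency: int, extra_delay: int = 0) -> int:
    """Cycles until the last of n_ops back-to-back operations completes."""
    if n_ops <= 0:
        return 0
    # The first result appears after the startup penalty plus the latency;
    # each further operation completes one cycle after the previous one.
    return extra_delay + latency + (n_ops - 1)


def relative_overhead(n_ops: int, latency: int, extra_delay: int = 1) -> float:
    """Fractional slowdown caused by the one-time startup penalty."""
    base = total_cycles(n_ops, latency)
    delayed = total_cycles(n_ops, latency, extra_delay)
    return (delayed - base) / base


if __name__ == "__main__":
    # The penalty is paid once, so its share shrinks as the stream grows.
    for n in (1, 10, 100, 1000):
        print(n, f"{relative_overhead(n, latency=3):.2%}")
```

In this model the extra cycle costs a third of the runtime for a lone operation but well under one percent once a loop streams a hundred or more operations, which is the intuition behind "may not make much of a difference." It would not hold if the split transfer limited sustained throughput rather than adding a one-time delay.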