By: Michael S (already5chosen.delete@this.yahoo.com), November 1, 2008 1:53 pm
Room: Moderated Discussions
EduardoS (no@spam.com) on 11/1/08 wrote:
---------------------------
>Michael S (already5chosen@yahoo.com) on 11/1/08 wrote:
>---------------------------
>>I think there is some mistake in their measurement methodology. It is damn hard
>>to believe that 64-bit moves have higher throughput than 32-bit moves.
>
>The issue with 64-bit vs. 32-bit moves is alignment and bank conflicts. K8 banks
>are 8 bytes wide, so two sequential 32-bit stores result in a bank conflict.
>
>>Right now I have no access to a K8 running a 64-bit OS.
>>Give me a couple of days; I'll test that when I'm back at work.
>
>Even a 32-bit OS will be fine. Just make sure the stores are aligned and each one
>is separated by 8 bytes. I did it here, and it works.
>
I was wrong, sorry.
Achieving 2 stores per clock on a 32-bit OS was surprisingly easy. It worked even when adjacent stores were separated by only 4 bytes.