By: Michael S (already5chosen.delete@this.yahoo.com), September 21, 2021 6:05 pm
Room: Moderated Discussions
-.- (blarg.delete@this.mailinator.com) on September 21, 2021 4:33 pm wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on September 21, 2021 10:17 am wrote:
> > The buffer that you are using for the 512-bit sum is twice as big as the one used by the other variants.
> > IMHO, its size is too close to the size of Skylake's L1D cache to be sure that we have a 100% hit rate.
>
> Oops, missed that (I spent very little time adapting your code).
> Replaced 125 with 62:
>
> $ ./tst_64b.exe
> 0 134
> 1 268
> 2 268
> 3 268
> 4 268
> 5 268
> 6 268
> 7 268
> 8 268
> 9 268
> 10 268
> 11 268
> 12 268
> 13 268
> 14 268
> 15 268
> 16 268
> 17 268
> 18 269
> 19 268
> 20 268
> 21 268
> 22 268
> 23 268
> 24 268
> 25 268
> 26 268
> 27 268
> 28 268
> 29 268
> 30 268
> 31 268
> 32 268
> 33 268
> 34 268
> 35 268
> 36 268
> 37 268
> 38 268
> 39 268
> 40 269
> 41 268
> 42 268
> 43 268
> 44 268
> 45 269
> 46 268
> 47 268
> 48 268
> 49 268
> 50 268
> 51 268
> 52 268
> 53 268
> 54 268
> 55 268
> 56 268
> 57 268
> 58 268
> 59 268
> 60 268
> 61 268
> 62 268
> 63 268
>
Thank you.
Now the penalty is exactly 2x (268 vs. 134), a little less than I expected.
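
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The numbers are typed in from the quoted output. The variable names are mine, and I am assuming that the first column is a byte offset and that offset 0 is the only aligned case, since a 64-byte load at any other offset crosses a cache-line boundary. The post does not state the units of the second column, so the script only looks at ratios.

```python
# Timings copied from the quoted output: offset -> reported value.
# Units are not stated in the post; only ratios are used here.
timings = {offset: 268 for offset in range(1, 64)}
timings[0] = 134                 # aligned case (assumption: offset 0 is aligned)
for offset in (18, 40, 45):      # the three rows that read 269 instead of 268
    timings[offset] = 269

aligned = timings[0]
misaligned = [t for off, t in timings.items() if off != 0]

# Penalty of each misaligned offset relative to the aligned one.
ratios = [t / aligned for t in misaligned]
print("min penalty:", min(ratios))
print("max penalty:", round(max(ratios), 3))
```

The smallest ratio is 268/134 = 2.0, and the three 269 readings push it only slightly above that, so "exactly 2x" holds to within measurement noise.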