By: -.- (blarg.delete@this.mailinator.com), September 21, 2021 4:33 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on September 21, 2021 10:17 am wrote:
> The buffer that you are using for the 512-bit sum is twice as big as the one used by the other variants.
> IMHO, its size is too close to the size of Skylake's L1D cache to be sure that we have a 100% hit rate.
Oops, missed that (I spent very little time adapting your code).
Replaced 125 with 62:
$ ./tst_64b.exe
0 134
1 268
2 268
3 268
4 268
5 268
6 268
7 268
8 268
9 268
10 268
11 268
12 268
13 268
14 268
15 268
16 268
17 268
18 269
19 268
20 268
21 268
22 268
23 268
24 268
25 268
26 268
27 268
28 268
29 268
30 268
31 268
32 268
33 268
34 268
35 268
36 268
37 268
38 268
39 268
40 269
41 268
42 268
43 268
44 268
45 269
46 268
47 268
48 268
49 268
50 268
51 268
52 268
53 268
54 268
55 268
56 268
57 268
58 268
59 268
60 268
61 268
62 268
63 268
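For anyone who wants to sanity-check the buffer-size concern, here is a rough sketch of the arithmetic. It only assumes Skylake's 32 KiB L1D; the vector count of 500 is a made-up number for illustration and is not taken from the test code, and the helper names are hypothetical.

```python
L1D_BYTES = 32 * 1024  # Skylake L1 data cache size per core


def buffer_bytes(vector_count, vector_bits):
    """Footprint in bytes of a buffer holding vector_count vectors."""
    return vector_count * vector_bits // 8


def l1d_fraction(size_bytes, l1d_bytes=L1D_BYTES):
    """Share of the L1D that a buffer of size_bytes would occupy."""
    return size_bytes / l1d_bytes


# Hypothetical vector count, NOT from the benchmark source:
# the same count of 512-bit vectors needs twice the bytes of 256-bit ones.
for bits in (256, 512):
    size = buffer_bytes(500, bits)
    print(f"{bits}-bit: {size} bytes, {l1d_fraction(size):.0%} of L1D")
```

With the same element count, doubling the vector width doubles the footprint, so a buffer that sits comfortably in L1D at 256 bits can end up filling nearly all of it at 512 bits, leaving little room for anything else that touches the cache.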