"manual memcpy" and modern compilers

By: Travis (travis.downs.delete@this.gmail.com), June 1, 2017 4:30 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on June 1, 2017 10:39 am wrote:
> No. Really no.
> ...
> Unaligned loads remain one single load, in most cases.
> They don't turn into two loads just for being unaligned.
>
> They turn into two loads when they cross a cache fetch boundary (note the difference between cache fetch
> boundary and cacheline size - the two are not necessarily the same).

Indeed, on my Skylake box it seems that any store or load that fits within a 64-byte cache line issues at maximum speed (1 store or 2 loads per cycle). I think that's better than at least some earlier architectures, where the 32-byte boundary was the relevant one (i.e., the cache-fetch boundary was 32 bytes).

When the load crosses a 64-byte boundary, throughput is cut in half.
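To relate these rates to the Cycles and Nanos columns of the benchmark output in this post, here is a small sketch of the arithmetic. It assumes the Cycles column is the cost per operation (so 0.50 means 2 per cycle) and uses the 2.591 GHz median CPU speed reported by the tool; the function names are made up for illustration.

```python
CPU_GHZ = 2.591  # median CPU speed reported in the uarch-bench output


def cycles_to_nanos(cycles_per_op: float, ghz: float = CPU_GHZ) -> float:
    """Convert a per-operation cost in cycles to nanoseconds."""
    return cycles_per_op / ghz


def ops_per_cycle(cycles_per_op: float) -> float:
    """Convert cycles per operation into operations per cycle."""
    return 1.0 / cycles_per_op


cases = [
    ("load within a line", 0.50),
    ("load crossing a line", 1.00),
    ("store within a line", 1.00),
    ("store crossing a line", 2.00),
]
for label, cycles in cases:
    print(f"{label:22s} {ops_per_cycle(cycles):.1f} ops/cycle "
          f"{cycles_to_nanos(cycles):.2f} ns/op")
```

Under that assumption, the 0.50, 1.00 and 2.00 cycle rows correspond to roughly 0.19, 0.39 and 0.77 ns.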

Find below the output from uarch-bench. Note that every load issues at 2 per cycle, except those near enough to the end of the cache line that they cross into the next (e.g., an 8-byte load or store gets half throughput when it starts in the last 7 bytes of the cache line). The one exception in the numbers is the 16-bit load, which reports 1.00 cycles at every offset, including [63].

Median CPU speed: 2.591 GHz
Welcome to uarch-bench
Overhead for system_clock: min=25.000, median=27.000, avg=27.530, max=92.000, n=100
Overhead for steady_clock: min=25.000, median=26.000, avg=26.130, max=27.000, n=100
Overhead for hi_res_clock: min=24.000, median=27.000, avg=27.000, max=29.000, n=100
Running 647 benchmarks
Benchmark Cycles Nanos
Dependent add chain 1.00 0.39
Independent add chain 0.25 0.10
Dependent imul 64->128 3.00 1.16
Dependent imul 64->64 3.00 1.16
Independent imul 64->128 1.01 0.39
Same location stores 1.00 0.39
Disjoint location stores 1.00 0.39
Misaligned 16-bit store [ 0] 1.00 0.39
Misaligned 16-bit store [ 1] 1.00 0.39
Misaligned 16-bit store [ 2] 1.00 0.39
Misaligned 16-bit store [ 3] 1.00 0.39
Misaligned 16-bit store [ 4] 1.00 0.39
Misaligned 16-bit store [ 5] 1.00 0.39
Misaligned 16-bit store [ 6] 1.00 0.39
Misaligned 16-bit store [ 7] 1.00 0.39
Misaligned 16-bit store [ 8] 1.00 0.39
Misaligned 16-bit store [ 9] 1.00 0.39
Misaligned 16-bit store [10] 1.00 0.39
Misaligned 16-bit store [11] 1.00 0.39
Misaligned 16-bit store [12] 1.00 0.39
Misaligned 16-bit store [13] 1.00 0.39
Misaligned 16-bit store [14] 1.00 0.39
Misaligned 16-bit store [15] 1.00 0.39
Misaligned 16-bit store [16] 1.00 0.39
Misaligned 16-bit store [17] 1.00 0.39
Misaligned 16-bit store [18] 1.00 0.39
Misaligned 16-bit store [19] 1.00 0.39
Misaligned 16-bit store [20] 1.00 0.39
Misaligned 16-bit store [21] 1.00 0.39
Misaligned 16-bit store [22] 1.00 0.39
Misaligned 16-bit store [23] 1.00 0.39
Misaligned 16-bit store [24] 1.00 0.39
Misaligned 16-bit store [25] 1.00 0.39
Misaligned 16-bit store [26] 1.00 0.39
Misaligned 16-bit store [27] 1.00 0.39
Misaligned 16-bit store [28] 1.00 0.39
Misaligned 16-bit store [29] 1.00 0.39
Misaligned 16-bit store [30] 1.00 0.39
Misaligned 16-bit store [31] 1.00 0.39
Misaligned 16-bit store [32] 1.00 0.39
Misaligned 16-bit store [33] 1.00 0.39
Misaligned 16-bit store [34] 1.00 0.39
Misaligned 16-bit store [35] 1.00 0.39
Misaligned 16-bit store [36] 1.00 0.39
Misaligned 16-bit store [37] 1.00 0.39
Misaligned 16-bit store [38] 1.00 0.39
Misaligned 16-bit store [39] 1.00 0.39
Misaligned 16-bit store [40] 1.00 0.39
Misaligned 16-bit store [41] 1.00 0.39
Misaligned 16-bit store [42] 1.00 0.39
Misaligned 16-bit store [43] 1.00 0.39
Misaligned 16-bit store [44] 1.00 0.39
Misaligned 16-bit store [45] 1.00 0.39
Misaligned 16-bit store [46] 1.00 0.39
Misaligned 16-bit store [47] 1.00 0.39
Misaligned 16-bit store [48] 1.00 0.39
Misaligned 16-bit store [49] 1.00 0.39
Misaligned 16-bit store [50] 1.00 0.39
Misaligned 16-bit store [51] 1.00 0.39
Misaligned 16-bit store [52] 1.00 0.39
Misaligned 16-bit store [53] 1.00 0.39
Misaligned 16-bit store [54] 1.00 0.39
Misaligned 16-bit store [55] 1.00 0.39
Misaligned 16-bit store [56] 1.00 0.39
Misaligned 16-bit store [57] 1.00 0.39
Misaligned 16-bit store [58] 1.00 0.39
Misaligned 16-bit store [59] 1.00 0.39
Misaligned 16-bit store [60] 1.00 0.39
Misaligned 16-bit store [61] 1.00 0.39
Misaligned 16-bit store [62] 1.00 0.39
Misaligned 16-bit store [63] 2.00 0.77
Misaligned 32-bit store [ 0] 1.00 0.39
Misaligned 32-bit store [ 1] 1.00 0.39
Misaligned 32-bit store [ 2] 1.00 0.39
Misaligned 32-bit store [ 3] 1.00 0.39
Misaligned 32-bit store [ 4] 1.00 0.39
Misaligned 32-bit store [ 5] 1.00 0.39
Misaligned 32-bit store [ 6] 1.00 0.39
Misaligned 32-bit store [ 7] 1.00 0.39
Misaligned 32-bit store [ 8] 1.00 0.39
Misaligned 32-bit store [ 9] 1.00 0.39
Misaligned 32-bit store [10] 1.00 0.39
Misaligned 32-bit store [11] 1.00 0.39
Misaligned 32-bit store [12] 1.00 0.39
Misaligned 32-bit store [13] 1.00 0.39
Misaligned 32-bit store [14] 1.00 0.39
Misaligned 32-bit store [15] 1.00 0.39
Misaligned 32-bit store [16] 1.00 0.39
Misaligned 32-bit store [17] 1.00 0.39
Misaligned 32-bit store [18] 1.00 0.39
Misaligned 32-bit store [19] 1.00 0.39
Misaligned 32-bit store [20] 1.00 0.39
Misaligned 32-bit store [21] 1.00 0.39
Misaligned 32-bit store [22] 1.00 0.39
Misaligned 32-bit store [23] 1.00 0.39
Misaligned 32-bit store [24] 1.00 0.39
Misaligned 32-bit store [25] 1.00 0.39
Misaligned 32-bit store [26] 1.00 0.39
Misaligned 32-bit store [27] 1.00 0.39
Misaligned 32-bit store [28] 1.00 0.39
Misaligned 32-bit store [29] 1.00 0.39
Misaligned 32-bit store [30] 1.00 0.39
Misaligned 32-bit store [31] 1.00 0.39
Misaligned 32-bit store [32] 1.00 0.39
Misaligned 32-bit store [33] 1.00 0.39
Misaligned 32-bit store [34] 1.00 0.39
Misaligned 32-bit store [35] 1.00 0.39
Misaligned 32-bit store [36] 1.00 0.39
Misaligned 32-bit store [37] 1.00 0.39
Misaligned 32-bit store [38] 1.00 0.39
Misaligned 32-bit store [39] 1.00 0.39
Misaligned 32-bit store [40] 1.00 0.39
Misaligned 32-bit store [41] 1.00 0.39
Misaligned 32-bit store [42] 1.00 0.39
Misaligned 32-bit store [43] 1.00 0.39
Misaligned 32-bit store [44] 1.00 0.39
Misaligned 32-bit store [45] 1.00 0.39
Misaligned 32-bit store [46] 1.00 0.39
Misaligned 32-bit store [47] 1.00 0.39
Misaligned 32-bit store [48] 1.00 0.39
Misaligned 32-bit store [49] 1.00 0.39
Misaligned 32-bit store [50] 1.00 0.39
Misaligned 32-bit store [51] 1.00 0.39
Misaligned 32-bit store [52] 1.00 0.39
Misaligned 32-bit store [53] 1.00 0.39
Misaligned 32-bit store [54] 1.00 0.39
Misaligned 32-bit store [55] 1.00 0.39
Misaligned 32-bit store [56] 1.00 0.39
Misaligned 32-bit store [57] 1.00 0.39
Misaligned 32-bit store [58] 1.00 0.39
Misaligned 32-bit store [59] 1.00 0.39
Misaligned 32-bit store [60] 1.00 0.39
Misaligned 32-bit store [61] 2.00 0.77
Misaligned 32-bit store [62] 2.00 0.77
Misaligned 32-bit store [63] 2.00 0.77
Misaligned 64-bit store [ 0] 1.00 0.39
Misaligned 64-bit store [ 1] 1.00 0.39
Misaligned 64-bit store [ 2] 1.00 0.39
Misaligned 64-bit store [ 3] 1.00 0.39
Misaligned 64-bit store [ 4] 1.00 0.39
Misaligned 64-bit store [ 5] 1.00 0.39
Misaligned 64-bit store [ 6] 1.00 0.39
Misaligned 64-bit store [ 7] 1.00 0.39
Misaligned 64-bit store [ 8] 1.00 0.39
Misaligned 64-bit store [ 9] 1.00 0.39
Misaligned 64-bit store [10] 1.00 0.39
Misaligned 64-bit store [11] 1.00 0.39
Misaligned 64-bit store [12] 1.00 0.39
Misaligned 64-bit store [13] 1.00 0.39
Misaligned 64-bit store [14] 1.00 0.39
Misaligned 64-bit store [15] 1.00 0.39
Misaligned 64-bit store [16] 1.00 0.39
Misaligned 64-bit store [17] 1.00 0.39
Misaligned 64-bit store [18] 1.00 0.39
Misaligned 64-bit store [19] 1.00 0.39
Misaligned 64-bit store [20] 1.00 0.39
Misaligned 64-bit store [21] 1.00 0.39
Misaligned 64-bit store [22] 1.00 0.39
Misaligned 64-bit store [23] 1.00 0.39
Misaligned 64-bit store [24] 1.00 0.39
Misaligned 64-bit store [25] 1.00 0.39
Misaligned 64-bit store [26] 1.00 0.39
Misaligned 64-bit store [27] 1.00 0.39
Misaligned 64-bit store [28] 1.00 0.39
Misaligned 64-bit store [29] 1.00 0.39
Misaligned 64-bit store [30] 1.00 0.39
Misaligned 64-bit store [31] 1.00 0.39
Misaligned 64-bit store [32] 1.00 0.39
Misaligned 64-bit store [33] 1.00 0.39
Misaligned 64-bit store [34] 1.00 0.39
Misaligned 64-bit store [35] 1.00 0.39
Misaligned 64-bit store [36] 1.00 0.39
Misaligned 64-bit store [37] 1.00 0.39
Misaligned 64-bit store [38] 1.00 0.39
Misaligned 64-bit store [39] 1.00 0.39
Misaligned 64-bit store [40] 1.00 0.39
Misaligned 64-bit store [41] 1.00 0.39
Misaligned 64-bit store [42] 1.00 0.39
Misaligned 64-bit store [43] 1.00 0.39
Misaligned 64-bit store [44] 1.00 0.39
Misaligned 64-bit store [45] 1.00 0.39
Misaligned 64-bit store [46] 1.00 0.39
Misaligned 64-bit store [47] 1.00 0.39
Misaligned 64-bit store [48] 1.00 0.39
Misaligned 64-bit store [49] 1.00 0.39
Misaligned 64-bit store [50] 1.00 0.39
Misaligned 64-bit store [51] 1.00 0.39
Misaligned 64-bit store [52] 1.00 0.39
Misaligned 64-bit store [53] 1.00 0.39
Misaligned 64-bit store [54] 1.00 0.39
Misaligned 64-bit store [55] 1.00 0.39
Misaligned 64-bit store [56] 1.00 0.39
Misaligned 64-bit store [57] 2.00 0.77
Misaligned 64-bit store [58] 2.00 0.77
Misaligned 64-bit store [59] 2.00 0.77
Misaligned 64-bit store [60] 2.00 0.77
Misaligned 64-bit store [61] 2.00 0.77
Misaligned 64-bit store [62] 2.00 0.77
Misaligned 64-bit store [63] 2.00 0.77
Misaligned 128-bit store [ 0] 1.00 0.39
Misaligned 128-bit store [ 1] 1.00 0.39
Misaligned 128-bit store [ 2] 1.00 0.39
Misaligned 128-bit store [ 3] 1.00 0.39
Misaligned 128-bit store [ 4] 1.00 0.39
Misaligned 128-bit store [ 5] 1.00 0.39
Misaligned 128-bit store [ 6] 1.00 0.39
Misaligned 128-bit store [ 7] 1.00 0.39
Misaligned 128-bit store [ 8] 1.00 0.39
Misaligned 128-bit store [ 9] 1.00 0.39
Misaligned 128-bit store [10] 1.00 0.39
Misaligned 128-bit store [11] 1.00 0.39
Misaligned 128-bit store [12] 1.00 0.39
Misaligned 128-bit store [13] 1.00 0.39
Misaligned 128-bit store [14] 1.00 0.39
Misaligned 128-bit store [15] 1.00 0.39
Misaligned 128-bit store [16] 1.00 0.39
Misaligned 128-bit store [17] 1.00 0.39
Misaligned 128-bit store [18] 1.00 0.39
Misaligned 128-bit store [19] 1.00 0.39
Misaligned 128-bit store [20] 1.00 0.39
Misaligned 128-bit store [21] 1.00 0.39
Misaligned 128-bit store [22] 1.00 0.39
Misaligned 128-bit store [23] 1.00 0.39
Misaligned 128-bit store [24] 1.00 0.39
Misaligned 128-bit store [25] 1.00 0.39
Misaligned 128-bit store [26] 1.00 0.39
Misaligned 128-bit store [27] 1.00 0.39
Misaligned 128-bit store [28] 1.00 0.39
Misaligned 128-bit store [29] 1.00 0.39
Misaligned 128-bit store [30] 1.00 0.39
Misaligned 128-bit store [31] 1.00 0.39
Misaligned 128-bit store [32] 1.00 0.39
Misaligned 128-bit store [33] 1.00 0.39
Misaligned 128-bit store [34] 1.00 0.39
Misaligned 128-bit store [35] 1.00 0.39
Misaligned 128-bit store [36] 1.00 0.39
Misaligned 128-bit store [37] 1.00 0.39
Misaligned 128-bit store [38] 1.00 0.39
Misaligned 128-bit store [39] 1.00 0.39
Misaligned 128-bit store [40] 1.00 0.39
Misaligned 128-bit store [41] 1.00 0.39
Misaligned 128-bit store [42] 1.00 0.39
Misaligned 128-bit store [43] 1.00 0.39
Misaligned 128-bit store [44] 1.00 0.39
Misaligned 128-bit store [45] 1.00 0.39
Misaligned 128-bit store [46] 1.00 0.39
Misaligned 128-bit store [47] 1.00 0.39
Misaligned 128-bit store [48] 1.00 0.39
Misaligned 128-bit store [49] 2.00 0.77
Misaligned 128-bit store [50] 2.00 0.77
Misaligned 128-bit store [51] 2.00 0.77
Misaligned 128-bit store [52] 2.00 0.77
Misaligned 128-bit store [53] 2.00 0.77
Misaligned 128-bit store [54] 2.00 0.77
Misaligned 128-bit store [55] 2.00 0.77
Misaligned 128-bit store [56] 2.00 0.77
Misaligned 128-bit store [57] 2.00 0.77
Misaligned 128-bit store [58] 2.00 0.77
Misaligned 128-bit store [59] 2.00 0.77
Misaligned 128-bit store [60] 2.00 0.77
Misaligned 128-bit store [61] 2.00 0.77
Misaligned 128-bit store [62] 2.00 0.77
Misaligned 128-bit store [63] 2.00 0.77
Misaligned 256-bit store [ 0] 1.00 0.39
Misaligned 256-bit store [ 1] 1.00 0.39
Misaligned 256-bit store [ 2] 1.00 0.39
Misaligned 256-bit store [ 3] 1.00 0.39
Misaligned 256-bit store [ 4] 1.00 0.39
Misaligned 256-bit store [ 5] 1.00 0.39
Misaligned 256-bit store [ 6] 1.01 0.39
Misaligned 256-bit store [ 7] 1.00 0.39
Misaligned 256-bit store [ 8] 1.00 0.39
Misaligned 256-bit store [ 9] 1.00 0.39
Misaligned 256-bit store [10] 1.00 0.39
Misaligned 256-bit store [11] 1.00 0.39
Misaligned 256-bit store [12] 1.00 0.39
Misaligned 256-bit store [13] 1.00 0.39
Misaligned 256-bit store [14] 1.00 0.39
Misaligned 256-bit store [15] 1.00 0.39
Misaligned 256-bit store [16] 1.00 0.39
Misaligned 256-bit store [17] 1.00 0.39
Misaligned 256-bit store [18] 1.00 0.39
Misaligned 256-bit store [19] 1.00 0.39
Misaligned 256-bit store [20] 1.00 0.39
Misaligned 256-bit store [21] 1.00 0.39
Misaligned 256-bit store [22] 1.00 0.39
Misaligned 256-bit store [23] 1.00 0.39
Misaligned 256-bit store [24] 1.00 0.39
Misaligned 256-bit store [25] 1.00 0.39
Misaligned 256-bit store [26] 1.00 0.39
Misaligned 256-bit store [27] 1.00 0.39
Misaligned 256-bit store [28] 1.00 0.39
Misaligned 256-bit store [29] 1.00 0.39
Misaligned 256-bit store [30] 1.00 0.39
Misaligned 256-bit store [31] 1.00 0.39
Misaligned 256-bit store [32] 1.00 0.39
Misaligned 256-bit store [33] 2.00 0.77
Misaligned 256-bit store [34] 2.00 0.77
Misaligned 256-bit store [35] 2.00 0.77
Misaligned 256-bit store [36] 2.00 0.77
Misaligned 256-bit store [37] 2.00 0.77
Misaligned 256-bit store [38] 2.00 0.77
Misaligned 256-bit store [39] 2.00 0.77
Misaligned 256-bit store [40] 2.00 0.77
Misaligned 256-bit store [41] 2.00 0.77
Misaligned 256-bit store [42] 2.00 0.77
Misaligned 256-bit store [43] 2.00 0.77
Misaligned 256-bit store [44] 2.00 0.77
Misaligned 256-bit store [45] 2.00 0.77
Misaligned 256-bit store [46] 2.00 0.77
Misaligned 256-bit store [47] 2.00 0.77
Misaligned 256-bit store [48] 2.00 0.77
Misaligned 256-bit store [49] 2.00 0.77
Misaligned 256-bit store [50] 2.00 0.77
Misaligned 256-bit store [51] 2.00 0.77
Misaligned 256-bit store [52] 2.00 0.77
Misaligned 256-bit store [53] 2.01 0.77
Misaligned 256-bit store [54] 2.00 0.77
Misaligned 256-bit store [55] 2.00 0.77
Misaligned 256-bit store [56] 2.00 0.77
Misaligned 256-bit store [57] 2.00 0.77
Misaligned 256-bit store [58] 2.00 0.77
Misaligned 256-bit store [59] 2.00 0.77
Misaligned 256-bit store [60] 2.00 0.77
Misaligned 256-bit store [61] 2.00 0.77
Misaligned 256-bit store [62] 2.00 0.77
Misaligned 256-bit store [63] 2.00 0.77
Misaligned 16-bit load [ 0] 1.00 0.39
Misaligned 16-bit load [ 1] 1.00 0.39
Misaligned 16-bit load [ 2] 1.00 0.39
Misaligned 16-bit load [ 3] 1.00 0.39
Misaligned 16-bit load [ 4] 1.00 0.39
Misaligned 16-bit load [ 5] 1.00 0.39
Misaligned 16-bit load [ 6] 1.00 0.39
Misaligned 16-bit load [ 7] 1.00 0.39
Misaligned 16-bit load [ 8] 1.00 0.39
Misaligned 16-bit load [ 9] 1.00 0.39
Misaligned 16-bit load [10] 1.00 0.39
Misaligned 16-bit load [11] 1.00 0.39
Misaligned 16-bit load [12] 1.00 0.39
Misaligned 16-bit load [13] 1.00 0.39
Misaligned 16-bit load [14] 1.00 0.39
Misaligned 16-bit load [15] 1.00 0.39
Misaligned 16-bit load [16] 1.00 0.39
Misaligned 16-bit load [17] 1.00 0.39
Misaligned 16-bit load [18] 1.00 0.39
Misaligned 16-bit load [19] 1.00 0.39
Misaligned 16-bit load [20] 1.00 0.39
Misaligned 16-bit load [21] 1.00 0.39
Misaligned 16-bit load [22] 1.00 0.39
Misaligned 16-bit load [23] 1.00 0.39
Misaligned 16-bit load [24] 1.00 0.39
Misaligned 16-bit load [25] 1.00 0.39
Misaligned 16-bit load [26] 1.01 0.39
Misaligned 16-bit load [27] 1.00 0.39
Misaligned 16-bit load [28] 1.00 0.39
Misaligned 16-bit load [29] 1.00 0.39
Misaligned 16-bit load [30] 1.00 0.39
Misaligned 16-bit load [31] 1.00 0.39
Misaligned 16-bit load [32] 1.00 0.39
Misaligned 16-bit load [33] 1.00 0.39
Misaligned 16-bit load [34] 1.00 0.39
Misaligned 16-bit load [35] 1.00 0.39
Misaligned 16-bit load [36] 1.00 0.39
Misaligned 16-bit load [37] 1.00 0.39
Misaligned 16-bit load [38] 1.00 0.39
Misaligned 16-bit load [39] 1.00 0.39
Misaligned 16-bit load [40] 1.00 0.39
Misaligned 16-bit load [41] 1.00 0.39
Misaligned 16-bit load [42] 1.00 0.39
Misaligned 16-bit load [43] 1.00 0.39
Misaligned 16-bit load [44] 1.00 0.39
Misaligned 16-bit load [45] 1.00 0.39
Misaligned 16-bit load [46] 1.00 0.39
Misaligned 16-bit load [47] 1.00 0.39
Misaligned 16-bit load [48] 1.00 0.39
Misaligned 16-bit load [49] 1.00 0.39
Misaligned 16-bit load [50] 1.00 0.39
Misaligned 16-bit load [51] 1.00 0.39
Misaligned 16-bit load [52] 1.00 0.39
Misaligned 16-bit load [53] 1.00 0.39
Misaligned 16-bit load [54] 1.00 0.39
Misaligned 16-bit load [55] 1.00 0.39
Misaligned 16-bit load [56] 1.00 0.39
Misaligned 16-bit load [57] 1.00 0.39
Misaligned 16-bit load [58] 1.00 0.39
Misaligned 16-bit load [59] 1.00 0.39
Misaligned 16-bit load [60] 1.00 0.39
Misaligned 16-bit load [61] 1.00 0.39
Misaligned 16-bit load [62] 1.00 0.39
Misaligned 16-bit load [63] 1.00 0.39
Misaligned 32-bit load [ 0] 0.50 0.19
Misaligned 32-bit load [ 1] 0.50 0.19
Misaligned 32-bit load [ 2] 0.50 0.19
Misaligned 32-bit load [ 3] 0.61 0.23
Misaligned 32-bit load [ 4] 0.50 0.19
Misaligned 32-bit load [ 5] 0.50 0.19
Misaligned 32-bit load [ 6] 0.50 0.19
Misaligned 32-bit load [ 7] 0.50 0.19
Misaligned 32-bit load [ 8] 0.50 0.19
Misaligned 32-bit load [ 9] 0.50 0.19
Misaligned 32-bit load [10] 0.50 0.19
Misaligned 32-bit load [11] 0.61 0.23
Misaligned 32-bit load [12] 0.50 0.19
Misaligned 32-bit load [13] 0.50 0.19
Misaligned 32-bit load [14] 0.50 0.19
Misaligned 32-bit load [15] 0.50 0.19
Misaligned 32-bit load [16] 0.50 0.19
Misaligned 32-bit load [17] 0.50 0.19
Misaligned 32-bit load [18] 0.50 0.19
Misaligned 32-bit load [19] 0.50 0.19
Misaligned 32-bit load [20] 0.50 0.19
Misaligned 32-bit load [21] 0.50 0.19
Misaligned 32-bit load [22] 0.50 0.19
Misaligned 32-bit load [23] 0.50 0.19
Misaligned 32-bit load [24] 0.50 0.19
Misaligned 32-bit load [25] 0.50 0.19
Misaligned 32-bit load [26] 0.50 0.19
Misaligned 32-bit load [27] 0.50 0.19
Misaligned 32-bit load [28] 0.50 0.19
Misaligned 32-bit load [29] 0.50 0.19
Misaligned 32-bit load [30] 0.50 0.19
Misaligned 32-bit load [31] 0.50 0.19
Misaligned 32-bit load [32] 0.50 0.19
Misaligned 32-bit load [33] 0.50 0.19
Misaligned 32-bit load [34] 0.50 0.19
Misaligned 32-bit load [35] 0.52 0.20
Misaligned 32-bit load [36] 0.50 0.19
Misaligned 32-bit load [37] 0.50 0.19
Misaligned 32-bit load [38] 0.50 0.19
Misaligned 32-bit load [39] 0.50 0.19
Misaligned 32-bit load [40] 0.50 0.19
Misaligned 32-bit load [41] 0.50 0.19
Misaligned 32-bit load [42] 0.50 0.19
Misaligned 32-bit load [43] 0.50 0.19
Misaligned 32-bit load [44] 0.50 0.19
Misaligned 32-bit load [45] 0.50 0.19
Misaligned 32-bit load [46] 0.50 0.19
Misaligned 32-bit load [47] 0.50 0.19
Misaligned 32-bit load [48] 0.50 0.19
Misaligned 32-bit load [49] 0.50 0.19
Misaligned 32-bit load [50] 0.61 0.23
Misaligned 32-bit load [51] 0.50 0.19
Misaligned 32-bit load [52] 0.50 0.19
Misaligned 32-bit load [53] 0.50 0.19
Misaligned 32-bit load [54] 0.50 0.19
Misaligned 32-bit load [55] 0.50 0.19
Misaligned 32-bit load [56] 0.50 0.19
Misaligned 32-bit load [57] 0.50 0.19
Misaligned 32-bit load [58] 0.50 0.19
Misaligned 32-bit load [59] 0.50 0.19
Misaligned 32-bit load [60] 0.50 0.19
Misaligned 32-bit load [61] 1.00 0.39
Misaligned 32-bit load [62] 1.00 0.39
Misaligned 32-bit load [63] 1.00 0.39
Misaligned 64-bit load [ 0] 0.50 0.19
Misaligned 64-bit load [ 1] 0.50 0.19
Misaligned 64-bit load [ 2] 0.50 0.19
Misaligned 64-bit load [ 3] 0.50 0.19
Misaligned 64-bit load [ 4] 0.50 0.19
Misaligned 64-bit load [ 5] 0.50 0.19
Misaligned 64-bit load [ 6] 0.50 0.19
Misaligned 64-bit load [ 7] 0.50 0.19
Misaligned 64-bit load [ 8] 0.50 0.19
Misaligned 64-bit load [ 9] 0.50 0.19
Misaligned 64-bit load [10] 0.50 0.19
Misaligned 64-bit load [11] 0.50 0.19
Misaligned 64-bit load [12] 0.50 0.19
Misaligned 64-bit load [13] 0.50 0.19
Misaligned 64-bit load [14] 0.50 0.19
Misaligned 64-bit load [15] 0.50 0.19
Misaligned 64-bit load [16] 0.50 0.19
Misaligned 64-bit load [17] 0.50 0.19
Misaligned 64-bit load [18] 0.50 0.19
Misaligned 64-bit load [19] 0.50 0.19
Misaligned 64-bit load [20] 0.50 0.19
Misaligned 64-bit load [21] 0.50 0.19
Misaligned 64-bit load [22] 0.50 0.19
Misaligned 64-bit load [23] 0.50 0.19
Misaligned 64-bit load [24] 0.50 0.19
Misaligned 64-bit load [25] 0.50 0.19
Misaligned 64-bit load [26] 0.50 0.19
Misaligned 64-bit load [27] 0.50 0.19
Misaligned 64-bit load [28] 0.50 0.19
Misaligned 64-bit load [29] 0.50 0.19
Misaligned 64-bit load [30] 0.50 0.19
Misaligned 64-bit load [31] 0.50 0.19
Misaligned 64-bit load [32] 0.50 0.19
Misaligned 64-bit load [33] 0.50 0.19
Misaligned 64-bit load [34] 0.50 0.19
Misaligned 64-bit load [35] 0.50 0.19
Misaligned 64-bit load [36] 0.50 0.19
Misaligned 64-bit load [37] 0.50 0.19
Misaligned 64-bit load [38] 0.50 0.19
Misaligned 64-bit load [39] 0.50 0.19
Misaligned 64-bit load [40] 0.50 0.19
Misaligned 64-bit load [41] 0.50 0.19
Misaligned 64-bit load [42] 0.50 0.19
Misaligned 64-bit load [43] 0.50 0.19
Misaligned 64-bit load [44] 0.50 0.19
Misaligned 64-bit load [45] 0.50 0.19
Misaligned 64-bit load [46] 0.50 0.19
Misaligned 64-bit load [47] 0.50 0.19
Misaligned 64-bit load [48] 0.50 0.19
Misaligned 64-bit load [49] 0.50 0.19
Misaligned 64-bit load [50] 0.50 0.19
Misaligned 64-bit load [51] 0.50 0.19
Misaligned 64-bit load [52] 0.50 0.19
Misaligned 64-bit load [53] 0.50 0.19
Misaligned 64-bit load [54] 0.50 0.19
Misaligned 64-bit load [55] 0.50 0.19
Misaligned 64-bit load [56] 0.50 0.19
Misaligned 64-bit load [57] 1.00 0.39
Misaligned 64-bit load [58] 1.00 0.39
Misaligned 64-bit load [59] 1.00 0.39
Misaligned 64-bit load [60] 1.00 0.39
Misaligned 64-bit load [61] 1.00 0.39
Misaligned 64-bit load [62] 1.00 0.39
Misaligned 64-bit load [63] 1.00 0.39
Misaligned 128-bit load [ 0] 0.50 0.19
Misaligned 128-bit load [ 1] 0.50 0.19
Misaligned 128-bit load [ 2] 0.50 0.19
Misaligned 128-bit load [ 3] 0.50 0.19
Misaligned 128-bit load [ 4] 0.50 0.19
Misaligned 128-bit load [ 5] 0.50 0.19
Misaligned 128-bit load [ 6] 0.50 0.19
Misaligned 128-bit load [ 7] 0.50 0.19
Misaligned 128-bit load [ 8] 0.50 0.19
Misaligned 128-bit load [ 9] 0.50 0.19
Misaligned 128-bit load [10] 0.50 0.19
Misaligned 128-bit load [11] 0.50 0.19
Misaligned 128-bit load [12] 0.50 0.19
Misaligned 128-bit load [13] 0.50 0.19
Misaligned 128-bit load [14] 0.50 0.19
Misaligned 128-bit load [15] 0.50 0.19
Misaligned 128-bit load [16] 0.50 0.19
Misaligned 128-bit load [17] 0.50 0.19
Misaligned 128-bit load [18] 0.50 0.19
Misaligned 128-bit load [19] 0.50 0.19
Misaligned 128-bit load [20] 0.50 0.19
Misaligned 128-bit load [21] 0.50 0.19
Misaligned 128-bit load [22] 0.50 0.19
Misaligned 128-bit load [23] 0.50 0.19
Misaligned 128-bit load [24] 0.50 0.19
Misaligned 128-bit load [25] 0.50 0.19
Misaligned 128-bit load [26] 0.50 0.19
Misaligned 128-bit load [27] 0.50 0.19
Misaligned 128-bit load [28] 0.50 0.19
Misaligned 128-bit load [29] 0.50 0.19
Misaligned 128-bit load [30] 0.50 0.19
Misaligned 128-bit load [31] 0.50 0.19
Misaligned 128-bit load [32] 0.50 0.19
Misaligned 128-bit load [33] 0.50 0.19
Misaligned 128-bit load [34] 0.50 0.19
Misaligned 128-bit load [35] 0.50 0.19
Misaligned 128-bit load [36] 0.50 0.19
Misaligned 128-bit load [37] 0.50 0.19
Misaligned 128-bit load [38] 0.50 0.19
Misaligned 128-bit load [39] 0.50 0.19
Misaligned 128-bit load [40] 0.50 0.19
Misaligned 128-bit load [41] 0.50 0.19
Misaligned 128-bit load [42] 0.50 0.19
Misaligned 128-bit load [43] 0.50 0.19
Misaligned 128-bit load [44] 0.50 0.19
Misaligned 128-bit load [45] 0.50 0.19
Misaligned 128-bit load [46] 0.50 0.19
Misaligned 128-bit load [47] 0.50 0.19
Misaligned 128-bit load [48] 0.50 0.19
Misaligned 128-bit load [49] 1.00 0.39
Misaligned 128-bit load [50] 1.00 0.39
Misaligned 128-bit load [51] 1.00 0.39
Misaligned 128-bit load [52] 1.00 0.39
Misaligned 128-bit load [53] 1.00 0.39
Misaligned 128-bit load [54] 1.00 0.39
Misaligned 128-bit load [55] 1.00 0.39
Misaligned 128-bit load [56] 1.00 0.39
Misaligned 128-bit load [57] 1.00 0.39
Misaligned 128-bit load [58] 1.00 0.39
Misaligned 128-bit load [59] 1.00 0.39
Misaligned 128-bit load [60] 1.00 0.39
Misaligned 128-bit load [61] 1.00 0.39
Misaligned 128-bit load [62] 1.00 0.39
Misaligned 128-bit load [63] 1.00 0.39
Misaligned 256-bit load [ 0] 0.50 0.19
Misaligned 256-bit load [ 1] 0.50 0.19
Misaligned 256-bit load [ 2] 0.50 0.19
Misaligned 256-bit load [ 3] 0.50 0.19
Misaligned 256-bit load [ 4] 0.50 0.19
Misaligned 256-bit load [ 5] 0.50 0.19
Misaligned 256-bit load [ 6] 0.50 0.19
Misaligned 256-bit load [ 7] 0.50 0.19
Misaligned 256-bit load [ 8] 0.50 0.19
Misaligned 256-bit load [ 9] 0.50 0.19
Misaligned 256-bit load [10] 0.50 0.19
Misaligned 256-bit load [11] 0.50 0.19
Misaligned 256-bit load [12] 0.50 0.19
Misaligned 256-bit load [13] 0.50 0.19
Misaligned 256-bit load [14] 0.50 0.19
Misaligned 256-bit load [15] 0.50 0.19
Misaligned 256-bit load [16] 0.50 0.19
Misaligned 256-bit load [17] 0.50 0.19
Misaligned 256-bit load [18] 0.50 0.19
Misaligned 256-bit load [19] 0.50 0.19
Misaligned 256-bit load [20] 0.50 0.19
Misaligned 256-bit load [21] 0.50 0.19
Misaligned 256-bit load [22] 0.50 0.19
Misaligned 256-bit load [23] 0.50 0.19
Misaligned 256-bit load [24] 0.50 0.19
Misaligned 256-bit load [25] 0.50 0.19
Misaligned 256-bit load [26] 0.50 0.19
Misaligned 256-bit load [27] 0.50 0.19
Misaligned 256-bit load [28] 0.50 0.19
Misaligned 256-bit load [29] 0.50 0.19
Misaligned 256-bit load [30] 0.50 0.19
Misaligned 256-bit load [31] 0.50 0.19
Misaligned 256-bit load [32] 0.50 0.19
Misaligned 256-bit load [33] 1.00 0.39
Misaligned 256-bit load [34] 1.00 0.39
Misaligned 256-bit load [35] 1.00 0.39
Misaligned 256-bit load [36] 1.00 0.39
Misaligned 256-bit load [37] 1.00 0.39
Misaligned 256-bit load [38] 1.00 0.39
Misaligned 256-bit load [39] 1.00 0.39
Misaligned 256-bit load [40] 1.00 0.39
Misaligned 256-bit load [41] 1.00 0.39
Misaligned 256-bit load [42] 1.00 0.39
Misaligned 256-bit load [43] 1.00 0.39
Misaligned 256-bit load [44] 1.00 0.39
Misaligned 256-bit load [45] 1.00 0.39
Misaligned 256-bit load [46] 1.00 0.39
Misaligned 256-bit load [47] 1.00 0.39
Misaligned 256-bit load [48] 1.00 0.39
Misaligned 256-bit load [49] 1.00 0.39
Misaligned 256-bit load [50] 1.00 0.39
Misaligned 256-bit load [51] 1.00 0.39
Misaligned 256-bit load [52] 1.00 0.39
Misaligned 256-bit load [53] 1.00 0.39
Misaligned 256-bit load [54] 1.00 0.39
Misaligned 256-bit load [55] 1.00 0.39
Misaligned 256-bit load [56] 1.00 0.39
Misaligned 256-bit load [57] 1.00 0.39
Misaligned 256-bit load [58] 1.00 0.39
Misaligned 256-bit load [59] 1.00 0.39
Misaligned 256-bit load [60] 1.00 0.39
Misaligned 256-bit load [61] 1.00 0.39
Misaligned 256-bit load [62] 1.00 0.39
Misaligned 256-bit load [63] 1.00 0.39


The number in square brackets is the offset from a 64-byte boundary. So [ 0] means that the load/store is 64-byte aligned.
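As a sanity check on where the slow rows start, here is a minimal sketch of the line-crossing rule. It assumes a 64-byte line and that an access is slow exactly when it touches two lines; the helper names are made up for illustration and are not part of uarch-bench.

```python
LINE_SIZE = 64  # bytes; the boundary that matters in the results above


def crosses_line(offset: int, size: int, line_size: int = LINE_SIZE) -> bool:
    """True if a `size`-byte access starting `offset` bytes past a line
    boundary spills into the next line."""
    return (offset % line_size) + size > line_size


def first_crossing_offset(size: int, line_size: int = LINE_SIZE) -> int:
    """Smallest offset within a line at which a `size`-byte access crosses."""
    return line_size - size + 1


for bits in (16, 32, 64, 128, 256):
    size = bits // 8
    first = first_crossing_offset(size)
    print(f"{bits:3d}-bit: offsets {first}..{LINE_SIZE - 1} cross a line")
```

This gives first crossing offsets of 63, 61, 57, 49 and 33 for the 16-, 32-, 64-, 128- and 256-bit cases, which is where the store rows (and the 32-bit and wider load rows) jump to the slower number.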

This doesn't mean that there aren't additional penalties for misaligned ops: for example, perhaps store-forwarding takes extra time, or perhaps the load-to-use cost is increased by a cycle or so (I'll test it shortly). These tests also only exercise the L1: the numbers might change for accesses that hit in L2 or L3 or miss to DRAM.
                              Load-op usageWilco2017/05/17 03:02 AM
                                could it be C vs C++? (NT)Michael S2017/05/17 03:46 AM
                                Load-op usageGabriele Svelto2017/05/17 05:27 AM
                                  Load-op usageGian-Carlo Pascutto2017/05/17 08:53 AM
                                    Use perf top?Travis2017/05/17 01:21 PM
                                      Use perf top?Wilco2017/05/17 04:23 PM
                                        Use perf top?Travis2017/05/17 06:12 PM
                                          Use perf top?Seni2017/05/17 09:13 PM
                                            Use perf top?Wilco2017/05/18 03:37 AM
                                              Compiled on Skylake? (NT)Michael S2017/05/18 04:16 AM
                                              Use perf top?Gabriele Svelto2017/05/18 05:19 AM
                                                Use perf top?octoploid2017/05/18 05:48 AM
                                                  Use perf top?Gabriele Svelto2017/05/18 09:33 AM
                                                    Use perf top?octoploid2017/05/18 10:51 AM
                                                      Use perf top?Gabriele Svelto2017/05/18 01:12 PM
                                                        Use perf top?octoploid2017/05/18 01:29 PM
                                                          Use perf top?Gian-Carlo Pascutto2017/05/22 08:21 AM
                                                            Use perf top?octoploid2017/05/22 09:01 AM
                                                              Use perf top?Gian-Carlo Pascutto2017/05/22 10:21 AM
                                                                Use perf top?octoploid2017/05/22 10:34 AM
                                                                  Use perf top?Gian-Carlo Pascutto2017/05/22 10:53 AM
                                                                    Use perf top?octoploid2017/05/23 03:54 AM
                                                                      Use perf top?rwessel2017/05/23 08:58 AM
                                                                        Use perf top?octoploid2017/05/23 09:09 AM
                                                                          Use perf top?Megol2017/05/24 05:04 AM
                                                                            Use perf top?octoploid2017/05/24 05:24 AM
                                                                              Use perf top?Gian-Carlo Pascutto2017/05/24 06:53 AM
                                                                                Use perf top?octoploid2017/05/24 07:01 AM
                                                                              Use perf top?Megol2017/05/25 01:24 PM
                                          Use perf top?Wilco2017/05/18 03:20 AM
                                            Use perf top?Travis2017/05/18 02:24 PM
                                              Use perf top?Wilco2017/05/18 04:50 PM
                                                Use perf top?Travis2017/05/18 07:34 PM
                            Load-op usageMichael S2017/05/17 01:21 AM
                              Load-op usageWilco2017/05/17 03:20 AM
                                Load-op usageLinus B Torvalds2017/05/17 09:29 AM
                                  Load-op usageLinus B Torvalds2017/05/17 02:45 PM
                        Load-op usageanon.12017/05/15 06:36 PM
                          Load-op usageMichael S2017/05/16 01:27 AM
                            Load-op usageanon.12017/05/16 07:52 AM
                              Load-op usageanon.12017/05/16 07:58 AM
                              Load-op usageMichael S2017/05/17 12:52 AM
                                Load-op usageanon.12017/05/17 07:03 AM
                                  Load-op usageMichael S2017/05/17 07:24 AM
                                    Load-op usageanon.12017/05/17 11:53 PM
                                      Load-op usageMichael S2017/05/18 12:48 AM
                        Load-op usageLinus B Torvalds2017/05/16 09:01 AM
                          Load-op usageLinus B Torvalds2017/05/16 09:17 AM
                          Load-op usage_Arthur2017/05/17 05:11 PM
                            Load-op usageMichael S2017/05/18 02:50 AM
                            Load-op usageLinus B Torvalds2017/05/18 10:03 AM
                              Load-op usageoctoploid2017/05/18 11:45 AM
                                Load-op usageLinus B Torvalds2017/05/18 12:28 PM
                  required PRF sizeanon.12017/05/15 07:44 AM
                    required PRF sizeslacker2017/05/15 05:20 PM
                      required PRF sizeanon.12017/05/15 07:48 PM
                        required PRF sizeslacker2017/05/15 09:52 PM
                          Fixed linkslacker2017/05/15 09:54 PM
                          required PRF sizeanon.12017/05/16 07:56 AM
          It never made senseanon.12017/05/13 08:03 AM
            It never made senseanon.12017/05/13 08:31 AM
              It never made sensenobody in particular2017/05/13 09:02 AM
              It never made senseGabriele Svelto2017/05/13 09:05 AM
                It never made senseanon.12017/05/13 11:07 AM
                It never made senseAaron Spink2017/05/13 05:18 PM
              It never made senseDavid Hess2017/05/13 07:28 PM
                It never made senseBrett2017/05/13 10:25 PM
                It never made senseanon.12017/05/13 11:44 PM
                  It never made senseNiels Jørgen Kruse2017/05/14 02:37 AM
                    It never made senseanon.12017/05/14 09:45 AM
                      It never made senseNiels Jørgen Kruse2017/05/14 01:06 PM
                    It never made senseMaynard Handley2017/05/16 04:46 AM
                      It never made senseNiels Jørgen Kruse2017/05/16 10:24 PM
                  It never made sensejuanrga2017/05/14 05:02 AM
                    It never made sensenobody in particular2017/05/14 05:31 AM
                      It never made sensejuanrga2017/05/14 02:36 PM
                        It never made sensenobody in particular2017/05/14 03:50 PM
                          It never made sensejuanrga2017/05/14 05:36 PM
                            You're discussing two dead-in-the-water architecturesdefault2017/05/15 02:52 PM
                              You're discussing two dead-in-the-water architecturesblue2017/05/15 07:14 PM
                              You're discussing two dead-in-the-water architecturesjuanrga2017/05/17 04:52 AM
                    It never made senseanon.12017/05/14 08:27 AM
                      It never made senseMichael S2017/05/14 08:54 AM
                        It never made senseanon.12017/05/14 09:40 AM
                      It never made sensejuanrga2017/05/14 03:09 PM
                        It never made sensenobody in particular2017/05/14 03:51 PM
                        It never made senseMichael S2017/05/14 03:56 PM
                        It never made senseanon.12017/05/14 05:54 PM
                  It never made senseDavid Hess2017/05/14 11:02 AM
                    It never made senseBrett2017/05/14 01:24 PM
                      It never made senseMichael S2017/05/15 04:55 AM
                        It never made senseAnon2017/05/15 04:14 PM
                          It never made senseMichael S2017/05/16 02:21 AM
                            It never made sensehobel2017/05/16 08:42 AM
                      It never made senseDavid Hess2017/05/15 06:33 AM
                    It never made sensewumpus2017/05/14 03:08 PM
                      It never made senseDavid Hess2017/05/15 06:23 AM
            It never made sensejuanrga2017/05/14 04:49 AM
              It never made senseAaron Spink2017/05/14 04:58 AM
    It never made senseHeikki Kultala2017/05/12 11:47 AM
      It never made senseAaron Spink2017/05/13 05:20 PM
    It never made senseWes Felter2017/05/12 01:18 PM
      It never made senseanon.12017/05/12 06:32 PM
  Is K12 still alive?juanrga2017/05/12 04:49 AM
    Is K12 still alive?Heikki Kultala2017/05/12 11:31 AM
      Is K12 still alive?who me?2017/05/17 07:39 PM
        Is K12 still alive?juanrga2017/05/18 02:44 AM
        Is K12 still alive?dmcq2017/05/22 06:19 AM
          Is K12 still alive?Foo_2017/05/22 07:56 AM
            Is K12 still alive?David Kanter2017/05/22 02:42 PM
              Is K12 still alive?Linus B Torvalds2017/05/22 07:45 PM
                Is K12 still alive?Michael_S2017/05/22 11:34 PM
                Is K12 still alive?David Kanter2017/05/23 09:17 AM
                  Is K12 still alive?Linus B Torvalds2017/05/23 10:29 AM
                    Is K12 still alive?octoploid2017/05/23 11:25 AM
                      slow AVX-512 memcpy/memsetEric Bron2017/05/23 12:48 PM
                        slow AVX-512 memcpy/memsetLinus B Torvalds2017/05/23 01:51 PM
                          slow AVX-512 memcpy/memsetEric Bron2017/05/23 02:05 PM
                            slow AVX-512 memcpy/memsetLinus B Torvalds2017/05/23 02:43 PM
                              slow AVX-512 memcpy/memsetEric Bron2017/05/23 02:59 PM
                                KNL code generator vs 2014Michael S2017/05/24 12:57 AM
                                  KNL code generator vs 2014Eric Bron2017/05/24 04:21 AM
                                  KNL code generator vs 2014anon.5122017/05/24 04:03 PM
                                    KNL code generator vs 2014Michael S2017/05/25 08:32 AM
                                  food for thoughtEric Bron2017/05/24 04:57 PM
                                    icc 17 on godbolt disagreeMichael S2017/05/25 01:45 AM
                                      Sorry, I posted SKX code twiceMichael S2017/05/25 01:48 AM
                                         stall 2 - are KNL VPUs really OoO?Michael S2017/05/25 02:27 AM
                                      which version of icc 17 ? (NT)Eric Bron2017/05/25 03:50 AM
                                        17.0.0Michael S2017/05/25 03:52 AM
                                          17.0.0Eric Bron2017/05/25 04:13 AM
                                          17.0.0Eric Bron2017/05/25 04:24 AM
                                            17.0.0Michael S2017/05/25 05:29 AM
                                              17.0.0Eric Bron2017/05/25 05:43 AM
                                                17.0.0Michael S2017/05/25 08:40 AM
                                                  strange 256-bit code with icc v7.0.4Eric Bron2017/05/25 10:51 AM
                                              17.0.0Eric Bron2017/05/25 05:54 AM
                                          fixed exampleEric Bron2017/05/25 04:57 AM
                              slow AVX-512 memcpy/memsetTravis2017/05/23 03:57 PM
                                correction: has NOT been the caseTravis2017/05/23 03:58 PM
                              slow AVX-512 memcpy/memsetanon2017/05/24 06:00 AM
                                slow AVX-512 memcpy/memsetTravis2017/05/24 02:27 PM
                                  slow AVX-512 memcpy/memsetanon2017/05/25 02:16 AM
                                    slow AVX-512 memcpy/memsetTravis2017/05/25 05:02 PM
                            slow AVX-512 memcpy/memsetGabriele Svelto2017/05/24 05:12 AM
                          slow AVX-512 memcpy/memsetDoug S2017/05/23 02:35 PM
                            slow AVX-512 memcpy/memsetLinus B Torvalds2017/05/23 03:07 PM
                              Dedicated mem* instructionsDoug S2017/05/23 11:17 PM
                                Dedicated mem* instructionsLinus Torvalds2017/05/24 01:21 AM
                                  Dedicated mem* instructionsLinus Torvalds2017/05/24 08:16 AM
                                    Dedicated mem* instructionsanon2017/05/24 09:52 AM
                                      Dedicated mem* instructionsLinus Torvalds2017/05/24 11:31 AM
                                        Should mem copy/fill/move be an instruction or a coprocessor with asychronous instructions? (NT)TEMLIB2017/05/24 12:52 PM
                                          asynchronous co-processors are evil (NT)Michael S2017/05/24 12:57 PM
                                          Should mem copy/fill/move be an instruction or a coprocessor with asychronous instructions?David Hess2017/05/24 03:52 PM
                                          Should mem copy/fill/move be an instruction or a coprocessor with asychronous instructions?Travis2017/05/24 03:55 PM
                                            Should mem copy/fill/move be an instruction or a coprocessor with asychronous instructions?TEMLIB2017/05/24 04:29 PM
                                        Dedicated mem* instructionsanon2017/05/24 08:39 PM
                                        AVX-512 and XOPYuhong Bao2017/05/24 11:19 PM
                                          128-bit vs 256-bit vectors in cryptoYuhong Bao2017/05/31 11:37 AM
                                    Dedicated mem* instructionsDoug S2017/05/24 12:37 PM
                                      Dedicated mem* instructionsMichael S2017/05/24 12:55 PM
                                        Dedicated mem* instructionsDoug S2017/05/24 02:35 PM
                                          Dedicated mem* instructionsLinus Torvalds2017/05/24 03:41 PM
                                            Dedicated mem* instructionsTravis2017/05/24 04:20 PM
                                              Dedicated mem* instructionsLinus Torvalds2017/05/25 10:54 AM
                                  Dedicated mem* instructionsGabriele Svelto2017/05/25 04:05 PM
                                Immediate lengths for mem* instructionsPaul A. Clayton2017/05/26 04:55 AM
                              slow AVX-512 memcpy/memsetTravis2017/05/24 03:41 PM
                                ucode branch predictionDavid Kanter2017/05/24 05:45 PM
                          Then why use even AVX2 for memcpy?Mark Roulo2017/05/23 04:30 PM
                            Then why use even AVX2 for memcpy?Linus B Torvalds2017/05/23 10:08 PM
                              Danke (NT).Mark Roulo2017/05/24 11:52 AM
                            It's all about the length of the memcpy.Heikki Kultala2017/05/23 10:18 PM
                              It's all about the length of the memcpy.Heikki Kultala2017/05/23 10:26 PM
                              It's all about the length of the memcpy.Yoav2017/05/24 01:08 AM
                              It's all about the length of the memcpy.Michael S2017/05/24 01:37 AM
                              It's all about the length of the memcpy.Megol2017/05/24 03:39 AM
                              It's all about the length of the memcpy.Gabriele Svelto2017/05/24 05:17 AM
                                It's all about the length of the memcpy.Travis2017/05/24 02:46 PM
                                  It's all about the length of the memcpy.Gabriele Svelto2017/05/25 04:24 AM
                                    It's all about the length of the memcpy.octoploid2017/05/25 04:45 AM
                                      Forgot , but you get the idea (NT)octoploid2017/05/25 05:12 AM
                                        Forgot to add a pre tag but you get the idea (NT)octoploid2017/05/25 05:14 AM
                                      It's all about the length of the memcpy.Gabriele Svelto2017/05/25 03:37 PM
                                        It's all about the length of the memcpy.Wilco2017/05/25 03:48 PM
                                          It's all about the length of the memcpy.Gabriele Svelto2017/05/25 04:07 PM
                                            It's all about the length of the memcpy.Wilco2017/05/26 02:47 AM
                                              "manual memcpy" and modern compilersHeikki Kultala2017/05/27 11:27 PM
                                                "manual memcpy" and modern compilersLinus Torvalds2017/05/29 08:30 PM
                                                  "manual memcpy" and modern compilersTravis2017/05/29 09:32 PM
                                                    "manual memcpy" and modern compilersLinus Torvalds2017/05/30 10:54 AM
                                                      "manual memcpy" and modern compilersJason Creighton2017/05/30 12:33 PM
                                                        "manual memcpy" and modern compilersWilco2017/05/30 08:29 PM
                                                      "manual memcpy" and modern compilersTravis2017/05/30 08:23 PM
                                                        "manual memcpy" and modern compilersWilco2017/05/30 08:34 PM
                                                          "manual memcpy" and modern compilersoctoploid2017/05/30 09:46 PM
                                                            "manual memcpy" and modern compilersWilco2017/05/31 02:28 AM
                                                              "manual memcpy" and modern compilersoctoploid2017/05/31 03:14 AM
                                                                "manual memcpy" and modern compilersWilco2017/05/31 02:42 PM
                                                                "manual memcpy" and modern compilersTravis2017/05/31 06:40 PM
                                                                  "manual memcpy" and modern compilersJouni Osmala2017/05/31 11:42 PM
                                                                    "manual memcpy" and modern compilersLinus Torvalds2017/06/01 10:39 AM
                                                                      "manual memcpy" and modern compilersTravis2017/06/01 04:30 PM
                                                                        "manual memcpy" and modern compilersoctoploid2017/06/02 01:26 AM
                                                                          "manual memcpy" and modern compilersoctoploid2017/06/02 01:27 AM
                                                                            "manual memcpy" and modern compilersTravis2017/06/02 12:18 PM
                                                                              "manual memcpy" and modern compilersTravis2017/06/02 12:40 PM
                                                                          "manual memcpy" and modern compilersoctoploid2017/06/02 03:29 AM
                                                                            "manual memcpy" and modern compilersGiGNiC2017/06/02 05:23 AM
                                                                            "manual memcpy" and modern compilersTravis2017/06/02 07:56 PM
                                                                          "manual memcpy" and modern compilersTravis2017/06/02 02:05 PM
                                                                            "manual memcpy" and modern compilersLinus Torvalds2017/06/02 03:48 PM
                                                                              "manual memcpy" and modern compilersTravis2017/06/02 04:50 PM
                                                                                "manual memcpy" and modern compilersgiovanni deretta2017/06/03 01:43 PM
                                                                                  "manual memcpy" and modern compilersDavid Kanter2017/06/04 10:04 AM
                                                                                  "manual memcpy" and modern compilersTravis2017/06/04 01:53 PM
                                                                                    "manual memcpy" and modern compilersDavid Kanter2017/06/04 09:03 PM
                                                                                      memory renamingTravis2017/06/06 11:52 AM
                                                                                        memory renaminganon.12017/06/07 08:06 PM
                                                                                          memory renaminganon.12017/06/07 08:54 PM
                                                                          "manual memcpy" and modern compilersTravis2017/06/02 08:21 PM
                                                                            "manual memcpy" and modern compilersoctoploid2017/06/02 09:31 PM
                                                                              "manual memcpy" and modern compilersoctoploid2017/06/03 02:19 AM
                                                                                "manual memcpy" and modern compilersTravis2017/06/03 11:38 AM
                                                                                  "manual memcpy" and modern compilersLinus Torvalds2017/06/04 10:57 AM
                                                                                    "manual memcpy" and modern compilersTravis2017/06/04 02:11 PM
                                                                                      "manual memcpy" and modern compilersMichael S2017/06/05 04:47 AM
                                                                        "manual memcpy" and modern compilersLinus Torvalds2017/06/02 09:21 AM
                                                                      "manual memcpy" and modern compilersYuhong Bao2017/06/02 06:02 PM
                                                                        "manual memcpy" and modern compilersLinus Torvalds2017/06/02 10:27 PM
                                                                          "manual memcpy" and modern compilersYuhong Bao2017/06/03 10:26 PM
                                                                            "manual memcpy" and modern compilersLinus Torvalds2017/06/04 11:12 AM
                                                                              "manual memcpy" and modern compilersgiovanni deretta2017/06/05 01:22 AM
                                                                                "manual memcpy" and modern compilersLinus Torvalds2017/06/05 09:49 AM
                                                          "manual memcpy" and modern compilersBrett2017/05/30 10:07 PM
                                                            "manual memcpy" and modern compilersWilco2017/05/31 02:37 AM
                                                              "manual memcpy" and modern compilersBrett2017/05/31 10:28 PM
                                                          "manual memcpy" and modern compilersTravis2017/05/31 06:29 PM
                                                      "manual memcpy" and modern compilersTravis2017/05/31 06:30 PM
                                                        "manual memcpy" and modern compilersWilco2017/06/01 02:06 AM
                                                          "manual memcpy" and modern compilersTravis2017/06/01 12:32 PM
                                                            "manual memcpy" and modern compilersWilco2017/06/01 01:51 PM
                                    It's all about the length of the memcpy.Travis2017/05/25 05:19 PM
                                      It's all about the length of the memcpy.Michael S2017/05/26 03:07 AM
                                        It's all about the length of the memcpy.Linus Torvalds2017/05/26 02:01 PM
                                      It's all about the length of the memcpy.Linus Torvalds2017/05/26 12:34 PM
                                        It's all about the length of the memcpy.Travis2017/05/26 05:13 PM
                                          It's all about the length of the memcpy.Travis2017/05/26 05:16 PM
                                          It's all about the length of the memcpy.Brett2017/05/26 08:25 PM
                                            It's all about the length of the memcpy.Travis2017/05/27 02:56 PM
                                          It's all about the length of the memcpy.Linus Torvalds2017/05/27 08:50 AM
                                            big.LITTLE ???Michael S2017/05/27 11:09 AM
                                              big.LITTLE ???Linus Torvalds2017/05/27 11:56 AM
                                                may be, Mongoose core ?Michael S2017/05/27 12:43 PM
                                                big.LITTLE ???Travis2017/05/27 03:18 PM
                                                  big.LITTLE ???Linus Torvalds2017/05/28 05:18 PM
                                                    big.LITTLE ???Travis2017/05/28 09:31 PM
                                                    In *theory* this is fixable with better benchmarks ...Mark Roulo2017/05/30 10:22 AM
                                                      In *theory* this is fixable with better benchmarks ...Linus Torvalds2017/05/30 11:12 AM
                                            It's all about the length of the memcpy.Travis2017/05/27 02:49 PM
                                              NT stores are an issueHeikki Kultala2017/05/27 11:25 PM
                                                NT stores are an issueTravis2017/05/28 12:38 AM
                                                  NT stores are an issue (Ryzen result)octoploid2017/05/28 12:57 AM
                                                    NT stores are an issue (Ryzen result)octoploid2017/05/28 12:59 AM
                                                      Bogus extra newline when using code,preoctoploid2017/05/28 01:03 AM
                                                        Bogus extra newline when using code,preMichael S2017/05/28 01:35 AM
                                                    NT stores are an issue (Ryzen result)Travis2017/05/28 01:30 AM
                                                      NT stores are an issue (Ryzen result)Travis2017/05/28 01:35 AM
                                                      NT stores are an issue (Ryzen result)Michael S2017/05/28 01:45 AM
                                                        NT stores are an issue (Ryzen result)Travis2017/05/28 02:20 AM
                                                    NT stores are an issue (Ryzen result)Travis2017/05/28 02:22 AM
                                                      NT stores are an issue (Ryzen result)octoploid2017/05/28 02:30 AM
                                                        NT stores are an issue (Ryzen result)Travis2017/05/28 01:10 PM
                                              It's all about the length of the memcpy.Doug S2017/05/28 08:55 AM
                                      It's all about the length of the memcpy.Gabriele Svelto2017/05/26 03:33 PM
                                        It's all about the length of the memcpy.Travis2017/05/26 06:51 PM
                                          It's all about the length of the memcpy.Seni2017/05/28 03:14 PM
                                            It's all about the length of the memcpy.Travis2017/05/28 03:26 PM
                                              It's all about the length of the memcpy.Gabriele Svelto2017/05/29 05:53 AM
                                                It's all about the length of the memcpy.Travis2017/05/29 02:04 PM
                                                  It's all about the length of the memcpy.Seni2017/05/29 05:06 PM
                                                    It's all about the length of the memcpy.Travis2017/05/29 07:45 PM
                                                      It's all about the length of the memcpy.Brett2017/05/29 09:36 PM
                                                  Real code, real data from a real workloadGabriele Svelto2017/05/30 03:59 PM
                                                    Real code, real data from a real workloadTravis2017/05/30 08:01 PM