By: Eric Bron (eric.bron.delete@this.zvisuel.privatefortest.com), January 4, 2015 5:58 am
Room: Moderated Discussions
> Sharing will seriously compromise L2 latency for the common case,
this was my impression too
btw, after revisiting the KNC documentation, it looks like the L2 tiles aren't shared after all,
see [1], p. 22:
"The L2 cache organization per core is inclusive of the L1 data and instruction caches. How all cores work together to make a large, shared, L2 global cache (up to 31 MB) may not be clear at first glance. Since each core contributes 512 KB of L2 to the total shared cache storage, it may appear as though a maximum of 31 MB of common L2 cache is available. However, if two or more cores are sharing data, the shared data is replicated among the individual cores’ various L2 caches. That is, if no cores share any data or code, then the effective total L2 size of the chip is 31 MB. Whereas, if every core shares exactly the same code and data in perfect synchronization, then the effective total L2 size of the chip is only 512 KB. The actual size of the workload-perceived L2 storage is a function of the degree of code and data sharing among cores and thread."
this looks like a convoluted way to explain that the L2 tiles are simply independent, i.e. that the (strangely named) "L2 global cache" isn't an LLC
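to put numbers on the quoted passage, here is a rough sketch of the arithmetic it implies. The model is my own simplification, not something from [1]: it assumes a fraction of each core's 512 KB holds data shared by every core (so replicated in every L2) and the rest is private to that core. The function name and the 62-core count are hypothetical; 62 is picked only because 62 x 512 KB gives the quoted 31 MB.

```python
def effective_l2_kb(n_cores, shared_fraction, l2_per_core_kb=512):
    """Unique data (in KB) held across all private L2s.

    Simplified model (an assumption, not from the Intel guide):
    shared_fraction of each L2 holds lines shared by ALL cores, so
    those lines are replicated n_cores times and count only once.
    """
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be between 0 and 1")
    shared_kb = shared_fraction * l2_per_core_kb   # counted once chip-wide
    private_kb = l2_per_core_kb - shared_kb        # unique per core
    return shared_kb + n_cores * private_kb


# 62 cores is a hypothetical count chosen so that 62 * 512 KB = 31 MB
for fraction in (0.0, 0.5, 1.0):
    print(fraction, effective_l2_kb(62, fraction) / 1024, "MB")
```

with no sharing this gives the full 31 MB, with everything shared it collapses to 512 KB (0.5 MB), which are the two endpoints described in the guide; real workloads with partial or pairwise sharing would land somewhere in between and need a finer model.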
[1] Intel® Xeon Phi™ Coprocessor System Software Developers Guide, rev 2.03, November 2012