Beyond hot and cold code?

By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), December 17, 2019 2:03 pm
Room: Moderated Discussions
While distinguishing code based on average execution frequency is useful and more readily available from existing profiling tools, I suspect adding density information can facilitate further optimization.

Code which is executed with moderate frequency may be executed with different degrees of "clumping" (and the average density at different granularities may vary). Some moderate frequency code may also have temporal locality (or the inverse) with other code.

Dense code would presumably be a better target for size-expanding optimizations. Obviously, selective function inlining is more attractive if a few call sites dominate execution counts, but spending more cache capacity on code that will be reused while resident is more likely to be a net win (the bandwidth cost is paid less often than the execution benefit is realized). The capacity pressure on the cache could also be considered; this pressure is not necessarily constant, and the cache-capacity cost of a code expansion can vary based on the price of cache capacity at the time.
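To make the tradeoff concrete, here is a toy inlining heuristic along these lines. It is purely illustrative: the thresholds, the `cache_pressure` input, and the function itself are made-up knobs, not anything a real compiler uses. The idea is that a call site is inlined when it dominates the function's call counts and the estimated capacity cost of duplicating the body is low given current cache pressure.

```c
#include <stdbool.h>

/* Hypothetical sketch: inline a call site when it accounts for most of
   the calls to the callee (a dominant site) and the capacity cost of
   the duplicated body -- scaled by how contended the cache currently
   is -- stays below a budget.  All constants are arbitrary. */
bool should_inline(unsigned site_calls, unsigned total_calls,
                   unsigned body_bytes, double cache_pressure /* 0..1 */)
{
    double dominance = (double)site_calls / (double)total_calls;
    /* Expanding code costs more when cache capacity is expensive. */
    double size_cost = (double)body_bytes * cache_pressure;
    return dominance > 0.8 && size_cost < 256.0;
}
```

Under this sketch, a site with 90% of the calls and a small body would be inlined under moderate pressure, while the same site would be rejected once the body is large or the cache is nearly full.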

(While in-memory code size can be important, memory [and inter-cache] bandwidth and cache capacities seem to be generally more important.)

There are also considerations of optimization effort both for the compiler and the programmer.

(Even the performance of cold code may be important when worst case — or far tail — performance is important.)

Temporal locality of data reuse seems more important, as data sizes tend to be greater than code sizes even within a single program execution. A compiler could probably provide gross array-of-structures versus structure-of-arrays optimizations and hot/cold splitting, but clumping data based on correlated access timing seems out of reach for compilers for now, particularly when memory allocation and structure layout are not abstracted enough to allow compiler optimization. There are also criticality aspects to data: values that resolve less predictable branches or supply addresses for cache-missing accesses are more critical than other data uses.
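The array-of-structures versus structure-of-arrays distinction can be sketched as follows. The particle type and its fields are hypothetical; the point is that when a hot loop touches only one field, the SoA layout makes every byte of each fetched cache line useful, while the AoS layout drags the cold fields through the cache as well.

```c
#include <stddef.h>

/* Array-of-structures: a loop over x alone still pulls y, z, mass,
   and id through the cache with every element. */
struct ParticleAoS {
    float x, y, z;
    float mass;
    int   id;
};

/* Structure-of-arrays: the x values are contiguous, so a loop over x
   has unit stride and full cache-line utilization. */
struct ParticlesSoA {
    float *x;
    float *y;
    float *z;
    float *mass;
    int   *id;
};

float sum_x_aos(const struct ParticleAoS *p, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;   /* strided: only a fraction of each line is used */
    return s;
}

float sum_x_soa(const struct ParticlesSoA *p, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p->x[i];  /* unit stride: the whole line is used */
    return s;
}
```

Both functions compute the same sum; only the memory traffic differs, which is exactly the kind of transformation a compiler could apply mechanically if the layout were under its control.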

Data value profiling also seems to be less well-developed. In some cases, if certain types of values are more common, significant strength reductions can be performed, improving performance even with additional overhead in checking the values.
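As a minimal sketch of such a value-profile-guided strength reduction: suppose profiling showed that a division's divisor is 8 on the vast majority of calls (the constant 8 is a hypothetical profile result, not from the original post). A compiler could then guard the common value and strength-reduce the divide to a shift, paying a cheap check on every call in exchange for avoiding the expensive divide on most of them.

```c
#include <stdint.h>

/* Specialized divide, assuming value profiling found d == 8 to be the
   overwhelmingly common case at this call site. */
uint32_t div_profiled(uint32_t x, uint32_t d)
{
    if (d == 8)            /* guard for the profiled common value */
        return x >> 3;     /* strength-reduced: shift instead of divide */
    return x / d;          /* general fallback path */
}
```

The guard is pure overhead on the uncommon path, so the transformation only pays off when the profile's prediction holds often enough, which is exactly why the value profile is needed in the first place.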