By: Moritz (better.delete@this.not.tell), April 5, 2021 3:21 am
Room: Moderated Discussions
Any caching is, in a way, speculative, but only preloading data that has not been used recently is actually called speculative. Yet predicting in advance what data might be needed is the only way to increase single-thread performance.
Some of that analysis can be done by the compiler, or at least the compiler can emit conditional statements about what might end up in the execution path. These conditions could then be evaluated at run time by the outer processor (or preprocessor) to prefetch based on the information available at that point.
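A minimal sketch of the simplest case, assuming a GCC or Clang toolchain: an indirect gather `data[idx[i]]` whose addresses are unknown at compile time, but whose future addresses can be computed at run time a few iterations ahead. This is ordinary software prefetching on the same core, not the separate outer processor described above; `gather_sum` and `PREFETCH_DIST` are hypothetical names, and `__builtin_prefetch` is a GCC/Clang builtin, not standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance (iterations ahead); would be tuned per machine. */
#define PREFETCH_DIST 16

/* Sum data[idx[i]] for i in [0, n). Which elements are touched depends on the
   run-time contents of idx, so the code reads idx ahead and prefetches the
   element that will be needed PREFETCH_DIST iterations later. */
long gather_sum(const long *data, const size_t *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n) {
            /* GCC/Clang builtin: address, rw (0 = read), temporal locality (0..3). */
            __builtin_prefetch(&data[idx[i + PREFETCH_DIST]], 0, 1);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint: removing it leaves the result unchanged, which is why a compiler is free to insert such hints wherever it can compute a future address cheaply.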
Sometimes it is better to write a program that, based on the data it is supposed to process, generates another program which then does the calculations, instead of evaluating conditionals and operating on identity elements (e.g. adding the zero entries of a sparse matrix). If there is a lot of data and much of it is available before execution starts, this form of control-code generation should pay off. I know it is done to compute huge sparse matrices on CPUs.
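A rough sketch of that idea, assuming the sparse matrix is known before execution and stored in CSR form (`row_ptr`, `col_idx`, `vals`): the generator emits straight-line C source for `y = A*x` in which every zero entry and every index lookup has already been resolved. `gen_spmv` and `emit` are hypothetical names; compiling and loading the generated source (a separate compile step or a JIT) is not shown.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to out; once the buffer is full, keep counting only. */
static void emit(char *out, size_t cap, size_t *used, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    size_t room = *used < cap ? cap - *used : 0;
    int n = vsnprintf(room ? out + *used : NULL, room, fmt, ap);
    va_end(ap);
    if (n > 0) *used += (size_t)n;
}

/* Generate C source computing y = A*x for one fixed CSR matrix A.
   Returns the length of the full generated text (like snprintf). */
size_t gen_spmv(char *out, size_t cap, size_t nrows, const size_t *row_ptr,
                const size_t *col_idx, const double *vals)
{
    size_t used = 0;
    if (cap > 0) out[0] = '\0';
    emit(out, cap, &used, "void spmv(const double *x, double *y) {\n");
    for (size_t r = 0; r < nrows; r++) {
        emit(out, cap, &used, "  y[%zu] =", r);
        if (row_ptr[r] == row_ptr[r + 1])
            emit(out, cap, &used, " 0.0");          /* row with no nonzeros */
        for (size_t k = row_ptr[r]; k < row_ptr[r + 1]; k++)  /* nonzeros only */
            emit(out, cap, &used, "%s %.17g * x[%zu]",
                 k == row_ptr[r] ? "" : " +", vals[k], col_idx[k]);
        emit(out, cap, &used, ";\n");
    }
    emit(out, cap, &used, "}\n");
    return used;
}
```

For the matrix with rows `[2 0 1]`, `[0 0 0]`, `[0 3 0]`, the generated function body is just `y[0] = 2 * x[0] + 1 * x[2]; y[1] = 0.0; y[2] = 3 * x[1];`, with no loop, no branch and no multiplication by zero left for the hardware to process.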
Am I just externalizing the problem? Maybe, but I know that a software-controlled and informed preprocessor can be more complex and more specific while using less die area.
Am I just reconceptualizing, or rephrasing, what is already implemented? I do not know to what extent compiler-generated cache-control instructions and 'data-based' / 'dynamic' code morphing are already used.