By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), July 29, 2022 8:17 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on July 28, 2022 4:29 am wrote:
[snip]
> Parts of what is .text on one architecture are .rodata on another.
> In theory, could be .data on the yet another.
Presumably, the ISA's support for immediate data will also matter. An ISA that supports full-size (e.g., 64-bit) immediates seems likely to have less constant data outside of the instruction stream (regardless of which section the compiler places read-only data in).
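As a minimal C sketch (the function name is hypothetical), consider a 64-bit constant that does not fit a short immediate field. Where it ends up depends on the ISA and the compiler. On x86-64 a compiler can typically use one mov with a 64-bit immediate, keeping the constant in .text. On AArch64 it may use a movz/movk sequence or a PC-relative literal load. On RISC-V it may be built with several instructions or loaded from a constant pool. Compiling with -S for each target shows which choice was made.

```c
#include <stdint.h>

/* Hypothetical example: a constant too wide for a typical short immediate.
   Depending on ISA and compiler it may be encoded inline in the instruction
   stream or placed in a literal pool / .rodata and loaded at run time. */
uint64_t hash_seed(void)
{
    return UINT64_C(0x9E3779B97F4A7C15);
}
```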
There would presumably be cases where an all-constant structure is used as the initial value of a structure instance. Even with store instructions that take full-sized immediates, doing a memory copy may be more efficient (and denser) than a sequence of store-immediate instructions.
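A minimal C sketch of the two options, assuming a hypothetical configuration record. The first version copies an all-constant instance (typically kept in .rodata, or in flash on a microcontroller); the second writes each field, which a compiler may emit as store-immediate instructions. An optimizing compiler is free to turn either form into the other, so this only illustrates the choice, not what any particular compiler emits.

```c
#include <stdint.h>

/* Hypothetical configuration record. */
struct uart_config {
    uint32_t baud;
    uint8_t  data_bits;
    uint8_t  stop_bits;
    uint8_t  parity;
    uint8_t  flow_control;
    uint32_t timeout_us;
};

/* All-constant instance: typically placed in a read-only data section. */
static const struct uart_config uart_defaults = {
    .baud = 115200, .data_bits = 8, .stop_bits = 1,
    .parity = 0, .flow_control = 0, .timeout_us = 1000,
};

/* Version A: structure copy; may become a block copy from read-only data. */
void init_by_copy(struct uart_config *cfg)
{
    *cfg = uart_defaults;
}

/* Version B: field-by-field stores; constants may live in the instruction stream. */
void init_by_stores(struct uart_config *cfg)
{
    cfg->baud = 115200;
    cfg->data_bits = 8;
    cfg->stop_bits = 1;
    cfg->parity = 0;
    cfg->flow_control = 0;
    cfg->timeout_us = 1000;
}
```

Both produce the same values; the difference is only in where the constant bytes live and how many instruction bytes are spent moving them.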
Even jump tables could be encoded as immediates (I think Mitch Alsup's My 66000 has a jump instruction that takes an index and includes a series of constants for the targets). This would seem to introduce a size/speed tradeoff: inlining the target addresses is likely to be faster but also produces larger code (though the code would not be as bloated as it appears if one had excluded .rodata from the code size measure).
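A minimal C sketch of the kind of code that produces a jump table (the dispatcher is hypothetical). A dense switch is commonly lowered to a bounds check plus an indirect jump through a table of addresses or offsets. On x86-64 such tables usually land in .rodata, while Thumb-2's TBB/TBH instructions place a table of branch offsets directly after the instruction, inside .text. Whether a given compiler uses a table at all for this few cases depends on its heuristics.

```c
/* Hypothetical opcode dispatcher: dense case values 0..5 make this a
   candidate for lowering to an indexed jump through a table. */
int dispatch(unsigned op, int a, int b)
{
    switch (op) {
    case 0: return a + b;
    case 1: return a - b;
    case 2: return a * b;
    case 3: return a & b;
    case 4: return a | b;
    case 5: return a ^ b;
    default: return 0;  /* out-of-range index: the bounds check's fallback path */
    }
}
```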
For a microcontroller, it seems that the size of the entire binary (which would presumably be stored in flash) would be the important measure, not the size of the executable code alone. Even for microcontrollers there would presumably be speed tradeoffs in what gets loaded into SRAM. Optimizing strictly for size might not even be ideal. Microcontrollers often have buffers in front of flash that benefit from streaming accesses, and while indirection (function calls) is less expensive in the flatter memory system of a microcontroller, it is not free; code-expanding optimizations can help meet performance/cost requirements.
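A minimal C sketch of the footprint arithmetic, assuming a typical flash-based layout where the initial values of .data are stored in flash and copied to SRAM at startup. The struct, its field names, and the ramfunc field (code copied from flash to SRAM for speed) are hypothetical; real numbers would come from a linker map or `size -A`.

```c
#include <stddef.h>

/* Hypothetical section sizes in bytes. */
struct section_sizes {
    size_t text;     /* executable code run from flash */
    size_t rodata;   /* constants, jump tables, string literals */
    size_t data;     /* initialized read-write data */
    size_t bss;      /* zero-initialized read-write data */
    size_t ramfunc;  /* hypothetical: code copied to SRAM at startup for speed */
};

/* Flash holds code, constants, and the load images of .data and RAM code. */
size_t flash_bytes(const struct section_sizes *s)
{
    return s->text + s->rodata + s->data + s->ramfunc;
}

/* SRAM holds the run-time copies plus .bss (stack and heap not counted). */
size_t sram_bytes(const struct section_sizes *s)
{
    return s->data + s->bss + s->ramfunc;
}
```

Under these assumptions, moving a hot function into SRAM does not shrink flash use at all; it adds its size to the SRAM budget, which is one form of the speed tradeoff mentioned above.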
For servers, it seems that prefetchability and various locality factors would be more important than total size. Inlining could make sense not merely for execution performance but for memory system performance (stream prefetching is easy).
Energy per computation is also a consideration that complicates how one evaluates code density.
As already mentioned, single-figure-of-merit measures require context.