By: Hugo Décharnes (hdecharn.delete@this.outlook.fr), March 20, 2021 12:28 pm
Room: Moderated Discussions
For x86, yes, it was helpful. Eight registers (not even all general-purpose) were too few. AArch32 has about thirty GP registers in total once the banked copies are counted, and these mapped elegantly onto AArch64's 32 registers.
The performance and code density gain from going from 16 to 32 registers is smaller. It also has a serious cost in the rename tables of out-of-order CPUs: the more architectural registers, the larger the area and the worse the timing. For those kinds of cores, which are optimized for control-centric applications, one has to balance more rename lanes against more architectural registers. Renaming bandwidth usually wins, as the core is refilled more quickly with useful instructions after a flush. There is, however, a point of diminishing returns that must be assessed through benchmarks.
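To make the trade-off concrete, here is a rough first-order sketch, not a real design model. It assumes a RAM-style rename table with one entry per architectural register, each entry holding a physical register tag, and it assumes the common rule of thumb that the area of a multiported array grows with the square of its port count. The function names, the 128-entry physical register file and the port-counting scheme are all illustrative assumptions, and timing is not modelled at all.

```python
# Crude rename-table cost model (illustrative assumptions only).

def tag_bits(phys_regs: int) -> int:
    """Bits needed to name one of phys_regs physical registers."""
    return max(1, (phys_regs - 1).bit_length())

def rat_storage_bits(arch_regs: int, phys_regs: int) -> int:
    """Storage for a table with one physical tag per architectural register."""
    return arch_regs * tag_bits(phys_regs)

def rat_ports(rename_width: int, srcs: int = 2, dsts: int = 1) -> tuple[int, int]:
    """Assumed port counts per cycle: each renamed instruction reads its
    source mappings plus the old destination mapping, and writes the new
    destination mapping."""
    reads = rename_width * (srcs + dsts)
    writes = rename_width * dsts
    return reads, writes

def relative_area(arch_regs: int, phys_regs: int, rename_width: int) -> int:
    """Unitless figure: storage bits times (total ports) squared.
    The quadratic port term is a rule of thumb, not a measured law."""
    reads, writes = rat_ports(rename_width)
    return rat_storage_bits(arch_regs, phys_regs) * (reads + writes) ** 2

if __name__ == "__main__":
    for arch_regs in (16, 20, 32):
        for width in (4, 8):
            print(arch_regs, width, relative_area(arch_regs, 128, width))
```

Under these assumptions only, doubling the architectural register count doubles the figure while doubling the rename width quadruples it, so both knobs draw on the same budget. The model says nothing about which choice gives more performance; that still has to come from benchmarks.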
As an aside, the register count does not have to be a power of two. One could architect, say, 20 GP registers.