By: Peter Lewis (peter.delete@this.notyahoo.com), June 4, 2022 6:56 pm
Room: Moderated Discussions
The performance of filling the µop cache still matters, which is why Intel recently increased the number of instructions decoded per clock from 4 to 6. If the µop cache were big enough to hold all the performance-critical code, there would be no need to do that. For the same reason that you can’t increase the bandwidth out of a data cache without also increasing DRAM bandwidth, you can’t increase the bandwidth out of a µop cache without also increasing the bandwidth into it. Maybe you are thinking that the µop cache will eventually become so big that its hit rate approaches 100%. I don’t know if that is possible. The code size of every type of software seems to grow without limit, but maybe the amount of code that needs to be in the cache at any one time does have some limit.
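As a rough back-of-the-envelope sketch of why decode width still matters below a 100% hit rate, the snippet below time-weights the two delivery paths. Everything in it is an assumption made for illustration: the function name `effective_uops_per_cycle`, the 8 µops/cycle figure for the µop cache, the hit rates, and the simplification that one decoded instruction yields one µop. It is not a model of any particular Intel core.

```python
def effective_uops_per_cycle(hit_rate, cache_bw, decode_bw):
    """Average µops delivered per cycle under a simple time-weighted model.

    hit_rate  -- fraction of µops served from the µop cache (0.0 to 1.0)
    cache_bw  -- µops per cycle when reading from the µop cache (assumed)
    decode_bw -- µops per cycle when the decoders must fill the cache (assumed)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    # Average cycles spent per µop: hits cost 1/cache_bw, misses cost 1/decode_bw.
    cycles_per_uop = hit_rate / cache_bw + (1.0 - hit_rate) / decode_bw
    return 1.0 / cycles_per_uop


# Compare a 4-wide and a 6-wide decoder behind a hypothetical 8-wide µop cache.
for hit_rate in (0.80, 0.90, 1.00):
    narrow = effective_uops_per_cycle(hit_rate, cache_bw=8, decode_bw=4)
    wide = effective_uops_per_cycle(hit_rate, cache_bw=8, decode_bw=6)
    print(f"hit rate {hit_rate:.0%}: 4-wide decode {narrow:.2f}, "
          f"6-wide decode {wide:.2f} µops/cycle")
```

In this model the two decoder widths only give the same result at a 100% hit rate. At any lower hit rate the wider decoder delivers more µops per cycle, which is the point about bandwidth into the cache limiting bandwidth out of it.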
> If a RISC CPU has a µop cache, the space-efficiency of the µop cache can be better than the space-efficiency of RISC instructions
If it is possible to make the encoding in the µop cache more space-efficient than the encoding of RISC instructions, why didn’t the RISC processor use the µops as its instruction set?
> It is possible that in the near future it will become clear that CPU performance is determined by the number of [conditional] branches successfully predicted per cycle
I agree that accurately predicting multiple branches per cycle is one of the most important factors for CPU performance as the number of instructions processed per cycle increases.