By: Exophase (exophase.delete@this.gmail.com), August 10, 2014 9:03 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on August 10, 2014 9:15 pm wrote:
> Saving 5-10% on 25-35% of the overall chip isn't very impressive or important.
Sure, if you're just looking at area. How about efficiency? CPU cores will have much higher average power density than any part of the uncore, including higher level caches and much of the rest of an SoC, perhaps outside of the GPU.
The presence of the uop cache on SB onward is a strong indication that Intel thinks there's something to be gained by circumventing decode of x86 instructions, even at the expense of more area. This doesn't totally equalize things either, since the uop cache has a more complex lookup mechanism (fetch addresses don't map 1:1 into its entries), and because it needs more fetch bandwidth to accommodate uops that are substantially larger than instructions. It does however simplify decoding even over ARM64 (where I guess you'd say the uop decoding is absorbed into later stages of the pipeline).
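To make the non-1:1 lookup point concrete, here's a toy sketch (my own simplification, not Intel's actual design): entries are indexed by a fetch-window address, each window can hold multiple entries tagged by the starting byte offset within it, and one x86 instruction can expand to several uops. The window size, entry layout, and uop names are all assumptions for illustration.

```python
WINDOW = 32  # bytes per fetch window (an assumed size for this sketch)

class ToyUopCache:
    def __init__(self):
        # window_address -> list of (start_offset, uops) entries
        self.lines = {}

    def fill(self, fetch_addr, decoded):
        """Record uops decoded from instructions starting at fetch_addr.

        decoded: list of (insn_length_bytes, uops_for_insn) pairs;
        one x86 instruction may expand to several uops (a 1:N mapping).
        """
        window = fetch_addr - (fetch_addr % WINDOW)
        offset = fetch_addr % WINDOW
        uops = [u for _, insn_uops in decoded for u in insn_uops]
        self.lines.setdefault(window, []).append((offset, uops))

    def lookup(self, fetch_addr):
        """Hit only when an entry's tag matches the exact start offset;
        the same window fetched from a different entry point misses."""
        window = fetch_addr - (fetch_addr % WINDOW)
        offset = fetch_addr % WINDOW
        for start, uops in self.lines.get(window, []):
            if start == offset:
                return uops  # hit: bypass the x86 decoders
        return None  # miss: fall back to legacy decode

cache = ToyUopCache()
# A hypothetical 6-byte load-op instruction decoding to 2 uops,
# followed by a 3-byte instruction decoding to 1 uop.
cache.fill(0x1008, [(6, ["load", "add"]), (3, ["store"])])

print(cache.lookup(0x1008))  # ['load', 'add', 'store']
print(cache.lookup(0x1000))  # None: same 32-byte window, wrong entry point
```

The second lookup is the crux: unlike a conventional I-cache, where any address within a cached line hits, this structure only hits when fetch resumes at a recorded entry point, which is one reason the lookup is more complex than a plain 1:1 address-to-line mapping.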