By: Aaron Spink (aaronspink.delete@this.notearthlink.net), August 11, 2014 12:33 am
Room: Moderated Discussions
Exophase (exophase.delete@this.gmail.com) on August 10, 2014 10:03 pm wrote:
> The presence of the uop cache on SB onward is a strong indication that Intel thinks there's something
> to gain in circumventing decode of x86 instructions, even at the expense of more area. This doesn't totally
> equalize things either, since the uop cache has a more complex lookup mechanism translating fetch addresses
> that aren't a 1:1 mapping into it, and because it needs more fetch bandwidth to accommodate uops that
> are substantially larger than instructions. It does however simplify decoding even over ARM64 (where
> I guess you'd say the uop decoding is absorbed into later stages of the pipeline)
>
Post-decode caches/loop caches have been proposed and evaluated for a variety of architectures and have shown performance improvements in all of them, IIRC. It's a general mechanism to decouple fetch/decode from dispatch/execute.
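To make the decoupling concrete, here is a minimal sketch (invented instruction strings and uop encodings, nothing Intel-specific) of the idea: decoded micro-ops are filled into a cache keyed by fetch address, so a hot loop pays the decode cost only on its first iteration and dispatches straight from the uop cache afterwards.

```python
# Hypothetical sketch of a post-decode (uop) cache.
# The instruction format and "decode" step are invented for illustration.

def decode(insn):
    # Stand-in for the expensive variable-length decode stage:
    # split "add r1,r2" into a list of (op, operands) micro-op tuples.
    op, _, operands = insn.partition(" ")
    return [(op, tuple(operands.split(",")))]

class UopCache:
    def __init__(self):
        self.lines = {}   # fetch address -> already-decoded uops
        self.hits = 0
        self.misses = 0

    def fetch_uops(self, addr, insn):
        if addr in self.lines:            # hit: bypass the decoders entirely
            self.hits += 1
        else:                             # miss: decode once and fill the cache
            self.misses += 1
            self.lines[addr] = decode(insn)
        return self.lines[addr]

# A tight loop body executed repeatedly: decode happens once per address,
# every later iteration dispatches straight from the uop cache.
program = [(0x10, "add r1,r2"), (0x14, "cmp r1,r3"), (0x18, "jne 0x10")]
cache = UopCache()
for _ in range(100):                      # 100 loop iterations
    for addr, insn in program:
        uops = cache.fetch_uops(addr, insn)

print(cache.misses, cache.hits)           # 3 misses, 297 hits
```

Of course this elides everything that makes the real structure hard: the non-1:1 mapping from fetch addresses to cache lines, line-fill and invalidation policy, and the wider bandwidth needed because uops are larger than the instructions they came from.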