By: none (none.delete@this.none.com), August 11, 2014 1:35 am
Room: Moderated Discussions
Aaron Spink (aaronspink.delete@this.notearthlink.net) on August 11, 2014 12:33 am wrote:
> Exophase (exophase.delete@this.gmail.com) on August 10, 2014 10:03 pm wrote:
> > The presence of the uop cache on SB onward is a strong indication that Intel thinks there's something
> > to be gained in circumventing decode of x86 instructions, even at the expense of more area. This doesn't totally
> > equalize things either, since the uop cache has a more complex lookup mechanism translating fetch addresses
> > that aren't a 1:1 mapping into it, and because it needs more fetch bandwidth to accommodate uops that
> > are substantially larger than instructions. It does, however, simplify decoding even over ARM64 (where
> > I guess you'd say the uop decoding is absorbed into later stages of the pipeline).
> >
>
> Post-decode caches/loop caches have been proposed and evaluated for a variety
> of architectures and have shown performance improvements in all of them, IIRC. It's
> a general mechanism to decouple fetch/decode from dispatch/execute.
Doesn't it exhibit more of a power gain than a performance gain?
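To make that question concrete, here's a toy model (my own illustrative sketch, not any real microarchitecture) of a post-decode cache keyed by fetch address. It counts decode events as a rough proxy for front-end energy: in a hot loop, a uop cache hit lets the decoders sit idle, while the number of uops dispatched, and hence the work done, stays the same. The instruction addresses and the "2 uops per instruction" cracking are made-up parameters.

```python
# Toy model (illustrative assumption, not a real design): a post-decode
# cache mapping fetch address -> decoded uops. Decode-event counts stand
# in for decoder energy; dispatched-uop counts show the work is identical
# either way, supporting the "more power gain than performance gain" view
# once the decoders can already keep the back end fed.

def run(trace, use_uop_cache):
    uop_cache = {}        # fetch address -> decoded uops
    decodes = 0           # proxy for decoder energy spent
    dispatched = 0        # uops handed to the back end
    for pc in trace:
        if use_uop_cache and pc in uop_cache:
            uops = uop_cache[pc]          # hit: decoders stay idle
        else:
            uops = ("uop",) * 2           # assume each inst cracks to 2 uops
            decodes += 1
            if use_uop_cache:
                uop_cache[pc] = uops
        dispatched += len(uops)
    return decodes, dispatched

# A hot 4-instruction loop executed 1000 times.
trace = [0x100, 0x104, 0x108, 0x10C] * 1000

base_decodes, base_uops = run(trace, use_uop_cache=False)
cached_decodes, cached_uops = run(trace, use_uop_cache=True)

print(base_decodes)    # 4000: every instruction decoded on every iteration
print(cached_decodes)  # 4: decoded once, then served from the uop cache
assert base_uops == cached_uops   # same uops dispatched either way
```

Of course, on a real part the uop cache can also help performance (wider effective fetch, shorter misprediction pipeline), but in this steady-state picture the visible win is the decode work avoided.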