By: nope (no.delete@this.no.no), August 11, 2014 2:05 am
Room: Moderated Discussions
none (none.delete@this.none.com) on August 11, 2014 1:35 am wrote:
> Aaron Spink (aaronspink.delete@this.notearthlink.net) on August 11, 2014 12:33 am wrote:
> > Exophase (exophase.delete@this.gmail.com) on August 10, 2014 10:03 pm wrote:
> > > The presence of the uop cache on SB onward is a strong indication that Intel thinks there's something
> > > to circumventing decode of x86 instructions, even at the expense of more area. This doesn't totally
> > > equalize things either, since the uop cache has a more complex lookup mechanism, translating fetch addresses
> > > that aren't a 1:1 mapping into it, and because it needs more fetch bandwidth to accommodate uops that
> > > are substantially larger than instructions. It does, however, simplify decoding even over ARM64 (where
> > > I guess you'd say the uop decoding is absorbed into later stages of the pipeline).
> > >
> >
> > Post-decode caches/loop caches have been proposed and evaluated for a variety
> > of architectures and have shown performance improvements in all of them, IIRC. It's
> > a general mechanism to decouple fetch/decode from dispatch/execute.
>
> Doesn't it exhibit more of a power gain than a performance gain?
Or rather, a power reduction: you can idle the fetch and decode logic (and the associated array accesses) while driving instructions from the buffer.
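
To make the decoupling concrete, here's a minimal sketch in Python of what a post-decode cache buys you. It's purely illustrative: the names (UopCache, front_end_cycle, the 1536-uop capacity, the fill policy) are my own invention, not any real design. The point is just that on a hit the legacy fetch/decode path stays idle, which is where the power saving comes from, while dispatch keeps being fed.

    # Hypothetical sketch of a post-decode (uop) cache. All names and
    # policies are invented for illustration, not any real design.

    class UopCache:
        def __init__(self, capacity=1536):
            self.capacity = capacity          # total uops the cache may hold
            self.lines = {}                   # fetch address -> decoded uops

        def lookup(self, fetch_addr):
            return self.lines.get(fetch_addr)

        def fill(self, fetch_addr, uops):
            # Naive fill policy: only insert while there is room.
            stored = sum(len(v) for v in self.lines.values())
            if stored + len(uops) <= self.capacity:
                self.lines[fetch_addr] = uops

    def decode(raw_bytes):
        # Stand-in for the variable-length x86 decoder -- the expensive
        # logic that a hit in the uop cache lets you power-gate.
        return [("uop", b) for b in raw_bytes]

    def front_end_cycle(fetch_addr, imem, ucache, stats):
        uops = ucache.lookup(fetch_addr)
        if uops is not None:
            stats["decoder_idle"] += 1        # fetch + decode arrays stay dark
        else:
            uops = decode(imem[fetch_addr])   # fall back to the legacy path
            ucache.fill(fetch_addr, uops)
            stats["decoded"] += 1
        return uops

    # A tight loop hits the cache on every iteration after the first:
    imem = {0x400: b"\x89\xd8\x01\xc8"}       # pretend instruction bytes
    stats = {"decoder_idle": 0, "decoded": 0}
    ucache = UopCache()
    for _ in range(100):
        front_end_cycle(0x400, imem, ucache, stats)
    print(stats)                              # {'decoder_idle': 99, 'decoded': 1}

After the first iteration every lookup hits, so both effects fall out of the same mechanism: the decode bottleneck disappears (performance headroom) and the decoder plus its array accesses can be gated off (power).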