By: nona (a.delete@this.b.e), May 7, 2013 11:25 pm
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on May 7, 2013 4:53 am wrote:
> I'm surprised because A15 has a small loop buffer, even though most people seem to say that ARM decoders
> should be simpler than x86.
The loop buffer is there to reduce fetches. Instruction fetch takes a lot of power, and in a low-power environment you want to fetch as little as you can. (In fact, AMD, Intel, and ARM all use loop buffers of some kind.) However, once you have a loop buffer in a fixed-size instruction environment, the next thing to ask is: which parts of the decode can be moved before the buffer without sacrificing performance? And the answer is most of them. The gains from this are much smaller than the gains from the reduction of fetches, but they are essentially free.
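To put rough numbers on that, here is a toy model in Python. It is only a sketch: the function name `simulate`, the 32-entry buffer, and the all-or-nothing "loop fits or it doesn't" rule are my own assumptions for illustration, not how the A15 or any real core is specified. It counts how many instructions go through fetch and through decode when a loop is replayed from a buffer.

```python
def simulate(loop_len, iterations, buffer_entries, decode_before_buffer=True):
    """Toy model: return (fetches, decodes) for a loop of fixed-size instructions.

    All parameters are hypothetical; this is not a model of any real core.
    """
    total = loop_len * iterations
    if loop_len > buffer_entries:
        # Loop does not fit in the buffer: every instruction is fetched
        # and decoded on every iteration.
        return total, total
    # Loop fits: the first iteration fills the buffer, later iterations
    # replay from it, so each instruction is fetched only once.
    fetches = loop_len
    # If decode sits before the buffer, the buffer holds decoded forms and
    # replays skip decode too. Otherwise every replay is decoded again.
    decodes = loop_len if decode_before_buffer else total
    return fetches, decodes


if __name__ == "__main__":
    # 8-instruction loop, 100 iterations, assumed 32-entry buffer.
    print(simulate(8, 100, 32, decode_before_buffer=False))
    print(simulate(8, 100, 32, decode_before_buffer=True))
    print(simulate(64, 100, 32))
```

In this model the fetch count drops as soon as the loop fits in the buffer, whichever side of the buffer the decoder is on. Moving decode in front of the buffer only removes the repeated decodes on top of that, which is the smaller, essentially free gain.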