By: anon (anon.delete@this.anon.com), August 10, 2014 8:50 pm
Room: Moderated Discussions
Aaron Spink (aaronspink.delete@this.notearthlink.net) on August 9, 2014 3:54 am wrote:
> anon (anon.delete@this.anon.com) on August 9, 2014 12:12 am wrote:
> > In that case, there is exactly zero possibility that ARMv8 is "just as terrible". Also, the
> > fact that 32-bit arm cores have gone to 3-wide decode, and (apparently) Apple's is 6 wide,
> > while even with SMT, the Intel Atom was only 2-wide, and silvermont is only 2 wide, I find
> > it hard to believe that even earlier ARMs were nearly so problematic as x86 for decoding.
> >
>
> Using width of decode of existing devices to determine how hard decoding is, isn't
> necessarily supportive. Granted, I think x86 decode is worse, the evidence you are
> using to support your argument doesn't really say anything about your argument.
It is reasonable circumstantial evidence when you look at a wide selection of devices.
Intel clearly has a history of struggling with decode that no non-x86 design (ignoring exotic or ancient ones I don't know much about, like mainframe or VAX) has had.
Pentium 4 - invested vast effort in an incredibly complex and ultimately failed trace cache in order to get by with just a serial x86 decoder.
1st Atom core - a 2-wide decoder on an SMT2 device! ARM contemporaries had 2-3x the decode bandwidth per thread.
Yonah - a decoded loop buffer far earlier than others. Most x86 designs now include something like this, including ones from AMD; relatively few implementations of other ISAs seem to. The ARM A15 was the first ARM core to have a similar feature, but that's not a highly regarded CPU in terms of efficiency as far as ARM's track record goes. The A12 and seemingly the A17, which are the power-efficient range, do not.
uop cache - a complex additional instruction cache layer to avoid fetch and decode. It has been in Intel's higher-performance cores for many years, but there is still no sign of non-x86 implementations using one. Intel has never gone wider than 3-way decode without some kind of decoded-instruction caching. The old chestnut that non-x86 high-end devices require a nuclear power plant to run has, of course, not been true for many years now either.
It's not hard evidence, but it beats handwaving. I'm not a chip designer, so I have no experience to draw on in saying whether x86 decoding is difficult; I can only look at the evidence that is out there.
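
To make the underlying boundary problem concrete, here's a toy C sketch. The one-byte length rule, the 16-byte fetch window and the function names are all made up for illustration (real x86 length decode has to look at prefixes, ModRM and SIB, which only makes it worse); the point is just that finding instruction starts in a variable-length ISA is a serial dependence chain, while a fixed 4-byte ISA hands every decoder slot its boundary for free.

#include <stdint.h>
#include <stdio.h>

#define FETCH_BYTES 16

/* Hypothetical rule: pretend the first byte alone gives the length (1..8). */
static int toy_length(uint8_t first_byte) {
    return 1 + (first_byte & 0x7);
}

/* Variable-length ISA: instruction N's start depends on the lengths of
 * instructions 0..N-1, so boundary finding is serial unless you
 * speculatively length-decode at every byte or cache past results
 * (loop buffer, trace cache, uop cache). */
static int boundaries_variable(const uint8_t *win, int starts[]) {
    int n = 0, off = 0;
    while (off < FETCH_BYTES) {
        starts[n++] = off;
        off += toy_length(win[off]);   /* must finish before the next start is known */
    }
    return n;
}

/* Fixed 4-byte ISA (ARM-like): every boundary is known up front, so all
 * slots of a wide decoder can be fed independently in the same cycle. */
static int boundaries_fixed(int starts[]) {
    for (int i = 0; i < FETCH_BYTES / 4; i++)
        starts[i] = i * 4;             /* no dependence on neighbouring instructions */
    return FETCH_BYTES / 4;
}

int main(void) {
    uint8_t window[FETCH_BYTES] = {3, 0, 1, 6, 2, 0, 5, 0,
                                   0, 4, 1, 0, 7, 0, 0, 2};
    int starts[FETCH_BYTES];

    printf("variable-length: %d instructions in the window\n",
           boundaries_variable(window, starts));
    printf("fixed-length:    %d instructions in the window\n",
           boundaries_fixed(starts));
    return 0;
}

None of this says wide variable-length decode is impossible, only that the straightforward way to sidestep the serial chain is exactly the kind of decoded-instruction caching listed above.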