By: David Kanter (dkanter.delete@this.realworldtech.com), January 21, 2021 6:56 pm
Room: Moderated Discussions
Hey Folks,
Wanted to provide an update on the monster Tremont decoding thread.
Based on offline discussions with some Intel architects, they have confirmed my description: Tremont has 2x3 decoders (two clusters, each 3-wide) that can, in some circumstances, simultaneously decode out-of-order from a single thread.
As folks have pointed out, for straight-line code with no branches, decoding throughput is limited to 3 x86 instructions per clock. When branches are carefully inserted (or already present), the front-end can sustain up to 6 x86 instructions per clock.
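To make the 3 vs. 6 arithmetic concrete, here is a rough toy model in Python. It is only a sketch under assumptions that go beyond what is stated above: that the code splits into blocks at taken branches, that blocks are handed to the two clusters in strict alternation, and that a cluster never decodes across a block boundary in the same clock. The names decode_throughput and block_lengths are made up for illustration, and nothing here is cycle-accurate.

```python
from math import ceil

DECODE_WIDTH = 3   # x86 instructions per clock per cluster (from the post)
NUM_CLUSTERS = 2   # the "2" in 2x3 (from the post)


def decode_throughput(block_lengths):
    """Modeled x86 instructions decoded per clock.

    block_lengths: instruction count of each block, where a block is
    assumed to end at a taken branch (a modeling assumption).
    """
    total = sum(block_lengths)
    if total == 0:
        return 0.0
    # Clocks each cluster spends decoding the blocks assigned to it.
    cycles = [0] * NUM_CLUSTERS
    for i, n in enumerate(block_lengths):
        # Assumption: blocks alternate between clusters, and a block of
        # n instructions occupies its cluster for ceil(n / 3) clocks.
        cycles[i % NUM_CLUSTERS] += ceil(n / DECODE_WIDTH)
    # Clusters run in parallel, so the busier one sets the total time.
    return total / max(cycles)


# One long block with no branches: only one cluster ever has work.
straight_line = decode_throughput([600])

# Many short blocks separated by taken branches: both clusters stay busy.
branchy = decode_throughput([6] * 100)

print(straight_line, branchy)
```

In this model the branch-free case keeps only one cluster busy and lands on 3 per clock, while evenly sized blocks separated by branches keep both clusters busy and reach 6. Uneven or very short blocks fall somewhere in between.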
Thanks for reading and the great discussion :)
David