By: Brett (ggtgp.delete@this.yahoo.com), September 2, 2013 2:24 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on September 1, 2013 10:38 am wrote:
> Brett (ggtgp.delete@this.yahoo.com) on September 1, 2013 2:03 am wrote:
> > Just bought two books on MultiFlow, as I did not study that CPU and it now looks
> > important.
>
> The sidebar commentary from Bob Colwell is a riot.
Looking forward to it.
> > Also not a fan of VLIW as implemented by some, huge instructions are just not
> > efficient.
>
> Most real VLIWs (including the VLX/ST architecture from the book) use stop bits.
Aren't stop bits just a waste of opcode space on a high-end design? You have to map the dependencies anyway when going OoO. One of the (many) mistakes of the Itanic design was trying to keep instruction bundles together, which they gave up on in the last design that actually had good performance. I understand that the AMD K8 also did some three-at-a-time bundling, and that has been abandoned with the Bulldozer generations.
Compile-time optimizations just do not have enough information about what is going on at run time, and run time lacks compile-time information.
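To make the stop-bit point concrete, here is a rough sketch in Python. It is a toy model, not any real ISA encoding: the `Insn` tuple, the register names, and both function names are made up for illustration, and only register RAW/WAW hazards within a group are checked. It splits the same instruction stream into issue groups twice, once from compiler-set stop bits and once from the dependencies the hardware would have to track anyway.

```python
from typing import List, NamedTuple, Tuple


class Insn(NamedTuple):
    dest: str               # destination register (hypothetical names)
    srcs: Tuple[str, ...]   # source registers
    stop: bool              # hypothetical stop bit: True ends the issue group


def groups_from_stop_bits(stream: List[Insn]) -> List[List[Insn]]:
    """Split the stream wherever the compiler set a stop bit."""
    groups, current = [], []
    for insn in stream:
        current.append(insn)
        if insn.stop:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def groups_from_dependencies(stream: List[Insn]) -> List[List[Insn]]:
    """Split the stream wherever an instruction reads or rewrites a register
    written earlier in the same group (RAW or WAW), ignoring stop bits."""
    groups, current, written = [], [], set()
    for insn in stream:
        hazard = insn.dest in written or any(s in written for s in insn.srcs)
        if hazard:
            groups.append(current)
            current, written = [], set()
        current.append(insn)
        written.add(insn.dest)
    if current:
        groups.append(current)
    return groups


stream = [
    Insn("r1", ("r2", "r3"), stop=False),
    Insn("r4", ("r5", "r6"), stop=True),   # compiler ends the group here
    Insn("r7", ("r1", "r4"), stop=True),   # depends on both earlier results
]

by_stop = groups_from_stop_bits(stream)
by_deps = groups_from_dependencies(stream)
```

For this stream both methods produce the same two groups, which is the sense in which the stop bits carry nothing a dependency-tracking core does not already work out. A real compiler may place stops for other reasons (resource limits, memory ordering), which this sketch does not model.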
> Code density is not what caused
> VLIW to fail in the general-purpose space. Lack of inter-generational binary compatibility and sensitivity to
> compiler quality were bigger issues. Probably the biggest issue of all is the level of developer skill required
> to get good performance (for SW prefetch, alias disambiguation, etc). All of these issues are somewhat mitigated
> in the embedded/DSP domain, and I imagine that's why we still see a lot of VLIWs down there.
Console FPU/math units (and GPUs) have gone from hardcoded in the PS1 to semi-hard in the PS2 to mostly soft in the PS3. I am comfortable writing code for all of them.
Sony made the best choice at each generation: PS1 enabled cheap 3D before anyone else, PS2 enabled good 3D at a cheap price, and PS3 had the best performance/FLOPS (but was costly to develop for).