By: Patrick Chase (patrickjchase.delete@this.gmail.com), September 1, 2013 3:14 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on September 1, 2013 12:54 pm wrote:
> [Stating the obvious:]
>
> For DSPs, the availability of ILP in the workloads and the importance of small loops
> (which allow substantial optimization effort to be worthwhile) is presumably also a
> significant factor. For some embedded/DSP uses, very tight real-time requirements
> would discourage substantial speculation (requiring static scheduling presumably
> favors VLIW).
You generally don't do hard-real-time from your VLIW DSP if you can avoid it. Lightweight control-oriented cores like the Cortex-R4 are more cost-effective for that stuff.
IMO the reasons for using a CPU/DSP (VLIW or otherwise) vs custom HW come down to flexibility (the ability to change product functionality after the SoC design is frozen) and low incremental design/development effort to add new features.
The reasons for using a VLIW as opposed to a SIMD engine or a superscalar CPU are twofold:
1. There are a nontrivial number of workloads out there that have significant ILP, but that aren't vectorizable. If you want to be able to "cover" those workloads then you need some sort of ILP machine.
2. DSP-ish workloads typically have high *static* ILP: they don't require runtime speculation, so the speculation and OoO plumbing in a modern application processor is mostly wasted cost/area.
VLIWs are often the cheapest (lowest chip area) platform that addresses the workloads outlined above.
As Michael S has pointed out in another thread, a large number of embedded workloads are in fact vectorizable, and I think that's why there are a fair number of VLIW-of-SIMD solutions like Hexagon and the Icera processor. The added vector capabilities improve price/performance on vectorizable workloads, but without completely sacrificing the ability to handle ILP-intensive workloads.
> Embedded systems are also less likely to require support for JIT
> compilation, so more expensive compilation is less costly.
It isn't unheard of. http://llvm.org/devmtg/2011-11/Simpson_PortingLLVMToADSP.pdf
> (I don't know if specialized software development--VLIW compiler--is less expensive
> than hardware development [though I would not be surprised if more people who have
> taken an introduction to compilers class than have taken an introduction to an HDL
> class], but compiler development may have a significant scheduling advantage [the
> time between design finalization and product shipment is smaller and updating a
> compiler after shipment of hardware might not be especially problematic].)
The development effort for a VLIW compiler is pretty significant, but that can be mitigated by using something like Pro64 as a starting point.
> (With respect to code density, I thought ST200 is close to a classic RISC
> with stop bits, i.e., not great code density. Having better code density than
> classic [fixed word length] VLIW or even Itanium is not a great boast.)
Yes, that's about right. It's good enough for many applications.