Nvidia’s Kal-El Goes Asymmetric

Kal-El Analysis

Nvidia’s approach is an architectural trade-off that offers some clear benefits, especially given the constraints. Moving Kal-El’s main cores to TSMC 40LP would significantly hurt frequency, perhaps around 500MHz. The resulting impact on overall performance, compared to both the competition and the previous generation Tegra 2, would be unacceptable. The companion core cleverly maintains performance per core, but improves idle power. It should narrow, but not eliminate, the power gap with competitors using 40nm low-leakage processes. The power consumption benefits largely depend on how well the system can identify low-intensity workloads and stay resident on the companion core. In turn, this depends on the pattern of user interactions, as frequent, short, compute-intensive workloads will burn considerable idle power on the main cores. Nvidia is claiming a 15-60% reduction in power for various standby tasks (compared to Tegra 2), and 10-25% seems like a reasonable expectation.
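To make the residency argument concrete, here is a rough sketch that treats average power as a time-weighted mix of the companion core and the main cores. The function names and every milliwatt figure are hypothetical placeholders chosen for illustration, not measurements of Kal-El or Tegra 2, and the model ignores the cost of switching between the two.

```python
def average_power_mw(residency, companion_mw, main_mw):
    """Average power when a fraction `residency` of time is spent on the companion core."""
    if not 0.0 <= residency <= 1.0:
        raise ValueError("residency must be between 0 and 1")
    return residency * companion_mw + (1.0 - residency) * main_mw


def savings_vs_main_only(residency, companion_mw, main_mw):
    """Fractional power reduction relative to always staying on the main cores."""
    return 1.0 - average_power_mw(residency, companion_mw, main_mw) / main_mw


# Hypothetical power figures, purely for illustration.
COMPANION_MW = 40.0
MAIN_MW = 100.0

for residency in (0.25, 0.5, 0.9):
    saved = savings_vs_main_only(residency, COMPANION_MW, MAIN_MW)
    print(f"residency {residency:.0%}: {saved:.0%} saved")
```

Under these assumed numbers the savings scale linearly with residency, which is why a usage pattern full of short bursts that wake the main cores erodes the benefit so quickly.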

Kal-El is the first quad-core ARM design targeted at mobile devices, giving Nvidia some nice bragging rights. The performance benefits are complicated by the software picture though. Many consumer applications simply cannot use multiple cores and only respond to single-threaded performance. For consumer devices, 1-2 high performance cores are infinitely preferable to many slower cores. Comparing Intel’s success with dual and quad-core Nehalem variants to AMD’s 4 and 6-core versions of Barcelona clearly emphasizes this point in the context of the PC market. Ignoring the quad-core aspect, Kal-El’s main cores should still have modestly better performance and power efficiency for 1-2 core workloads compared to 40nm alternatives that rely on an LP process. For workloads with 4 threads, the performance will be even higher and a more significant advantage.
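The preference for a few fast cores over many slow ones can be illustrated with Amdahl’s law, which is brought in here as a standard textbook model rather than anything Nvidia has published. The sketch below assumes per-core speed is simply proportional to clock frequency; the clock speeds and parallel fractions are hypothetical and do not describe any shipping part.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when only part of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def relative_performance(clock_ghz, cores, parallel_fraction):
    """Per-core speed (assumed proportional to clock) times the Amdahl speedup."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cores)


# Hypothetical designs: two fast cores versus four slower ones.
for p in (0.2, 0.9):
    fast_dual = relative_performance(1.5, 2, p)
    slow_quad = relative_performance(1.0, 4, p)
    print(f"parallel fraction {p:.0%}: dual {fast_dual:.2f}, quad {slow_quad:.2f}")
```

With these assumed inputs, the faster dual-core wins when only a small share of the work is parallel, and the quad-core pulls ahead only once most of the workload is threaded.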

Perhaps most importantly, Nvidia’s choices minimized the engineering resources and time to market for Kal-El by focusing on relatively simple and incremental improvements. The companion core is probably a hard macro that is readily available from TSMC and ARM. So the major effort for the companion core is really the software and hardware integration and additional power gating. TSMC’s 28nm will enter production relatively soon and Nvidia had to balance any investments in older 40nm designs against the benefits of focusing on future products. This is particularly important because Nvidia tends to adopt new manufacturing nodes 3-6 months behind other TSMC customers. Competitors such as Qualcomm and Samsung are setting a very aggressive pace and Nvidia wants to avoid facing a market where competitors have a manufacturing advantage.

Another, more subtle benefit of Kal-El’s simple approach is strategic risk management. TSMC’s 40nm ramp encountered significant yield and power challenges, which caused general delays and, in the specific case of Nvidia’s Fermi, substantially lower performance as cores were disabled and frequency was reduced. Variation is a more significant problem at 28nm, so the risk of similar ramping problems is certainly present. By refreshing the product line, Kal-El is a hedge against potential problems with TSMC’s 28nm process. If delays do occur, Nvidia will be the least impacted and might even benefit slightly.

Kal-El is unquestionably an improvement for Nvidia, although there are benefits being left on the table due to time to market concerns. For example, more design effort should reduce the switch latency below 2ms. The A9 companion could be replaced with a fully compatible, but much simpler core. However, a truly heterogeneous design would be far more complex. The companion core is a clever approach to reduce idle power by 10-25% and a good compromise between efficiency and internal resource constraints. At a high level, Kal-El is a calculated trade-off that increases costs to achieve lower idle power and higher performance for multi-core workloads compared to Tegra 2.


While the tablet market is still quite nascent, Apple is clearly the leader. Nvidia seems to be established as a solid #2 player, although clearly with some competition from TI (the OMAP4430 is used in the new Kindle Fire). Kal-El is a good fit here, in part because the software seems to be more multi-core friendly (e.g. more games). Shifting to a quad-core design (ahead of the iPad) and addressing the idle power issues will strengthen Nvidia’s position considerably. There is also a time to market advantage, since the 28nm TI OMAP5 and Qualcomm Snapdragon are probably 3-6 months behind, giving Nvidia a nice window of opportunity.

When it comes to smart phones though, the picture is less sanguine for Kal-El. First of all, quad-cores are too power hungry for phones and offer relatively minimal benefits because the ecosystem has little experience with multi-threading. The market is overwhelmingly single-core today, with only 14% of smart phones using a dual-core in 2Q11. The market is dominated by entrenched competitors such as Qualcomm, TI and Samsung, although Nvidia has a small presence. The dual-core Kal-El will be far more competitive than Tegra 2 ever was, mostly due to improvements in standby power. However, there are no overwhelming advantages for Nvidia. The most likely scenario is that Kal-El will garner more design wins, primarily in low-risk phones rather than a flagship product like the Droid 3.

In summary, Kal-El is a very significant incremental improvement to Nvidia’s SoC family. Nvidia made the right trade-offs, accepting higher costs in exchange for lower standby power. Kal-El stands out as an excellent product for tablets and will do well in that market. However, the incremental nature of the changes means that any progress in smart phones will be similarly incremental. For the next year and a half, Nvidia will primarily be using design wins to build confidence with customers and partners. However, it will take further improvements for Nvidia to begin seriously competing with Qualcomm and others for high volume smart phone designs, which we expect will occur in the 2013 time frame.
