By: Exophase (exophase.delete@this.gmail.com), May 20, 2013 8:56 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on May 20, 2013 9:26 pm wrote:
> This one is more interesting, in that it may or may not be true depending on the relationship between static/leakage
> and dynamic power. On TSMC's 40 nm and initial 28 nm (non-LP/HPM) processes leakage could easily approach
> 50% of total dissipation. This meant that "always-on" blocks like caches and OoO state structures would
> dissipate significant power even doing nothing, and so idling along at 5-10% wasn't such a great thing.
> That's why big.LITTLE looked like such a big win, since AFAIK it was the only (mostly) SW-transparent technique
> that enabled you to power down your big, high-performance cores in their entirety.
>
> If you look at 28LP/HPM (the high-K 28 nm flavors), the tradeoffs appear to change rather significantly.
> The leakage power is now down by an order of magnitude at constant performance, and the ratio of dynamic:static
> power is about 5X greater (working from memory here so these may not be exact). A big core with really
> good DVFS and clock-gating is therefore a vastly more competitive option, since the relative benefit
> of power-gating all of those transistors is about a fifth of what it used to be.
>
I'm not sure about the 40nm non-LP processes. AFAIK everyone was using LP except nVidia, who only used it for the companion core on Tegra 3. It's very true that there's been a big push to reduce static leakage in SoC processes. Samsung is using additional reverse body-biasing techniques to further reduce leakage in their 32/28nm processes.
An important question is: if you only need a small amount of CPU time for video, what are the latency requirements? At worst the CPU should be able to work on at least one whole frame at a time. If the CPU can idle for long enough between those bursts, then all of the cores can be power gated for much of that time, at which point even static power consumption won't matter. You don't need something like big.LITTLE for this; the Linux CPUidle drivers are starting to handle it properly.
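To put rough numbers on the quoted leakage argument and the frame-by-frame idling above, here's a back-of-envelope sketch. Everything in it is an assumption for illustration: the wattages, the 30 fps frame period, the 10% CPU utilization, and the simplification that entering and leaving power gating costs nothing. The "leaky" core has leakage at 50% of active power (the 40nm case in the quote); the "hkmg" core has a 5:1 dynamic:static ratio. `Core`, `frame_energy_mj` and `gating_saving_mj` are hypothetical names.

```python
# Back-of-envelope energy per video frame: clock-gated idle vs. power-gated idle.
# All wattages below are hypothetical illustrations, not measured values.
from dataclasses import dataclass


@dataclass
class Core:
    dynamic_w: float  # dynamic (switching) power while running, in watts
    leakage_w: float  # static/leakage power whenever the core is powered, in watts


def frame_energy_mj(core: Core, utilization: float, frame_ms: float, power_gated: bool) -> float:
    """Energy in millijoules spent by one core over one frame period.

    The core runs for utilization * frame_ms and idles for the rest.
    Clock-gated idle stops dynamic power but the core keeps leaking;
    power-gated idle stops both. Idle entry/exit costs are ignored.
    """
    busy_ms = utilization * frame_ms
    idle_ms = frame_ms - busy_ms
    busy_mj = (core.dynamic_w + core.leakage_w) * busy_ms  # W * ms = mJ
    idle_mj = 0.0 if power_gated else core.leakage_w * idle_ms
    return busy_mj + idle_mj


def gating_saving_mj(core: Core, utilization: float, frame_ms: float) -> float:
    """Energy per frame saved by power-gating instead of clock-gating during idle."""
    return (frame_energy_mj(core, utilization, frame_ms, power_gated=False)
            - frame_energy_mj(core, utilization, frame_ms, power_gated=True))


if __name__ == "__main__":
    frame_ms = 1000 / 30  # one frame period at 30 fps
    util = 0.10           # CPU busy ~10% of each frame (assumed)
    leaky = Core(dynamic_w=0.5, leakage_w=0.5)  # leakage = 50% of active power
    hkmg = Core(dynamic_w=0.5, leakage_w=0.1)   # dynamic:static = 5:1
    for name, core in (("leaky", leaky), ("hkmg", hkmg)):
        print(f"{name}: power gating saves {gating_saving_mj(core, util, frame_ms):.1f} mJ/frame")
```

With these made-up numbers, power gating the idle part of each frame saves 15 mJ per frame on the leaky core but only 3 mJ on the 5:1 core, which matches the "about a fifth" in the quote. Once leakage is that low, good clock gating already captures most of the benefit.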
Whatever the case may be, we're already seeing phones last more than 10 hours in video playback tests, even with their ~5" 1080p screens. This will vary depending on what sort of video you're playing, of course, but it's a common story in tests:
http://blog.gsmarena.com/samsung-i9505-galaxy-s4-battery-tests-update-battery-life-improved/
We're getting to the point where there's very little left to save with a more efficient CPU. Even moving video decode to a better process will barely matter; most of the power consumption has to be coming from the screen, at least in these tests.
Gaming is a very different story: it's normal to hear of top-tier games draining a fully charged phone in under 2 hours. Typically most of this comes from the GPU, which even in phones can now draw more than 4W under full load. Intel could make a big impact here with their process advantage, but first they need to be willing to use the best GPUs available and spend a large share of their transistor budget on them. Baytrail is supposed to use a Gen 7 variant, which I doubt will even reach iPad 4 level, so it will probably be easily beaten by whatever Apple puts in their next tablets. Rumor is that Merrifield will still use IMG IP. I don't know what configuration it'll have, but it's probably substantially weaker than whatever they put in Baytrail, so I'm not that enthusiastic.
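The battery-life figures for video and gaming can be turned around into average power draw. This is a minimal sketch, assuming a 2600 mAh, 3.8 V battery (roughly Galaxy S4 class; treat the capacity as an assumption) and a constant power draw over the whole run. `battery_wh`, `average_power_w` and `runtime_h` are hypothetical helper names.

```python
# Implied average power draw from a battery runtime, and vice versa.
# Battery capacity and nominal voltage are assumptions, not from the tests cited.

def battery_wh(capacity_mah: float, nominal_v: float) -> float:
    """Stored energy in watt-hours: amp-hours times nominal voltage."""
    return capacity_mah / 1000 * nominal_v


def average_power_w(energy_wh: float, runtime_h: float) -> float:
    """Average system power implied by draining energy_wh in runtime_h hours."""
    return energy_wh / runtime_h


def runtime_h(energy_wh: float, power_w: float) -> float:
    """Hours a constant power_w load takes to drain energy_wh."""
    return energy_wh / power_w


if __name__ == "__main__":
    energy = battery_wh(2600, 3.8)  # assumed 2600 mAh, 3.8 V pack
    print(f"10 h of video  -> {average_power_w(energy, 10):.2f} W average for the whole phone")
    print(f"2 h of gaming  -> {average_power_w(energy, 2):.2f} W average for the whole phone")
    print(f"4 W GPU alone  -> {runtime_h(energy, 4):.2f} h to empty the battery")
```

With that assumed pack (about 9.9 Wh), 10 hours of video means the whole phone averages about 1 W with the screen included, so there isn't much CPU power left to shave. A 4W GPU on its own would empty the same battery in about 2.5 hours, before the screen and CPU are counted.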
> This may explain Qualcomm's focus on dynamic power reduction techniques in Krait, and their choice not
> to pursue big.LITTLE and n+1 designs. Given the way the process technologies have played out (and now
> that they're on HKMG) that seems to have been a very wise bet. Intel has been on HKMG all along and
> has (perhaps unsurprisingly) pursued exactly the same strategy with Saltwell and now Silvermont.
>
Qualcomm hasn't actually released an SoC implemented with HKMG yet. S800 will be the first (and HKMG is what enables its clock boost from 1.9GHz to 2.3GHz). They're already doing pretty well on power efficiency, so this can only help, and I think they're also supposed to be bringing further improvements to the CPU uarch.