Article: Impressions of Kepler
By: David Kanter (dkanter.delete@this.realworldtech.com), April 9, 2012 11:29 am
Room: Moderated Discussions
Gary M. (gary83@yahoo.com) on 4/6/12 wrote:
---------------------------
>Oscar Eddington (oscare@gmail.com) on 3/28/12 wrote:
>> If Nvidia's graphics and compute products diverge
>> as explained in this article, doesn't that make
>> it difficult to use both a graphics card and a
>> tesla card in a workstation for compute?
>
>It will be nearly impossible to keep both a graphics card and a Tesla card running
>at full speed if these cards have different capabilities. It is too difficult to
>do load balancing across chips with different capabilities. Nvidia's strategy to
>diverge their graphics and compute products seems similar to the transition from
>heterogeneous pixel shaders and vertex shaders to unified shaders except it's going in the opposite direction.
It's not really opposite at all. Conceptually, you can think of hardware vertex and pixel shaders as heterogeneous cores within a single GPU. That's not what's happening here.
What you see instead is different cores for different GPUs (e.g. different double precision throughput) and also different memory controllers (e.g. ECC support).
In the CPU world, we are used to seeing the same core, but a different 'uncore'.
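To make the load-balancing concern above concrete, here's a minimal host-side sketch (my own illustration, not anything Nvidia ships) that splits a workload across all visible CUDA devices in proportion to a crude throughput estimate:

    // Hypothetical sketch: partition N work items across CUDA devices in
    // proportion to SM count x clock rate. The weighting is a crude
    // heuristic, not a real performance model.
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        std::vector<double> weight(n);
        double total = 0.0;
        for (int i = 0; i < n; ++i) {
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);
            // multiProcessorCount and clockRate (kHz) are real fields;
            // treating their product as "throughput" is the assumption.
            weight[i] = (double)p.multiProcessorCount * p.clockRate;
            total += weight[i];
        }
        const long long N = 1 << 24;   // total work items
        long long start = 0;
        for (int i = 0; i < n; ++i) {
            long long share = (long long)(N * weight[i] / total);
            printf("device %d: items [%lld, %lld)\n", i, start, start + share);
            start += share;            // launch per-device kernels here
        }
        return 0;
    }

Note that the heuristic breaks down exactly where the objection above bites: a Fermi SM and a Kepler SMX do very different amounts of work per clock, so a static split like this mispredicts badly across dissimilar chips.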
>Some amount of bandwidth per flop to shared data is certainly needed for GPU computing
>but I don't see how Mr. Kanter concludes or has the impression that 1 byte/flop
>is enough (Fermi) but 0.33 bytes/flop is not enough (Kepler). What is the basis for this conclusion/impression?
I'm not saying that it's 'enough', because that is entirely workload dependent. My point is that Nvidia radically reduced the shared data bandwidth relative to compute throughput, which hurts computational (rather than graphics) workloads. It's a trade-off in favor of graphics performance and away from HPC.
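For anyone who wants to reproduce those ratios, here is a back-of-the-envelope calculation. The configurations are my own summary of commonly cited figures (Fermi SM: 32 banks x 4 bytes of shared memory per base clock, 32 ALUs doing FMA at a 2x hot clock; GK104 SMX: the same 32 x 4-byte shared path feeding 192 ALUs with no hot clock), so treat them as illustrative rather than official specs:

    /* Back-of-the-envelope shared-memory bytes/flop per SM(X), per base
       clock. Hardware figures are commonly cited values quoted from
       memory -- illustrative, not official specs. */
    #include <stdio.h>

    static double bytes_per_flop(double bytes_per_clk, double flops_per_clk) {
        return bytes_per_clk / flops_per_clk;
    }

    int main(void) {
        /* Fermi SM: 32 banks x 4 B shared; 32 ALUs x 2 flops (FMA) x 2
           (the shader hot clock runs at twice the base clock). */
        printf("Fermi SM:   %.2f bytes/flop\n",
               bytes_per_flop(32 * 4, 32 * 2 * 2));
        /* GK104 SMX: same 32 x 4 B shared path; 192 ALUs x 2 flops,
           no hot clock. */
        printf("Kepler SMX: %.2f bytes/flop\n",
               bytes_per_flop(32 * 4, 192 * 2));
        return 0;
    }

This prints 1.00 and 0.33, which is where the two figures in the question come from: the shared memory path stayed roughly the same while the ALU count per SM tripled.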
>It seems to me that a key strength of Nvidia's Tesla line was that the same software
>that ran on Tesla also ran on Nvidia's graphics cards.
That's certainly an element.
>Perhaps Nvidia realized
>that "dusty deck" Fortran code that uses double precision is better suited to Intel's
>MIC and Nvidia is going to focus on new codes designed to use mostly single precision.
That's really not the case. You need double precision for many HPC workloads. It's just that GK104 is meant for graphics, so there's no reason to pay the overhead of fast double precision (ditto for ECC) when graphics gets no performance benefit from it.
>On the other hand, Intel's MIC is starting to smell like another Itanic fiasco.
>Intel is already saying "We know the performance of our current MIC chip stinks
>but just wait until our next version comes out." This is exactly what Intel said
>about each generation of their Itanic microprocessor.
Perhaps. Itanium required porting software, though. MIC inherits the x86 code base. Admittedly, existing code will merely be functionally correct and will require work to get good performance, but that's still far better than the situation on GPUs, which require porting up front.
>I don't understand why Nvidia scrapped ECC (error-correcting code). ECC for on-chip
>memory takes under 5% of total chip area since only a fraction of GPU chip area
>is memory.
Because nobody cares about ECC in a consumer graphics part. It's pure overhead.
>Giving programmers the option to use ECC for DRAM (in exchange for DRAM
>space) costs next to nothing in GPU chip area. If the Quadro cards based on Kepler
>don't have ECC, I think Nvidia made a big mistake. Intel's MIC has ECC.
I believe Quadro and Tesla will use the same chip, which will have ECC for SRAMs and DRAM, plus good double precision. However, what I'm saying in this article is that it won't use the same shader cores as GK104. It's exactly like the Fermi generation: GF100 was a different core than GF104. One was meant for workstations and HPC, the other for graphics.
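You can actually see this split from software. A minimal device survey (standard CUDA runtime calls; only the interpretation is mine) shows name, compute capability, and whether ECC is enabled on each board in a mixed workstation:

    // Minimal CUDA device survey: name, compute capability, and ECC
    // state. On a mixed GeForce + Tesla/Quadro box, this is how the
    // product split shows up at runtime.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        for (int i = 0; i < n; ++i) {
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);
            printf("%s: sm_%d%d, ECC %s\n",
                   p.name, p.major, p.minor,
                   p.ECCEnabled ? "on" : "off");
        }
        return 0;
    }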
David