By: John Nagle (nagle.delete@this.animats.com), March 14, 2008 10:55 am
Room: Moderated Discussions
First, we're just about at the limits of single CPU speed. In the last ten years, we've gone from 300 MHz clocks to 3 GHz clocks. That growth is over. We're not going to see 30 GHz clocks in the next ten years, at least not in anything commercial.
There's an NSA effort (http://www.nitrd.gov/pubs/nsa/sta.pdf) to get clock rates up to the 50-100 GHz level, and it's worth reading that paper to get a sense of what's required to increase clock rates from current levels. They're proposing huge supercomputers that use Rapid Single Flux Quantum devices, superconducting components, and run in liquid helium at 4 K. Cooling gear struggles to keep the temperature near absolute zero as the unit generates a kilowatt of heat inside the liquid helium tank. Half a million data streams at 50 Gbps each are required between the superconducting processors and room-temperature SRAM.
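For scale, that works out to an aggregate of roughly 500,000 × 50 Gb/s = 25 Pb/s, or about 3 PB/s, crossing the boundary between the cryogenic processors and room-temperature memory. (That's a back-of-envelope figure from the numbers above, not one taken from the paper.)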
There may be some desperate military need for this. It's not a technology that will take over the commercial data center world, let alone the desktop.
So we're stuck with a technology that gets smaller and cheaper, but not much faster.
This forces us in a few directions. We only have three ideas that really work: clusters, shared memory multiprocessors, and massively parallel graphics processors. All the other architectures that have been tried, from the nCube to the Connection Machine to the BBN Butterfly, have been duds. Even the Cell, the only exotic multiprocessor to go into mass production, has been a disappointment.
The future of personal computing may be cheaper and lighter portable devices, not more powerful ones. We should not assume that personal computers will continue to increase in power. We will see all that chip real estate used for memory, not more CPUs, to produce cheap single-chip computers. There will be more $100 laptops than 100-CPU laptops.
In larger machines, it's worth thinking about how to organize tightly coupled clusters better. We probably are going to get clusters on a chip for servers. At some point (probably around 20 CPUs, from SGI's experience) the scaling problems of shared memory multiprocessors get in the way, and it's better to have a separate address space. So we need to think about how to make inter-computer communication for such systems better. We need interprocess messaging that's supported by hardware to make it efficient without violating protection boundaries. We need operating system support for such messaging. Using XML or JSON over TCP/IP over simulated Ethernet to pass a message to another cluster on the same chip is not a good answer. Neither is shared memory, which leads to "one fails, they all fail" clustering. What programmers usually want in interprocess communication is a subroutine call; what the OS usually gives them is an I/O operation. Code is layered on top to work around that mismatch, inefficiently. We need to get past that. (Look at InfiniBand, and at QNX messaging, for successes.)
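To make the subroutine-call point concrete, here's a minimal sketch of QNX Neutrino synchronous messaging using the real MsgSend/MsgReceive/MsgReply calls. The request/reply structs are hypothetical, and error handling and service lookup (ChannelCreate/ConnectAttach or name_attach/name_open) are omitted; this is an illustration of the rendezvous pattern, not production code.

```c
/* Sketch of QNX Neutrino synchronous message passing.
 * The sender blocks in MsgSend() until the receiver replies, so the
 * exchange behaves like a remote subroutine call, not an I/O operation.
 * chid comes from ChannelCreate(); coid from ConnectAttach() or name_open(). */
#include <sys/neutrino.h>

typedef struct { int op; int arg; } request_t;  /* hypothetical message format */
typedef struct { int result; } reply_t;

/* Server side: receive one request, do the work, reply. */
void serve_one(int chid) {
    request_t req;
    reply_t rep;
    int rcvid = MsgReceive(chid, &req, sizeof req, NULL);  /* blocks for a sender */
    rep.result = req.arg * 2;                 /* stand-in for real work */
    MsgReply(rcvid, 0, &rep, sizeof rep);     /* unblocks the sender */
}

/* Client side: one call sends the request and waits for the answer. */
int call_server(int coid, int arg) {
    request_t req = { 1, arg };
    reply_t rep;
    MsgSend(coid, &req, sizeof req, &rep, sizeof rep);  /* blocks until reply */
    return rep.result;
}
```

The point of the pattern is that send/receive/reply is a rendezvous: the kernel copies the message directly between the two address spaces, and the sender's blocked state doubles as flow control, which is what makes it cheap enough to serve as the basic IPC primitive.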
The other interesting area of work is graphics-type processors, which are turning into useful number-crunching engines. I've been to talks at Stanford on compiling MATLAB into graphics engine code. That may be the future of number crunching. It's definitely the past and future of game machines, which have almost always had most of their crunch power in the special-purpose engines. I can't really speak to that issue.
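As a rough illustration of why numeric code maps onto graphics engines: the inner loops are typically data-parallel, like the SAXPY kernel below, where every iteration is independent and can run on a separate shader ALU. (Plain C here as a sketch; stream languages like Stanford's Brook or NVIDIA's CUDA express the same loop as a kernel applied across all elements at once.)

```c
/* SAXPY: y = a*x + y. Each iteration is independent of the others,
 * so a compiler targeting a graphics engine can spread all n elements
 * across the shader units instead of running them serially on one CPU. */
void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```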