By: David Kanter (dkanter.delete@this.realworldtech.com), June 22, 2008 8:54 pm
Room: Moderated Discussions
Peter (not@likely.com) on 6/22/08 wrote:
---------------------------
>>I sometimes wonder if we should not experiment more with ISAs that make dependencies
>>explicit rather than parallelism. I am aware of only a single old academic project
>>named "WM - Weird Machine". It encoded instructions as dependent pairs with implicit
>>producer/consumer dataflow. You could say it generalized things like 'fused multiply
>>add', or the complex 'load effective address' instructions.
>I suspect that given the cost of wires today a wide issue processor with lots of
>execution resource would run into timing issues if it tried to do a lot of forwarding.
What do you think of as a 'wide' processor? If you are thinking of something like a GPU, which has more than 100 parallel ALUs, then I'd totally agree. If you are thinking of something more like an 8-issue CPU, I'm not sure I'd agree.
I think if you wanted to do a large MPU with, say, 100 functional units, I'd organize them in a tree or fat-tree-like structure. Of course, an MPU with 100 functional units would be a bit of a waste unless you are only doing parallel work.
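To put very rough numbers on the wiring argument (my own back-of-envelope sketch, not something from the discussion; the cluster size and operand count below are arbitrary assumptions): a full bypass network forwards every unit's result to every operand input of every other unit, so path count grows roughly with the square of the number of functional units, while a clustered tree organization only forwards freely inside small clusters and pays a handful of links to move values up and down the tree.

```python
import math

def full_bypass_paths(n_fu, operands_per_fu=2):
    # Every FU's result bus reaches every operand input of every FU: O(N^2).
    return n_fu * n_fu * operands_per_fu

def clustered_tree_paths(n_fu, cluster_size=4, operands_per_fu=2):
    # Full forwarding only inside each cluster, plus tree links between clusters.
    clusters = math.ceil(n_fu / cluster_size)
    intra_cluster = clusters * (cluster_size ** 2) * operands_per_fu
    # A binary tree over the clusters needs (clusters - 1) internal edges,
    # each carrying traffic in both directions.
    inter_cluster = 2 * max(clusters - 1, 0)
    return intra_cluster + inter_cluster

for n in (4, 8, 16, 100):
    print(f"{n:>3} FUs: full bypass {full_bypass_paths(n):>6} paths, "
          f"clustered tree {clustered_tree_paths(n):>5} paths")
```

The crossover is the point: at 4 or 8 units the full bypass network is tolerable, but at 100 units it explodes while the clustered/tree organization grows close to linearly.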
>Also, designing a micro-architecture that relies on forwarding operands from multiple
>execution resources to all other execution resources is not going to be particularly
>power efficient as wire power is so high.
Absolutely, this is the classic shared-nothing versus shared-everything argument.
Shared everything is generally easier to program.
Shared nothing generally gives you higher performance, lower cost, and lower power.
CPUs fall into the former category, GPUs into the latter. However, for the right sort of workloads, the disadvantages of the GPU's shared-nothing architecture are largely mitigated.
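As a concrete illustration of what "the right sort of workloads" means (a toy sketch of my own, not from the post): embarrassingly parallel work never needs one lane's result in another lane, so a shared-nothing machine loses nothing, whereas a serial dependence chain forces every result to be handed to the next operation and leans on the shared forwarding network or register file.

```python
def data_parallel(xs):
    # Shared-nothing friendly: each element's result depends only on that
    # element, so independent lanes never need to exchange operands.
    return [x * x + 1.0 for x in xs]

def dependence_chain(xs):
    # Shared-everything territory: each step consumes the previous result,
    # so the machine must forward (or write back and re-read) a value every step.
    acc = 0.0
    for x in xs:
        acc = acc * 0.5 + x
    return acc

data = [1.0, 2.0, 3.0, 4.0]
print(data_parallel(data))     # maps cleanly onto 4 independent ALUs
print(dependence_chain(data))  # serializes no matter how many ALUs you have
```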
DK