TOP3 blunders

By: ?, September 27, 2009 1:08 am
Room: Moderated Discussions
Jouni Osmala on 9/26/09 wrote:
>>Now that we established that we do not need a GPU nor a video decoding hardware,
>There are multiple orders of magnitude of difference between special-purpose hardware
>and software-configurable hardware in terms of perf/power, generally. The big difference
>is that instead of spending a huge number of transistors on selecting operations,
>decoding instructions and looking for dependencies, you just have small units that just do the work.
>Byte addition costs ~200 transistors, 32-bit addition ~1000 transistors, and AND and OR are 4 transistors per bit.
>Shifts by a constant known in advance are almost free.
>A multiplier takes approximately one adder the size of the first operand per bit of the second operand.
>Then compare those costs to the tens of millions of logic transistors in modern
>microprocessors, which are a programmable method of doing the same thing. Tens of millions
>of active transistors work to execute a small sequence of instructions that would
>be just a couple of thousand transistors if done in hardware without all the baggage
>of programmability. With the exception of the pipelined multiplier and divider, execution units are free.
>What costs is the generic routing networks, instruction decoding, instruction
>selection, exception handling, TLB handling, branch prediction and ...
>Basically everything that makes it programmable, and makes it easier for programmers to run their programs fast.
>Now you want to add second-order programmability on top of those millions of transistors
>already required to provide first-order programmability, by turning
>their special-purpose function into a generic function.

I generally agree with your assessment. But I am more interested in how to make it work, and not in how to make it fail.
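
To put the quoted estimates side by side, here is a rough Python sketch. The per-unit costs are the ones quoted above; the helper names are hypothetical, and the 10-million-transistor core figure is an assumption (the low end of "tens of millions"), so treat the result as order-of-magnitude only.

```python
# Back-of-the-envelope transistor counts, using the figures quoted above.
# Helper names are hypothetical; all constants are rough estimates.

ADDER_TRANSISTORS = {8: 200, 32: 1000}   # "byte addition ~200", "32-bit addition ~1000"
LOGIC_TRANSISTORS_PER_BIT = 4            # "AND and OR are 4 transistors per bit"
CORE_LOGIC_TRANSISTORS = 10_000_000      # assumption: low end of "tens of millions"


def logic_op_cost(bits: int) -> int:
    """Cost of a bitwise AND or OR over `bits` bits."""
    return LOGIC_TRANSISTORS_PER_BIT * bits


def multiplier_cost(first_bits: int, second_bits: int) -> int:
    """One adder sized for the first operand per bit of the second operand."""
    return ADDER_TRANSISTORS[first_bits] * second_bits


# A small fixed-function sequence: one 32-bit add, one AND, one OR.
fixed_function = ADDER_TRANSISTORS[32] + 2 * logic_op_cost(32)   # 1256

# How many times larger the programmable core is than the fixed-function logic.
ratio = CORE_LOGIC_TRANSISTORS / fixed_function

print("fixed-function add + AND + OR:", fixed_function)
print("32x32 multiplier:", multiplier_cost(32, 32))
print("programmable core / fixed-function:", round(ratio))
```

Under these assumptions the fixed-function version lands in the "couple of thousand transistors" range, several thousand times smaller than the programmable core.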

>>(EAX = 1st number)
>>(EBX = 2nd number)
>>TEST ECX,(1<<31)
>>JZ failed
>>JO failed
>>OR EAX,(1<<31)
>>JMP done
>>(EAX = sum-or-zero)
>>If I implemented it correctly, it ideally takes some 8-9 cycles to execute this
>>on an OoO x86 processor.
>The code you posted should take 4 cycles on a modern x86 processor if the failed condition
>doesn't happen and the branch predictor works. Even code like that would get 4 cycles
>from OoO logic; you could get similar from an in-order RISC with proper scheduling, but not from an in-order x86.
>And 15 cycles more if the branch predictor was wrong.

I put the code into a loop and got an IPC of 1.8 on my notebook CPU. This suggests it takes (11/1.8) = ~6 cycles to execute.
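
The arithmetic behind that estimate, as a small Python sketch. It assumes the 11 is the number of instructions per loop iteration; the function names and the 5% misprediction rate are hypothetical, and the 4-cycle and 15-cycle figures are the ones from the quoted reply.

```python
# Cycle estimates from a measured IPC, plus the quoted branch-misprediction model.
# Function names and the example misprediction rate are illustrative assumptions.

def cycles_per_iteration(instructions: int, ipc: float) -> float:
    """Cycles = instructions executed / instructions per cycle."""
    return instructions / ipc


def expected_cycles(base: float, penalty: float, mispredict_rate: float) -> float:
    """Average cost when a fraction of iterations pays the misprediction penalty."""
    return base + penalty * mispredict_rate


# Measured: assumed 11 instructions per iteration at an IPC of 1.8.
measured = cycles_per_iteration(11, 1.8)

# Quoted model: 4 cycles when predicted correctly, 15 more on a misprediction.
best_case = expected_cycles(4, 15, 0.0)
with_misses = expected_cycles(4, 15, 0.05)   # 5% rate is purely hypothetical

print(f"measured:      {measured:.2f} cycles")
print(f"best case:     {best_case:.2f} cycles")
print(f"5% mispredict: {with_misses:.2f} cycles")
```

This only compares the numbers; it does not say which effect accounts for the gap between the measured figure and the 4-cycle estimate.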

>I didn't check whether your code worked; I just scheduled it the way a CPU does
>with its OoO logic, or a compiler does for in-order machines.
>>But I think your question is of minor importance. The major question is: How
>>to implement such a generic mechanism so that it is scalable?
>The generic mechanism doesn't really work. OoO scheduling becomes impossible
>in your scheme. The extra transistors in critical paths take power and either slow down
>the clock speed or make EVERY instruction have multicycle latency.
>The ancient method of providing ISA programmability to programmers was that they
>could turn a couple of instructions into multiple instructions that would be issued over
>multiple cycles, but those instructions were normal instructions and nothing fancy.
>But a JMP instruction plus the instruction cache does the same job in a more generic
>way, and you don't need the extra costs in the decode phase when using those.

... I agree with most of what you wrote. The one point where I disagree is that I think it can be implemented, and should be implemented.

Your argument is, basically, that current x86 CPUs exhibit the right balance between complexity, usefulness, power and performance.