By: anon (anon.delete@this.anon.com), November 22, 2010 2:24 am
Room: Moderated Discussions
ajensen (@.) on 11/22/10 wrote:
---------------------------
>Ok so there has been a lot of discussion about whether Poulson will be/is/should
>have been a OoOE design. I'm sure everyone here agrees that the current McKinley
>through Tukwila pipeline has its issues (or designed on the basis of the epic scrolls
>of the dark lord him self if you will).
>
>So if IPF would go in the direction of a more dynamic design which is more likely?:
>
>1) run-ahead
>2) advanced run-ahead (like the multipass pipeline)
IPF gets pretty good IPC without OoOE, provided they can keep IPC up with a couple more cycles of L1 latency. Its biggest weakness is cache misses. If IBM thought run-ahead was worth doing with POWER6, surely it must be seriously considered for IPF?
>3) Scoreboarding?? Perhaps viable is less state needs to be tracked
>4) Tomasulo engine "distributed scoreboarding"
>5) Hiding cache misses within one thread is so 2010, fine grained MT or very dynamic SoEMT will do the trick.
Massively MT to hide poor single-threaded performance is so Niagara/GPU :)
Single-threaded performance actually becomes more important as it gets harder to beat Amdahl's law with ever-increasing numbers of threads. We're talking about thousands of threads in high-end servers...
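
To put rough numbers on the Amdahl's law point, here is a minimal sketch. The 5% serial fraction and the thread counts are made-up illustrative values, not measurements from any real server workload, and the function name is just something I picked.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Upper bound on speedup over one thread, per Amdahl's law.

    serial_fraction is the share of the work that cannot be parallelised
    (an assumed input here, not a measured one).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Hypothetical workload: 5% of the work is serial.
for n in (8, 64, 1024, 4096):
    print(f"{n:5d} threads -> at most {amdahl_speedup(0.05, n):.2f}x")
```

With a 5% serial part the speedup can never reach 20x however many threads you add, so past a certain point the only way to go faster is to make that serial 5% run faster, which is exactly where single-threaded performance comes in.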