By: anon (anon.delete@this.gmail.com), March 20, 2021 6:04 pm
Room: Moderated Discussions
Veedrac (ignore.delete@this.this.com) on March 20, 2021 11:27 am wrote:
> Moritz (better.delete@this.not.tell) on March 20, 2021 5:21 am wrote:
> > What if you could completely rethink the general processor concept?
>
> There are a thousand things a CPU does wrong. Memory is broken and wasteful. Protection mechanisms
> are archaic. The x86 encoding is garbage and distributing arch-specific binaries is sacrilege.
> Reorder buffers are irritatingly inefficient. SIMD instruction sets don't even try.
>
> But all of those are second order. You need to fix speculation. You cannot hope
> to have 30+ IPC unless you can speculate unboundedly, in a way that's immune to
> branch prediction, false memory hazards, loops, function calls, and so on.
>
> As far as I know, this means microthreads. So if I were to rewrite the world, I'd start by figuring
> out how to build a core to handle a hundred-plus microthreads with zero overhead, and work from
> there. I don't think there's anything fundamentally in the way of a core like this.
>
> (An additional thing I'd keep in mind is to make sure it's efficiently extensible to monolithic
> 3D silicon, since I'd like it to last, and monolithic 3D is a physical inevitability.)
See https://en.wikipedia.org/wiki/Cray_MTA for a historical example of a many-threaded core.
Similarly, IBM Power chips often have 8-way SMT, for the same reason: to always have work
that can make forward progress. How would what you are suggesting differ from these?