By: Veedrac (ignore.delete@this.this.com), March 24, 2021 7:11 am
Room: Moderated Discussions
Etienne Lorrain (etienne_lorrain.delete@this.yahoo.fr) on March 24, 2021 1:13 am wrote:
>
> I have a small problem if you are trying to start microthreads of less than a hundred or so cycles:
> If a processor with 200 instructions in flight will pause with a single thread (probably waiting
> for inputs), it is likely both microthreads will also pause for the same reason. Then the best/simplest
> way is probably get more hardware so that the single thread do not pause.
Sure, that's why you want a great many such (trees of) microthreads spread over arbitrarily large distances in program space. If you only have a small number of contiguous microthreads then you've really just got a weird reorder buffer.
> If the microthreads do something completely different (like one zeroing next malloc()
> block, there is probably a lot of allocations in today's software), both microthreads
> have less chances to require the same processor hardware and be paused at the same time.
Unlike memory stalls, which are fundamentally hard to avoid, contention on execution (or decode, dispatch, etc.) is mostly a symptom of how poorly speculation scales. If getting another 30-40% IPC means you can only fit half as many cores on a chip, it's easy to see why core counts keep rising. And there's no point scaling execution resources way out if it's so hard to find instructions to execute. But if you can speculate unboundedly with a low overall loss to throughput, and can scale execution resources in a modular way, then it makes sense to build big CPUs that support massive peak throughputs.