By: Mark Roulo (nothanks.delete@this.xxx.com), May 18, 2021 3:32 pm
Room: Moderated Discussions
Brendan (btrotter.delete@this.gmail.com) on May 18, 2021 12:37 pm wrote:
> A Case Against That Other Paper
>
> By Brendan and a Rubber Duck
>
>
> Introduction
>
> CPUs with lots of hardware threads suffer severe problems for all forms of caching (including branch
> prediction and TLBs) while also destroying any hope of effective hardware prefetching (for instruction,
> data and TLB) as there's no hope of any kind of locality between disparate threads; causing cache
> thrashing, and exacerbating the "1000+ channel memory controller doesn't exist" problem. With significantly
> fewer hardware threads (no more than 4) we believe a CPU can actually work properly, leading to
> an order of magnitude better performance for embarrassingly parallel workloads and 2 or more orders
> of magnitude better performance for anything subject to Amdahl's law.
>
> Currently, all of the CPUs that provide many hardware threads get fundamental parts of task switching wrong.
The paper proposes that the SOFTWARE control the task switching, which is quite different from today's SMT/Hyperthreading.
The idea is that task switching will be faster if each task keeps its register state in hardware rather than in memory (cache or DRAM). The OS, presumably, would still be responsible for selecting the thread(s) to run at any one time, but the cost of a task switch would be low.
I'm envisioning something like SPARC's register windows, just used for a different purpose and MUCH larger. Or like the Z80's banked registers, just much more so.
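To make the distinction concrete, here is a toy sketch of how I read the proposal. It is not taken from the paper: the class names, the 16-register file, and the "words moved" cost metric are all my own assumptions for illustration. Software (the "OS" loop) picks the next task in both models; the only difference is whether a switch copies registers through memory or just selects another hardware bank.

```python
NUM_REGS = 16  # assumed register count per task; purely illustrative


class MemoryContextCpu:
    """Conventional model: one register file, contexts saved to memory."""

    def __init__(self):
        self.regs = [0] * NUM_REGS
        self.saved = {}        # task id -> saved register copy (stands in for memory)
        self.current = 0
        self.words_moved = 0   # register words copied to/from memory by switches

    def switch_to(self, task):
        if task == self.current:
            return
        # Store the outgoing task's registers, then load the incoming task's.
        self.saved[self.current] = list(self.regs)
        self.regs = list(self.saved.get(task, [0] * NUM_REGS))
        self.words_moved += 2 * NUM_REGS
        self.current = task


class BankedContextCpu:
    """Banked model: one register bank per resident task, selected by index."""

    def __init__(self, num_banks=8):
        self.banks = [[0] * NUM_REGS for _ in range(num_banks)]
        self.current = 0
        self.words_moved = 0   # stays 0: switching never copies register data

    @property
    def regs(self):
        return self.banks[self.current]

    def switch_to(self, task):
        if not 0 <= task < len(self.banks):
            raise ValueError("task is not resident in a hardware bank")
        self.current = task    # software-directed switch: only the bank index changes


def run_schedule(cpu, schedule):
    """The 'OS' decides the order; each task does a token unit of work."""
    for task in schedule:
        cpu.switch_to(task)
        cpu.regs[0] += 1       # stand-in for the task doing something
    return cpu.words_moved


if __name__ == "__main__":
    schedule = [0, 1, 2, 0, 1, 2]
    print("memory-backed:", run_schedule(MemoryContextCpu(), schedule))
    print("banked:       ", run_schedule(BankedContextCpu(), schedule))
```

In this toy the memory-backed model pays 2 * NUM_REGS word transfers on every real switch while the banked model pays none, and both preserve each task's registers. It deliberately ignores what Brendan is actually complaining about (cache, TLB and predictor state shared between tasks), and it says nothing about what happens when there are more tasks than banks.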