A Case Against That Other Paper

By: Brendan (btrotter.delete@this.gmail.com), May 18, 2021 12:37 pm
Room: Moderated Discussions

By Brendan and a Rubber Duck


Introduction

CPUs with lots of hardware threads suffer severe problems for all forms of caching (including branch prediction and TLBs), and they destroy any hope of effective hardware prefetching (for instructions, data and TLBs) because there is no locality between disparate threads. The result is cache thrashing, which exacerbates the "1000+ channel memory controller doesn't exist" problem. With significantly fewer hardware threads (no more than 4) we believe a CPU can actually work properly, leading to an order of magnitude better performance for embarrassingly parallel workloads and 2 or more orders of magnitude better performance for anything subject to Amdahl's law.

Currently, all of the CPUs that provide many hardware threads get fundamental parts of task switching wrong. They are too inflexible to support the tracking of statistics, from basic "CPU time used by task" all the way up to advanced performance monitoring and profiling. Any research into more complex schemes (e.g. changing CPU speed to reflect task priority, to build up thermal headroom while unimportant work is being done and increase performance while important work is being done) is similarly ruined by the inherent inflexibility. Further, none of the current "many hardware thread" CPUs support the use of task priorities to ensure that the CPU's resources aren't wasted on unimportant tasks at the expense of anything that actually matters. We propose that shifting to a purely software based task switching model will enable a significant improvement in flexibility, resulting in a major reduction in how much an OS has to suck.
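To make the flexibility argument concrete, here is a minimal sketch of software task switching that honours task priorities and tracks "CPU time used by task" at every switch. It is a toy simulation, not a real kernel: the names `Task` and `SoftwareScheduler`, the tick-based time slice, and the "lower number means more important" convention are all assumptions made for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int        # lower value = more important (an assumption)
    remaining: int       # ticks of work left
    cpu_time: int = 0    # accumulated "CPU time used by task"


class SoftwareScheduler:
    """Toy priority scheduler; every switch goes through software."""

    def __init__(self, time_slice: int = 2):
        self.time_slice = time_slice
        self._ready = []               # heap of (priority, seq, task)
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority

    def add(self, task: Task) -> None:
        heapq.heappush(self._ready, (task.priority, next(self._seq), task))

    def run(self) -> list:
        """Run until all tasks finish; return the order in which slices ran."""
        trace = []
        while self._ready:
            _, _, task = heapq.heappop(self._ready)
            ran = min(self.time_slice, task.remaining)
            task.remaining -= ran
            task.cpu_time += ran       # accounting happens at the switch point
            trace.append(task.name)
            if task.remaining > 0:
                self.add(task)         # not finished: back on the ready queue
        return trace


important = Task("important", priority=0, remaining=3)
background = Task("background", priority=5, remaining=2)
sched = SoftwareScheduler(time_slice=2)
sched.add(background)
sched.add(important)
trace = sched.run()
```

Because the switch is an ordinary function, the same place that updates `cpu_time` could also feed a profiler or adjust CPU speed by priority, which is the kind of experiment the paragraph above says fixed hardware thread switching rules out.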

For micro-kernels (and everything else that uses any kind of inter-process communication), with a large number of hardware threads the entire system falls apart when an application wishes to communicate with a service (such as the file system or the network stack). The first problem is that the overly hyped monitor/mwait is subject to false wake-ups (from speculative execution, from aliasing in caches because the CPU can't track 10000+ separate monitors and check them all on every single write from any core, and from other phenomena), causing the CPU's resources to be wasted on phantom work. The second problem is a hornet's nest of security and denial of service problems (e.g. malicious tasks constantly waking up every service they can through a combination of false wake-ups and brute force, plus spamming tasks that they have legitimate access to without worrying about the OS being able to throttle their send rate) that have remained unsolved since the introduction of "many hardware threads". We believe that fewer/faster hardware threads, with software based security built into communication and software task switching, will allow the flaws in the "many hardware threads" model to be solved at no extra cost.
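As a rough illustration of "software based security built into communication", the sketch below checks access and throttles send rate before a service is ever woken. Everything in it is hypothetical: `Endpoint`, `allowed_senders`, `max_per_window` and `new_window` are invented names, and a fixed per-window budget is just one simple throttling choice among many.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical IPC endpoint owned by a service."""
    allowed_senders: set
    max_per_window: int = 3
    inbox: list = field(default_factory=list)
    _sent: dict = field(default_factory=dict)  # sender -> sends this window

    def send(self, sender: str, message: str) -> bool:
        """Deliver a message, or refuse without waking the service."""
        if sender not in self.allowed_senders:
            return False               # no access: the service is never woken
        count = self._sent.get(sender, 0)
        if count >= self.max_per_window:
            return False               # throttled: sender used up its budget
        self._sent[sender] = count + 1
        self.inbox.append((sender, message))
        return True

    def new_window(self) -> None:
        """Reset send budgets; assumed to be driven by a kernel timer."""
        self._sent.clear()


fs = Endpoint(allowed_senders={"app"}, max_per_window=2)
results = [
    fs.send("evil", "wake up"),   # rejected: no access to this endpoint
    fs.send("app", "open"),
    fs.send("app", "read"),
    fs.send("app", "spam"),       # rejected: over budget for this window
]
fs.new_window()
results.append(fs.send("app", "close"))
```

Since every send passes through the software check, a task cannot wake a service it has no access to, and a task with legitimate access cannot spam it faster than the OS allows.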

Further, to solve the ongoing Spectre security disasters, we believe that fewer hardware threads will enable new scheduling techniques whereby threads are only allowed to share a core if there's no risk of data leaks between the threads (e.g. sharing a core is only permitted when all threads belong to the same process, or all threads previously declared "no confidential data use", or all threads are marked as "trusted", or all threads except one are "trusted with no confidential data").
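The core-sharing rule above can be written as a small predicate. This is a minimal sketch under stated assumptions: `ThreadInfo` and its fields are hypothetical, and "trusted with no confidential data" is read here as a thread that carries both flags.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ThreadInfo:
    """Hypothetical per-thread record; field names are illustrative."""
    process_id: int
    trusted: bool = False          # marked "trusted" by the OS
    no_confidential: bool = False  # declared "no confidential data use"


def may_share_core(threads: Sequence[ThreadInfo]) -> bool:
    """Return True if the given threads may run on one core together."""
    if len(threads) < 2:
        return True  # a lone thread has nobody to leak to or from
    # Rule 1: all threads belong to the same process.
    if len({t.process_id for t in threads}) == 1:
        return True
    # Rule 2: all threads declared "no confidential data use".
    if all(t.no_confidential for t in threads):
        return True
    # Rule 3: all threads are marked "trusted".
    if all(t.trusted for t in threads):
        return True
    # Rule 4: all threads except one are "trusted with no confidential data".
    safe = sum(1 for t in threads if t.trusted and t.no_confidential)
    return safe >= len(threads) - 1


same_process = may_share_core([ThreadInfo(1), ThreadInfo(1)])
strangers = may_share_core([ThreadInfo(1), ThreadInfo(2)])
one_secret_holder = may_share_core(
    [ThreadInfo(1), ThreadInfo(2, True, True), ThreadInfo(3, True, True)]
)
```

A scheduler would call something like `may_share_core` before placing a thread on a sibling hardware thread, and leave the sibling idle (or pick another candidate) when it returns False.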


Conclusion

Context switches and their associated overheads and limitations should be significantly reduced through the adoption of a "new" software threading model that gives software control over a large number of software threads (that are not restricted to "per core"). While there are software and hardware research questions to address, we argue that the proposed eradication of the "many hardware threads" model will lead to significant simplification, performance gains and security improvements, for both systems and application code.