By: rwessel (rwessel.delete@this.yahoo.com), May 17, 2021 6:55 pm
Room: Moderated Discussions
Little Horn (sink.delete@this.example.net) on May 17, 2021 5:03 pm wrote:
> Thoughts?
I've been a proponent of that sort of thing for years. They've missed a couple of items, though. Thread states should probably get their own cache, probably two or three levels of it: an L1 corresponding to the actual hardware thread states in the core (basically what SMT does now), and then a couple of levels backing those up before spilling to memory. The performance requirements are sufficiently different that trying to keep this in the data caches is likely foolish; the data caches are under enough pressure as it is.
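For scale (back-of-the-envelope only, every name below invented): even a minimal integer context is a few hundred bytes, and a full context with vector registers runs to kilobytes, which is exactly why you don't want it competing with data in the D-cache. Something like:

    /* Sketch of one entry in a thread-state cache -- purely illustrative,
       not any real architecture's layout. */
    #include <stdint.h>

    #define TS_NUM_GPRS 32

    typedef struct thread_state {
        uint64_t tid;                /* hardware thread ID (the cache tag) */
        uint64_t pc;
        uint64_t gprs[TS_NUM_GPRS]; /* 32 x 8B = 256B of integer state */
        uint64_t flags;              /* condition codes, privilege level */
        uint64_t asid;               /* address-space ID travels with the thread */
        uint8_t  run_state;          /* runnable, parked-on-queue, ... */
        uint8_t  priority;           /* input to the hardware scheduler */
        /* Add 32 x 512-bit vector registers and it's 2KB+ per thread. */
    } thread_state_t;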
Hardware scheduling is a must, as they indicate, but that includes moving threads between cores, and between sockets.
Inter-thread synchronization and signaling need more than x86-like monitor/mwait. Hardware probably needs to support several forms: certainly some form of message queue (those would also fit more naturally with "interrupt" handlers), some good primitives for implementing the standard synchronization objects, and something RPC-like (especially for system calls). These need to be fast in both the single- and multiple-receiver cases, and need to work across sockets.
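As a rough illustration of the semantics I mean, here's a software model in C with pthreads; the hypothetical hardware version would park and wake threads in the scheduler rather than going through a mutex and condvars:

    /* Software model of a hardware message queue -- names invented. */
    #include <pthread.h>
    #include <stdint.h>

    #define MQ_DEPTH 64

    typedef struct {
        uint64_t        buf[MQ_DEPTH];
        unsigned        head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t  not_empty, not_full;
    } hw_mq_t;

    /* e.g., a system-call service queue; hardware would presumably name
       queues the way it names MSRs or doorbell pages */
    hw_mq_t sys_q = { .lock      = PTHREAD_MUTEX_INITIALIZER,
                      .not_empty = PTHREAD_COND_INITIALIZER,
                      .not_full  = PTHREAD_COND_INITIALIZER };

    void hw_mq_send(hw_mq_t *q, uint64_t msg)
    {
        pthread_mutex_lock(&q->lock);
        while (q->count == MQ_DEPTH)
            pthread_cond_wait(&q->not_full, &q->lock);  /* hw: park sender */
        q->buf[q->tail] = msg;
        q->tail = (q->tail + 1) % MQ_DEPTH;
        q->count++;
        pthread_cond_signal(&q->not_empty);             /* hw: wake one receiver */
        pthread_mutex_unlock(&q->lock);
    }

    uint64_t hw_mq_recv(hw_mq_t *q)
    {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0)
            pthread_cond_wait(&q->not_empty, &q->lock); /* hw: park receiver */
        uint64_t msg = q->buf[q->head];
        q->head = (q->head + 1) % MQ_DEPTH;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->lock);
        return msg;
    }

The multiple-receiver case falls out for free: park several server threads in hw_mq_recv and each message wakes exactly one. The RPC-ish system-call case is then a send on a service queue plus a park on a per-thread reply queue.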
Microthread support is necessary - single "thread" performance remains a critical item. That could be either explicit creation, or messages to pools of pre-created threads (but either way it has to be fast).
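Roughly the programming model I have in mind, with pthreads standing in for the hypothetical hardware primitives (all the hw_uthread_* names are invented; the real point is that spawn/join in hardware would cost cycles, not microseconds):

    /* Pthreads stand-in for hypothetical microthread primitives.  In
       hardware, spawn would grab a free thread-state slot and join would
       wait for a completion message, both nearly free. */
    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef pthread_t uthread_t;

    static uthread_t hw_uthread_spawn(void *(*fn)(void *), void *arg)
    {
        pthread_t t;
        pthread_create(&t, NULL, fn, arg);
        return t;
    }

    static void hw_uthread_join(uthread_t t)
    {
        pthread_join(t, NULL);
    }

    /* Example: fork half of a small reduction onto a microthread.  Only
       worth doing when spawn/join are nearly free. */
    struct range { const int32_t *a; size_t lo, hi; int64_t sum; };

    static void *sum_range(void *p)
    {
        struct range *r = p;
        int64_t s = 0;
        for (size_t i = r->lo; i < r->hi; i++)
            s += r->a[i];
        r->sum = s;
        return NULL;
    }

    int64_t sum(const int32_t *a, size_t n)
    {
        struct range left  = { a, 0,     n / 2, 0 };
        struct range right = { a, n / 2, n,     0 };
        uthread_t t = hw_uthread_spawn(sum_range, &right);
        sum_range(&left);            /* do our half on this thread */
        hw_uthread_join(t);
        return left.sum + right.sum;
    }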