By: Heikki Kultala (heikk.i.kultal.a.delete@this.gmail.com), March 23, 2021 6:46 am
Room: Moderated Discussions
Veedrac (ignore.delete@this.this.com) on March 23, 2021 3:49 am wrote:
> Heikki Kultala (heikki.kultal.a.delete@this.gmail.com) on March 22, 2021 6:52 pm wrote:
> >
> > The implementation is always free to execute the microthreads sequentially (common case if all our
> > hardware microthreads are already in use, for example started by outer level function); programmer
> > can write his code/compiler can compile the code like he/it has infinite amount of microthreads
> > available. As the bundles execute atomically, different microthreads can still do things like incrementing
> > the same counter in memory, but as they are allowed to execute sequentially, they are not allowed
> > to wait data from one another microthread because that might cause a deadlock.
>
> Personally I expect this to be very limiting because it means you must spawn microthreads at the
> top level in an order that completely respects the dependencies of all the sub-threads.
These are not meant to be spawned at the top level at all. At the top level, you spawn normal threads and execute those on entirely different cores.
These are meant for a limited purpose. They are only meant to be a slight bonus on top of a core that already has excellent per-thread performance by other means.
These microthreads are only for fully parallel, very small pieces of work, meant to be used at very fine granularity. They are not meant to replace normal threads in existing code, but to be used where threads currently cannot be used because of the overheads. And they would typically be inserted automatically by the compiler.
For example, if you have a (small) fully data-parallel for loop, then in addition to vectorizing it, you may also split it into 2-4 parts and launch a microthread for each part. Compared to normal threads, the benefit of these microthreads is much smaller overhead, so there is no need to analyze whether the loop has a big enough iteration count for the threading to be beneficial; this accelerates the cases where the iteration count is quite small.
Or, if you call the same pure function (which takes something like 50 clock cycles) a couple of times with different parameters, you can spawn a separate microthread for each function call.
But maybe some kind of unidirectional data flow could be allowed. I have to think about how this would work when there are more than N+1 microthreads (where N is the limit on how many can execute at the same time).
> This is
> a lot to pay when ROBs already show us that hardware is amazing at handling register wakeups.
What is a lot to pay?
> You can avoid deadlocks by requiring inter-thread dependencies to be a DAG.
Yes, that is a good idea; maybe it can be decided that data can flow in exactly one direction between a thread pair. However, I'm afraid this would get more complicated with more than two threads.