By: Mark Roulo (nothanks.delete@this.xxx.com), March 21, 2021 9:47 am
Room: Moderated Discussions
Jukka Larja (roskakori2006.delete@this.gmail.com) on March 21, 2021 12:26 am wrote:
> Mark Roulo (nothanks.delete@this.xxx.com) on March 20, 2021 5:49 pm wrote:
> > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on March 20, 2021 1:20 pm wrote:
> > > Hugo Décharnes (hdecharn.delete@this.outlook.fr) on March 20, 2021 7:34 am wrote:
> > > > Having programs delivered in annotated, intermediate representation (IR) would be great.
> > >
> > > In theory.
> > >
> > > Not necessarily in practice.
> > >
> > > Not only do you often have a big optimization problem (look at everybody who has tried
> > > it: they pretty much always ended up having a "native mode" fallback for games), but
> > > you have a very nasty testing problem, because often the IR is designed by compiler people,
> > > and those people are more than happy to talk about "undefined behavior" etc.
> > >
> > > So the end result will generally be rather under-defined, and then different
> > > hardware, and different recompilers will result in different behavior.
> > >
> > > That happens even with real hardware, but it at least tends to happen less. Partly because
> > > the HW people have actually mostly learnt from their mistakes, while in many areas the compiler
> > > people have been even more open to "undefined behavior" in the name of performance.
> > >
> > > Java (and wasm) is actually doing pretty damn well. It's unusually well-specified
> > > for an IR (despite issues), it works, it's used, and it's fine.
> >
> > Were I trying to design some sort of distributable/executable IR-ish format
> > I would focus on being able to specify "don't care" about sequencing.
> >
> > One weakness with the Java Virtual Machine is that loops MUST have an order, even if the iterations are
> > independent. In theory a good OoO engine can/might be able to untangle all the artificial dependencies.
> > I'd like the ability to just not put in artificial/wrong dependencies that must then be removed.
> >
> > Maybe there aren't enough of these for "normal" code to be worth while.
>
> We have a system in place to write:
>
> for (...) { doSomethingInBackgroundtask(...); } waitforBackgroundTasks();
>
> A coder suggested we should have this. It was easy enough to add for what we had before, so I added it. Two
> years later I came across something that could use it, clapped my hands in excitement, and noticed there was
> a slight problem with the syntax and the code didn't actually compile. Apparently during that time no-one
> else had tried to use it (the problem with the syntax was minor. Took me less than a minute to fix).
>
> So yeah, I think the problem is it's not actually a very common pattern. It tends to require a lot of
> work to make sure things really can run in parallel. By the time that is done, syntax for actually
> spawning the individual tasks (functions) and waiting for them to finish is a rather minor point.
I wasn't clear.
I don't want the ability to run threads/tasks in parallel added to the ISA/IR. I know how to do that :-)
I want the ability to run zero-overhead independent threads in parallel added to the ISA. Think of something more like CUDA/GPU blocks. Each block is supposed to be independent, and the underlying hardware is permitted to run as many blocks in parallel as it has resources for. In the limit (though I don't think this has ever happened) the blocks would all run sequentially. More expensive GPUs can run over 100 blocks in parallel. The overhead to launch these is small (zero?).
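To make the "independent blocks" contract concrete, here is a minimal C++ sketch. It is not CUDA and not a proposal for an actual ISA feature; it only models the semantics described above. The names run_blocks and square_all are hypothetical, and the seeded shuffle stands in for whatever order (or degree of parallelism) real hardware might choose. The point is that the program only states which blocks exist, never the order in which they run.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: calls body(block) once for every block index in
// [0, num_blocks). The caller promises the blocks are independent, so this
// stand-in "scheduler" may pick any order; here a seed selects one.
void run_blocks(std::size_t num_blocks,
                const std::function<void(std::size_t)>& body,
                unsigned seed) {
    std::vector<std::size_t> order(num_blocks);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    for (std::size_t block : order) {
        body(block);
    }
}

// Squares every element of `in`, using one block per chunk of block_size
// elements. Each block writes only its own slice of `out`, so no block
// depends on any other.
std::vector<int> square_all(const std::vector<int>& in,
                            std::size_t block_size,
                            unsigned seed) {
    assert(block_size > 0);
    std::vector<int> out(in.size());
    const std::size_t num_blocks = (in.size() + block_size - 1) / block_size;
    run_blocks(num_blocks, [&](std::size_t block) {
        const std::size_t begin = block * block_size;
        const std::size_t end = std::min(begin + block_size, in.size());
        for (std::size_t i = begin; i < end; ++i) {
            out[i] = in[i] * in[i];
        }
    }, seed);
    return out;
}
```

Calling square_all on the same input with two different seeds gives identical output, which is exactly the property that would let hardware run the blocks sequentially, all at once, or anything in between. An ordinary for loop over the same data cannot express that freedom; it always implies an order that something downstream then has to prove unnecessary.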
The problem with OS threads/tasks is that they are so heavyweight that many cases where you COULD logically parallelize code (or, really, just NOT serialize it) aren't worth doing because the overhead kills all the gains.
There is a chicken-and-egg problem here, of course.
*) The languages need to support this, and
*) The hardware needs to support this too (or no one will use the feature)
For a lot of the code I see, this would be quite nice. But I suspect my code may be pretty niche, and I don't know how often this would be useful in more general-purpose code.