By: anon (anon.delete@this.anon.com), April 21, 2015 10:27 pm
Room: Moderated Discussions
Ivan Godard (ivan.delete@this.millcomputing.com) on April 20, 2015 5:08 pm wrote:
> > Its performance will also be heavily dependent
> > on how effective is the load/store machinery at handling multiple operations in parallel as well as
> > the ability of the compiler to generate static MLP and feed it with enough operations before the execution
> > core is forced to stall. Anyway I wouldn't want to iterate over linked lists with it.
>
> MLP is an issue for any architecture, especially when there are chained memory dependencies as in a linked
> list.
Actually, that is the easiest case for any microarchitecture: no reordering, speculation, or prefetching is required. Issue the load when you have the address, then wait for the next address.
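For concreteness, a minimal sketch of the pattern in plain C (my own example, nothing Mill-specific):

    struct node { struct node *next; int payload; };

    /* Pointer chasing: each address comes from the previous load, so the
       loads serialize and there is no MLP for any machine to exploit. */
    int sum_list(const struct node *p) {
        int sum = 0;
        while (p) {
            sum += p->payload;  /* uses the value just loaded */
            p = p->next;        /* next address known only after this load returns */
        }
        return sum;
    }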
> The Mill will be slow on a list, but no slower than anybody else - we all need to know the next
> address before we can fetch from it, so there's no MLP for anyone.
> In structures where there is the possibility
> of MLP (say iterating over an array) then the Mill can keep the memory pipes full, although only part
> of the mechanism to do that has been publicly described; the rest awaits filing patents.
Iterating over an array is the next easiest case.
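Again just illustrative C: the addresses a+0, a+1, ... are computable without looking at any loaded value, so several loads can be in flight at once.

    /* Array walk: the address of a[i+k] does not depend on the value of a[i],
       so a compiler or the hardware can keep many loads in flight concurrently. */
    int sum_array(const int *a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }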
The difficult case is extracting MLP from code that is statically unpredictable but dynamically predictable. Bonus points for doing it across function calls and module boundaries. You have said that it's something you can do with these undisclosed mechanisms, but I'd be surprised if an in-order machine (without run-ahead or other out-of-order-ish techniques) can do it well without recompilation.
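To make that distinction concrete, here is the kind of code I have in mind (my own illustration, not anything the Mill folks have published): the pointer-array loads are mutually independent, so MLP exists, but the interesting dereferences go to addresses known only at run time.

    /* The ptrs[i] loads are independent of each other, so MLP exists across
       iterations, but each *ptrs[i] goes to a statically unknown address.
       An out-of-order or run-ahead core can overlap several of these misses;
       a strict in-order core stalls on each one unless the compiler has
       hoisted or pipelined the pointer loads far enough ahead - which gets
       hard once the dereference hides behind a call into another module. */
    int sum_indirect(int *const *ptrs, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += *ptrs[i];
        return sum;
    }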