By: anon (spam.delete.delete@this.this.spam.com), May 8, 2017 3:15 am
Room: Moderated Discussions
Brett (ggtgp.delete@this.yahoo.com) on May 8, 2017 2:38 am wrote:
> anon (spam.delete.delete@this.this.spam.com) on May 8, 2017 1:39 am wrote:
> > Brett (ggtgp.delete@this.yahoo.com) on May 7, 2017 8:57 pm wrote:
> > > anon (spam.delete.delete@this.this.spam.com) on May 7, 2017 5:47 pm wrote:
> > > > Brett (ggtgp.delete@this.yahoo.com) on May 7, 2017 1:56 pm wrote:
> > > > > anon (spam.delete.delete@this.this.spam.com) on May 6, 2017 4:41 pm wrote:
> > > > > > Brett (ggtgp.delete@this.yahoo.com) on May 6, 2017 3:16 pm wrote:
> > > > > > > anon (spam.delete.delete@this.this.spam.com) on April 21, 2017 1:19 pm wrote:
> > > > > > > > NoSpammer (no.delete@this.spam.com) on April 21, 2017 12:40 pm wrote:
> > > > > > > > > anon (spam.delete.delete@this.this.spam.com) on April 21, 2017 7:55 am wrote:
> > > > > > > > > > Like I said, if they have a plan on how to deal with the things
> > > > > > > > > > that usually kill VLIW it can work, if they don't it won't.
> > > > > > > > >
> > > > > > > > > It doesn't matter if they have a plan. Where's the solution?
> > > > > > > >
> > > > > > > > Do you think a massively simplified version will put up numbers that'll impress anyone?
> > > > > > > > Yes, if they've never verified that their concept works they'll have a problem.
> > > > > > > >
> > > > > > > > > The alternative of course is you pretend all is great, go looking for naive investors,
> > > > > > > > > > and while you are at it you go to RWT to try to make some hype and have several anonymous
> > > > > > > > > people talking favorably of it to maybe convince some non believers.
> > > > > > > > >
> > > > > > > > > If there's no compiler and no emulator in any form of even
> > > > > > > > > something architecturally similar to what you are
> > > > > > > > > selling after so many years, it's not difficult to conclude
> > > > > > > > > the game is about striking luck with some patents.
> > > > > > > >
> > > > > > > > http://millcomputing.com/topic/simulation/#post-1547
> > > > > > > > They have something, which they have probably shown investors.
> > > > > > > > Doesn't mean they have to show everyone their cards yet.
> > > > > > > >
> > > > > > > > So they're trying to make money by patenting something that doesn't even work?
> > > > > > > >
> > > > > > > > This seems like the highest effort, lowest reward (none) scam if you are right.
> > > > > > >
> > > > > > > Everyone seems to assume that the Mill is in-order, as do I, but
> > > > > > > how hard is it to scale this up to an out of order design?
> > > > > >
> > > > > > Basically impossible.
> > > > >
> > > > > Answers below.
> > > > >
> > > > > > > First you have to save the belt, but I assume each ALU has its own port to its own part of the
> > > > > > > scratchpad.
> > > > > > That would waste a lot of space both in terms of unused
> > > > > > capacity and implementation. Also how would you write
> > > > > > into the scratchpad from anything that's not an ALU. Even just the ALUs would mean 8 ports on a Gold.
> > > > >
> > > > > Yes you would waste some space with eight scratchpads. But the front end (and compiler) would
> > > > > load balance so that one ALU would not get too overloaded and fill its scratchpad.
> > > > >
> > > >
> > > > So classic hand wave "the compiler will take care of it".
> > >
> > > The compiler would not do a great job outside of loops, but one
> > > would not rely on the compiler for more than indirect hints.
> > >
> > > > > > > The namers would be ALU based, with only occasional accesses to other ALU's.
> > > > > > So you want OoOE but with static scheduling? Or you want to do the full OoO dependency analysis
> > > > > > and "optimal" scheduling so it's identical to current OoOE machines except wider?
> > > > >
> > > > > For this thought exercise I am assuming eight belts, running groups of opcodes, and only
> > > > > needing to communicate inputs and net outputs to the scratchpad, not belt positions.
> > > >
> > > > So you're just using the scratchpad as a terrible register file and ignoring the belt almost completely.
> > >
> > > No, you would only spill values that other ALU's need. If an ALU gets too much state you would switch
> > > the next opcode group to a different ALU and spill values so the other ALU can pick the values up.
> > >
> > > In real world code dependency chains are short and this would never happen.
> > > A typical use of eight ALU's is the loop counter and index pointer would sit in one
> > > ALU, index pointers in other ALU's, and actual compute in maybe two other ALU's.
> > >
> > > As you go through the loop you know what ALU's have the values from previous loops and you keep
> > > re-assigning those opcode chains to where those values and index register pointer are.
> > >
> > > Basically no data would ever spill to the scratchpad, only results get written.
> > >
> > > Rename is only needed to roll back writes that were caused by a mispredicted branch.
> >
> > No, rename is also needed to reorder.
>
> Not if you reset the belt size to zero between dependency batches, and spill the old
> belt for mispredicted branch recovery. Note I am assuming eight belts. The case of a
> unified belt and no belt resets is not much different, just harder to understand.
>
> The point of OoO is largely that indexing and loads can run ahead of other opcodes.
>
> I believe you are trying to rename belt positions, there is no need as belt positions are unique
> down one branch path. Holes in a belt due to OoO are irrelevant, they will get filled in later.
>
> The only renaming is on result writes to the scratchpad.
>
> I apologize in advance if I am missing something.
>
> > > > > This can radically reduce rename needs. The majority of ALU
> > > > > outputs do not need to be renamed with this approach.
> > > > >
> > > > > You do need to wait for all inputs to be available before issuing the group to an ALU.
> > > >
> > > > You are also forcing all outputs into the scratchpad which is
> > > > a) a terrible idea, because now we're back to registers, except wider and more expensive and
> > > > b) the complete opposite of what they tried to do with the mill.
> > >
> > > Better explanation above.
> > >
> > > > > > > The downside is you need a larger scratchpad, but the scratchpad
> > > > > > > scales far better than a multiported register file.
> > > > > >
> > > > > > How many ports do you think a register file got?
> > > > >
> > > > > The Pentium had two ports to the register file, but that was 1990's.
> > > > >
> > > > > The gold Mill has eight ALU's and multiplexing that to one memory array
> > > > > gets difficult and takes more space than independent memory arrays.
> > > >
> > > > The multiplexing gets really awkward and now you're limited by the read ports on your
> > > > register file / scratchpad partitions. Pray that a value isn't needed more than once
> > > > per cycle or that the compiler will just magically make all problems go away.
> > >
> > > Better explanation above and below.
> > >
> > > > > You still need a multiplexer to exchange data between ALU's, but that is
> > > > > less in bandwidth needs than sending all the data needlessly like now.
> > > > >
> > > > > > > There is also memory renaming, sometimes called store-load forwarding, but that is
> > > > > > > an in-order term. If you are already renaming the scratchpad this works fine, a cache
> > > > > > > line store would access all eight renamers to grab the eight longs of the line.
> > > > > > >
> > > > > > > It takes a while to get your head around what is happening because
> > > > > > > it is not RISC, but I do not see any show stoppers so far.
> > > > > > Belt encoding completely kills it.
> > > > > > Instead of having to update what each register "means" only for each writing operation you have to do
> > > > > > it for every belt position, every cycle, with information reaching I don't know how many cycles into
> > > > > > the past to update the belt positions with the names of the operation results that were expected to finish
> > > > > > in that cycle. 32 wide rename with definitely >5 cycles of history having to be saved as well?
> > > > > > Yeah, good luck.
> > > > >
> > > > > I covered this above.
> > > >
> > > > You did not.
> > > > To resume execution after a mispredict belt positions must point to the correct data.
> > > > So either no branches can execute while any belt is in use or you must save the data.
> > > > And even just for reordering you need renaming. Or can anything that accesses the belt not
> > > > be reordered? So you're forcing everything through the scratchpad again, building the world's
> > > > worst load/store architecture, because now every dependency chain at best and every single
> > > > instruction at worst needs an extra instruction for every input and every output.
> > >
> > > Yes, you need another port on the scratchpad to spill the belt,
> > > to deal with mispredicted branches and reload old belt values.
> > >
> >
> > You are still ignoring the question.
> > How do you reorder?
>
> Belt positions are unique.
They are not. The belt "moves", so the same position points to a different value depending on the cycle. Therefore you can't change the order of execution unless you rename.
>
> > How do you recover from a mispredict?
>
> Roll back the belt, and any writes.
>
And how do you do that if you haven't saved the old belt contents?
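To make the point concrete, here is a toy sketch in Python. It is not how the Mill or its spiller is actually built; the three-entry belt, the values, and the names `checkpoint`, `lost` and `restored` are all made up for illustration. It only shows that drops are destructive, so a rollback needs a copy taken before speculation started.

```python
from collections import deque

# Toy belt with three positions; index 0 is the newest value.
# The length and the values are arbitrary, chosen only for illustration.
belt = deque([30, 20, 10], maxlen=3)

# Copy taken at the predicted branch (hypothetical save step).
checkpoint = tuple(belt)

# Speculative results down the wrong path drop at the front and
# push the pre-branch values off the far end of the belt.
for wrong_path_result in (41, 42, 43):
    belt.appendleft(wrong_path_result)

# None of the pre-branch values can be read from the belt any more.
lost = all(v not in belt for v in checkpoint)

# Recovery works only because the copy exists.
belt = deque(checkpoint, maxlen=3)
restored = list(belt)
```

Without the `checkpoint` copy there is nothing to roll back to, which is the saving cost I keep asking about.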
> > Any change in execution order and values will not be in the same positions on the belt.
>
> No, belt positions are assigned at decode. Trying to assign
> belt positions at execution time would not work.
>
Belt "positions" are assigned at compile time, based on when the previous instructions finish.
The position of a result is implicit. Again, if the execution order changes, the result does not end up in the same position.
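A minimal sketch of what I mean, in Python. This is a toy model, not the Mill's real encoding: the `run` and `const` helpers, the belt length of 8 and the three-operation program are assumptions made up for the example. Operands are named only by belt position, and every result drops at position 0.

```python
from collections import deque
import operator

def run(ops, belt_length=8):
    """Execute (function, operand_positions) pairs in the order given.

    Each result drops at position 0 and shifts everything else back by one.
    Returns the newest belt value after the last operation.
    """
    belt = deque(maxlen=belt_length)
    for fn, positions in ops:
        args = [belt[p] for p in positions]  # operands named by position only
        belt.appendleft(fn(*args))
    return belt[0]

def const(value):
    """Hypothetical helper: an operation that just drops a constant."""
    return (lambda: value, [])

# After setup the belt reads [5, 2, 1] from position 0 upward.
setup = [const(1), const(2), const(5)]

# Positions below were "compiled" assuming add, then mul, then sub.
add = (operator.add, [0, 1])  # 5 + 2
mul = (operator.mul, [1, 2])  # 5 * 2, shifted by one because add already dropped
sub = (operator.sub, [0, 1])  # mul result minus add result

in_order = run(setup + [add, mul, sub])   # 10 - 7 = 3
reordered = run(setup + [mul, add, sub])  # mul now reads 2 * 1, result is 5
```

`mul` does not depend on `add` at all, so a register machine could swap them freely. Here the swap silently changes which values the position numbers refer to, and the final result goes from 3 to 5. Fixing that means translating every position into a stable name, which is renaming.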
> > > > > This should give you a better idea of the post RISC ideas I am looking at.
> > > >
> > > > You're just trying to turn the Mill into a RISC architecture again and it's simply a terrible idea.
> > > >
> > > > > > Using the spiller to do speculation should be doable, as mentioned before, but OoOE is impossible.
> > > > > >
> > > > > > > By the way I live in Austin, TX and that is one of my emails listed in the By field.
>
>