By: rwessel (robertwessel.delete@this.yahoo.com), June 3, 2013 10:51 am
Room: Moderated Discussions
Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on June 3, 2013 9:59 am wrote:
> rwessel (robertwessel.delete@this.yahoo.com) on June 3, 2013 12:09 am wrote:
> > The load/store units *are* types of execution units - they happen to handle loads and stores to memory,
> > rather than, say, arithmetic operations. Instructions get to them because the dispatcher sends
> > them there. The store unit gets operands just like any other execution unit does, and the load unit
> > sends results to registers, just like an arithmetic unit that generates results does.
> >
> > Nominally, when an instruction is issued OoO, and before its operands are ready, what it's waiting on is a
> > prior instruction to write its result to a register, which it can then read from the register file. The many
> > rename registers on modern OoO processors are part of the mechanism to be able to execute around dependencies
> > caused by register reuse. But even that isn't quite enough: the path between a result being stored into a
> > register and a dependent instruction reading that result from the register is far too long in most cases,
> > and most fast processors (and not just OoO ones) implement a forwarding network that allows results to be
> > transmitted from one unit to another directly, and in parallel to the update of the register.
> >
> > In the case of non-RISC machines, the operation of the load and store units is complicated by the fact that
> > there are operations that are read-modify-update in nature. Exactly how those are handled is very dependent
> > on the microarchitecture, but the earlier implementations all broke operations like "add memory,1" into several
> > micro-ops (perhaps a load, an add and a store); more complex designs can compress that into fewer operations.
> > Instructions that get split into multiple micro-ops need special handling on the back end, as you cannot
> > architecturally take an exception in the “middle” of an instruction (at least on most machines).
>
> Thanks for yet another great post! Sorry for replying late, was a bit busy the last couple days.
>
> - So load and store units are also execution units. Thanks, that makes sense, as it would be weird to
> imagine a unit that just has things "pulled" to it with no controller to send or request resources.
>
> - Well that makes sense, but what about cases where two processes are not dependent on each other whatsoever?
> What about a situation where two threads are working on the same core, and one thread needs operand
> [b] and the other thread needs operand [k], and they have nothing to do with each other? How does the load
> unit load the required operands into the registers? What's the path, is what I'm asking?
With most SMT implementations (other forms of multi-threading vary a bit), the load/store units don't really care much about threads. The dispatch unit will have assigned a rename register to the operation (and that rename register will *not* be used simultaneously in any other thread - an instruction using the same architected register in another thread would use a different physical register), and the load/store unit (or any other execution unit) will just use the assigned rename (aka physical) register.
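
To make the renaming point concrete, here is a rough Python sketch of a toy model. It is not how any real core is built, and every name in it (Renamer, allocate, lookup, load_unit, the register count, the addresses) is made up for illustration. Two threads each load into the same architected register "r1"; the rename stage hands each one a different physical register, and the load unit only ever sees an address and a physical destination.

```python
class Renamer:
    """Toy rename stage shared by all hardware threads on one core (hypothetical)."""

    def __init__(self, num_physical):
        # Free list of physical (rename) register numbers.
        self.free = list(range(num_physical))
        # (thread id, architected register) -> current physical register.
        # A real design would also track the old mapping until retirement;
        # this toy simply overwrites it.
        self.mapping = {}

    def allocate(self, thread_id, arch_reg):
        """Assign a fresh physical register to a destination operand."""
        if not self.free:
            raise RuntimeError("no free physical registers; dispatch would stall")
        phys = self.free.pop(0)
        self.mapping[(thread_id, arch_reg)] = phys
        return phys

    def lookup(self, thread_id, arch_reg):
        """Find the physical register currently holding a thread's architected register."""
        return self.mapping[(thread_id, arch_reg)]


def load_unit(memory, phys_regs, address, dest_phys):
    """The load unit gets an address and a physical destination, and no thread id."""
    phys_regs[dest_phys] = memory[address]


memory = {0x100: 11, 0x200: 22}   # made-up addresses and values
phys_regs = [0] * 8               # made-up physical register file size
renamer = Renamer(len(phys_regs))

# Both threads load into architected register "r1".
p0 = renamer.allocate(0, "r1")
p1 = renamer.allocate(1, "r1")
load_unit(memory, phys_regs, 0x100, p0)
load_unit(memory, phys_regs, 0x200, p1)
```

The thread id appears only in the rename table lookup, never in load_unit, which is the sense in which the execution units "don't really care much about threads" in this simplified picture.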