By: , May 31, 2013 10:59 pm
Room: Moderated Discussions
rwessel (robertwessel.delete@this.yahoo.com) on May 31, 2013 9:53 pm wrote:
> The operand itself would never go through the AGU, rather the generated address is passed to the load/store
> unit (often the AGU is part of that), and once the load completes, the operand is forwarded to the execution
> unit where the dispatched instruction is waiting for it. Exactly how that happens depends a great deal
> on the microarchitecture. On a simple in-order design the pipeline may simply be stalled waiting for the
> load unit to present the single outstanding operand. In an OoO design, a rather more complicated network
> will exist to get the operand to the instruction (micro-op) needing it, wherever that's happened to end
> up waiting. Once the instruction has all of its operands, it can then be executed.
>
> Things are a bit different on the store side, as the operand is usually available pretty early (it's a store,
> after all, you already have the operand), and the store, along with the address, can be pushed quickly into
> the store buffer, and that can then complete independently of the instruction stream. A critical issue is maintaining
> a coherent and properly sequential view of memory even if the actual (physical) stores and loads are not happening
> in the architected order. It's not so bad within a single processor, since the store buffer can watch for other
> memory accesses, and jump in when it has a pending store. But since memory accesses are visible to other processors
> (and I/O devices), great effort must be taken to ensure that those other devices only see memory accesses in
> the architected order, or you'll break every multithreaded program in sight.
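To make the store-buffer behaviour described above concrete, here is a minimal Python sketch. It is a toy model, not how any particular CPU implements it. The class names `StoreBuffer` and `PendingStore` are hypothetical, and it assumes every access is the same size and aligned, so a simple address comparison stands in for the byte-overlap checks real hardware does. Stores "complete" as soon as they are buffered, later loads check the buffer first (store-to-load forwarding), and the buffer drains to memory strictly oldest-first, which is what keeps other observers from seeing this core's stores out of order.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class PendingStore:
    addr: int
    value: int


class StoreBuffer:
    """Toy FIFO store buffer (hypothetical model, single core, same-size aligned accesses)."""

    def __init__(self, memory: dict[int, int]) -> None:
        self.memory = memory
        self.entries: deque[PendingStore] = deque()

    def store(self, addr: int, value: int) -> None:
        # From the pipeline's point of view the store is done once it is buffered.
        self.entries.append(PendingStore(addr, value))

    def load(self, addr: int) -> int:
        # Search youngest-to-oldest so the most recent pending store to this
        # address wins (store-to-load forwarding); otherwise read memory.
        for entry in reversed(self.entries):
            if entry.addr == addr:
                return entry.value
        return self.memory.get(addr, 0)

    def drain_one(self) -> bool:
        # Commit the oldest store to memory. Draining strictly in FIFO order means
        # other processors only ever see this core's stores in program order.
        if not self.entries:
            return False
        entry = self.entries.popleft()
        self.memory[entry.addr] = entry.value
        return True
```

For example, after `sb = StoreBuffer({0x100: 1})` followed by `sb.store(0x100, 2)` and `sb.store(0x100, 3)`, `sb.load(0x100)` returns 3 from the buffer while memory still holds 1 until the buffer drains.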
Thanks for the reply!
- So there is a load/store unit that is included in the AGUs in the diagrams? Do operands flow through these units? That makes sense, but one thing: how do these load/store units get the operands to the execution units? Are they directly linked, or do they go through one of the other schedulers/buffers above?
- Ah yes, stores seem rather straightforward for the reasons you stated. But, as with the question above: how does the finished result of an instruction get to the store unit so it can be stored? Directly linked, or by some other method?
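One classic answer to both questions is the Tomasulo-style scheme, sketched below in Python as an illustration only; modern cores use physical register files, bypass networks and split store-address/store-data micro-ops, and details vary a lot between designs. In this model, a micro-op that is waiting on a result holds the *tag* of the producer instead of a value. When any unit (the load unit, an ALU) finishes, it broadcasts `(tag, value)` on a shared result bus, and every waiting entry, including a store waiting for its data, grabs the value on a tag match. All names here (`ReservationStation`, `Operand`, `Entry`, tags like `"ld1"`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Operand:
    # Either a ready value, or the tag of the in-flight micro-op that will produce it.
    value: int | None = None
    tag: str | None = None

    @property
    def ready(self) -> bool:
        return self.tag is None


@dataclass
class Entry:
    name: str
    op: str
    srcs: list[Operand] = field(default_factory=list)


class ReservationStation:
    """Toy scheduler: entries wait here until all their source operands arrive."""

    def __init__(self) -> None:
        self.waiting: list[Entry] = []

    def dispatch(self, entry: Entry) -> None:
        self.waiting.append(entry)

    def broadcast(self, tag: str, value: int) -> None:
        # Result bus: every waiting entry compares its pending source tags with
        # the broadcast tag and captures the value on a match ("wakeup").
        for entry in self.waiting:
            for src in entry.srcs:
                if src.tag == tag:
                    src.value, src.tag = value, None

    def select_ready(self) -> list[Entry]:
        # Entries whose sources are all ready leave the station and may issue.
        ready: list[Entry] = []
        still_waiting: list[Entry] = []
        for entry in self.waiting:
            (ready if all(s.ready for s in entry.srcs) else still_waiting).append(entry)
        self.waiting = still_waiting
        return ready
```

A small usage example (the store address is given as an immediate purely to keep it short; in a real core it would come from the AGU):

```python
rs = ReservationStation()
rs.dispatch(Entry("add1", "add", [Operand(tag="ld1"), Operand(value=5)]))
rs.dispatch(Entry("st1", "store", [Operand(value=0x100), Operand(tag="add1")]))
print([e.name for e in rs.select_ready()])  # []
rs.broadcast("ld1", 10)   # load unit returns its data
print([e.name for e in rs.select_ready()])  # ['add1']
rs.broadcast("add1", 15)  # ALU result also feeds the waiting store
print([e.name for e in rs.select_ready()])  # ['st1']
```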
Thanks as always for being very helpful and informative to me!