By: , May 31, 2013 9:04 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on May 31, 2013 3:08 pm wrote:
> Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on May 31, 2013 2:22 pm wrote:
> > Ricardo B (ricardo.b.delete@this.xxxxx.xx) on May 31, 2013 12:22 pm wrote:
> > > Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on May 31, 2013 6:59 am wrote:
> > >
> > > > The array example definitely helped. Though one more about AGU's; say a AGU is given an
> > > > instruction to calculate the virtual address of (array+4), and it does so successfully.
> > > > Where would the result of the virtual address be stored, and how would it be used?
> > >
> > > Usually, the virtual address calculated by the AGU is forwarded directly
> > > to the DTLB and L1 D$ to be used as an address for a store/load.
> > >
> > > It's also possible to store the address calculated by the AGU into a register for other uses.
> > >
> > > > - Sorry, I still don't quite understand how multi-threading works. If there are two programs,
> > > > one of each using one thread, how does a single execution unit perform as two?
> > >
> > > For example, while an Ivy Bridge CPU can in theory sustain 6 µOPs per clock, in practice most
> > > software runs at ~1 µOP per clock, due to instruction dependencies and other things.
> > >
> > > There is thus, lots of free time to execute instructions for a second thread.
> > >
> > >
> >
> > Thanks again for the reply!
> >
> > - Oh, so the AGU is sort of like a "decoder" for store and load operations? So hopefully my understanding
> > is correct now;
>
> It's not a decoder. It calculates the address, based on the instructions used in the code. Instructions
> have very different addressing modes, and the AGU needs to be able to handle all of them. Most address
> calculations are simple, but they can be quite complex, involving multiplication and addition.
>
> > The scheduler gives a location for the AGU to decode, which can either be simple
> > (EDX1) or complex (EDX2+2/5^65%4),
>
> It's specified by an instruction, not the scheduler. It might get held in the scheduler temporarily.
> Honestly, you might find it beneficial to read some of my other articles, such as http://www.realworldtech.com/barcelona/
> they have a good narration of the pipeline.
>
> > and once it figures out what the virtual address of this location
> > is, it sends the request to the DTLB, which performs a look up of this virtual address, which finds
> > where the data is,
>
> That's correct, the DTLB converts from virtual to physical addresses.
>
> > and then requests it from wherever it is through the caches into the data cache
> > for the execution units to utilize? Hopefully I've gotten it right by now...
>
> It requests that address from the cache and loads it into a register. Which is why we call it a load.
>
> Stores work rather differently, since it is moving data from a register to the memory.
>
> David
Hello David, thanks for taking the time to reply to my post; I definitely appreciate it!
- I know that the AGU is not a decoder, though for some reason my mind likes to think of the expression that a virtual address comes in as being "in code", with the AGU working out where the destination really is. To my understanding, it is really like an ALU that takes the address expression and computes where the address is (virtually, not physically). Hopefully this is correct.
- Ah, to clarify what I was trying to say previously: the instruction declares what operands it needs, the AGU calculates the virtual addresses of those operands, and then the DTLB looks up each virtual address to find out where the data is in physical memory. Here is where my understanding gets shaky: once the address is found (through whatever cache levels are needed), are the operands sent through the load/store units, through the AGU and back into the scheduler, so that the instruction can be passed on to the ALU to calculate? At least that's how it looks for Barcelona, though I'm sure I'm wrong here; why would operands be sent through an AGU? The Silvermont die diagram seems to send the operands to the ROB, which sends them on to the appropriate schedulers. That makes sense to me, but I'm sure it is also wrong...
I guess what I am having trouble understanding now is the path from the physical address of the operand, all the way into the registers for the execution unit to work with...
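To make the question concrete, here is a rough Python sketch of the load path as described in the replies above: the AGU computes a virtual address, the DTLB translates it to a physical address, and the data at that address is loaded into a register. This is only a toy model under stated assumptions. The names `agu`, `dtlb_translate` and `load`, the 4 KiB page size, and the dictionaries standing in for the TLB, memory and register file are all made up for illustration. Cache levels, TLB misses with page walks, and the scheduler/ROB are not modeled at all.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def agu(base, index=0, scale=1, disp=0):
    """Toy AGU: effective (virtual) address = base + index*scale + disp."""
    return (base + index * scale + disp) & 0xFFFFFFFFFFFFFFFF


def dtlb_translate(tlb, vaddr):
    """Toy DTLB: map virtual page number -> physical frame number.

    A KeyError here stands in for a TLB miss; the page walk is not modeled.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pfn = tlb[vpn]
    return pfn * PAGE_SIZE + offset


def load(regs, dest, vaddr, tlb, memory):
    """Toy load: translate the address, read the data, write it to a register."""
    paddr = dtlb_translate(tlb, vaddr)
    regs[dest] = memory[paddr]  # 'memory' stands in for the whole cache hierarchy
    return regs[dest]


# Hypothetical example: load the element at (array + 4) into a register.
regs = {"ebx": 0x1000}                 # ebx holds the virtual address of 'array'
tlb = {1: 7}                           # virtual page 1 -> physical frame 7
memory = {0x7004: 42}                  # data stored at the physical address
vaddr = agu(regs["ebx"], disp=4)       # AGU: array + 4
load(regs, "eax", vaddr, tlb, memory)  # the loaded value ends up in eax
```

In this sketch the loaded data never passes back through the AGU: the AGU only produces the address, and the value goes straight into the destination register, where a later ALU instruction can read it.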
Anyways, thanks as always! This website is really one of the most informative I've ever come across with extremely helpful and knowledgeable members. Hopefully one day I'll be able to contribute.