By: David Kanter (dkanter.delete@this.realworldtech.com), May 31, 2013 3:08 pm
Room: Moderated Discussions
Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on May 31, 2013 2:22 pm wrote:
> Ricardo B (ricardo.b.delete@this.xxxxx.xx) on May 31, 2013 12:22 pm wrote:
> > Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on May 31, 2013 6:59 am wrote:
> >
> > > The array example definitely helped. Though one more about AGU's; say a AGU is given an
> > > instruction to calculate the virtual address of (array+4), and it does so successfully.
> > > Where would the result of the virtual address be stored, and how would it be used?
> >
> > Usually, the virtual address calculated by the AGU is forwarded directly
> > to the DTLB and L1 D$ to be used as an address for a store/load.
> >
> > It's also possible to store the address calculated by the AGU into a register for other uses.
> >
> > > - Sorry, I still don't quite understand how multi-threading works. If there are two programs,
> > > one of each using one thread, how does a single execution unit perform as two?
> >
> > For example, while an Ivy Bridge CPU can in theory sustain 6 µOPs per clock, in practice most
> > software runs at ~1 µOP per clock, due to instruction dependencies and other things.
> >
> > There is thus, lots of free time to execute instructions for a second thread.
> >
> >
>
> Thanks again for the reply!
>
> - Oh, so the AGU is sort of like a "decoder" for store and load operations? So hopefully my understanding
> is correct now;
It's not a decoder. The AGU calculates the address from the operands that the load or store instruction specifies. Instructions can use very different addressing modes, and the AGU needs to be able to handle all of them. Most address calculations are simple, but some are quite complex, involving both multiplication and addition.
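To make the "multiplication and addition" part concrete, here is a rough sketch of an address calculation. It assumes an x86-style base + index*scale + displacement addressing mode; the function name and the example values are made up for illustration and do not describe any particular AGU.

```python
# Toy model of an effective address calculation.
# Assumption: x86-style addressing, base + index*scale + displacement.
MASK64 = (1 << 64) - 1  # addresses wrap around at 64 bits


def effective_address(base, index=0, scale=1, disp=0):
    """Return the virtual address for one memory operand (hypothetical helper)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only encodes scales of 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64


array = 0x1000  # hypothetical start of the array

# Simple case: array+4 is a single addition.
print(hex(effective_address(array, disp=4)))

# More complex case: a multiplication (index*scale) plus two additions.
print(hex(effective_address(array, index=3, scale=4, disp=8)))
```

In real hardware the scale is a small shift rather than a general multiply, which is part of why even the complex modes stay cheap.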
> The scheduler gives a location for the AGU to decode, which can either be simple
> (EDX1) or complex (EDX2+2/5^65%4),
The address is specified by the instruction, not the scheduler, although the instruction might be held in the scheduler temporarily. Honestly, you might find it beneficial to read some of my other articles, such as http://www.realworldtech.com/barcelona/ since they have a good narration of the pipeline.
> and once it figures out what the virtual address of this location
> is, it sends the request to the DTLB, which performs a look up of this virtual address, which finds
> where the data is,
That's correct; the DTLB converts from virtual to physical addresses.
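As a rough sketch of that conversion, the lookup below models the DTLB as a small table from virtual page number to physical frame number. The 4 KiB page size, the table contents, and the function name are assumptions for illustration; a real DTLB is a hardware structure, and a miss triggers a page table walk rather than returning nothing.

```python
PAGE_SHIFT = 12                    # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1  # low bits are the offset within the page


def dtlb_lookup(dtlb, vaddr):
    """Translate a virtual address, or return None on a DTLB miss."""
    vpn = vaddr >> PAGE_SHIFT      # virtual page number
    frame = dtlb.get(vpn)
    if frame is None:
        return None                # miss: real hardware would walk the page tables
    # The page offset passes through unchanged; only the page number is replaced.
    return (frame << PAGE_SHIFT) | (vaddr & PAGE_MASK)


dtlb = {0x1: 0x80}                 # hypothetical entry: virtual page 0x1 -> frame 0x80
print(hex(dtlb_lookup(dtlb, 0x1004)))
print(dtlb_lookup(dtlb, 0x2000))   # no entry for virtual page 0x2
```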
> and then requests it from wherever it is through the caches into the data cache
> for the execution units to utilize? Hopefully I've gotten it right by now...
It requests the data at that address from the cache and loads it into a register, which is why we call it a load.
Stores work rather differently, since they move data from a register to memory.
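The difference in direction can be sketched with a toy model. The register names, the address, and the dictionaries standing in for the register file and memory are all hypothetical, and the sketch ignores caches and everything else that makes real stores more involved; it only shows which way the data moves.

```python
registers = {"eax": 0, "ebx": 42}  # hypothetical register file
memory = {}                        # physical address -> value


def store(reg, addr):
    """Store: data moves from a register to memory."""
    memory[addr] = registers[reg]


def load(reg, addr):
    """Load: data moves from memory into a register."""
    registers[reg] = memory.get(addr, 0)


store("ebx", 0x80004)              # memory at 0x80004 now holds the value of ebx
load("eax", 0x80004)               # eax now holds whatever is at 0x80004
print(registers["eax"])
```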
David