By: , May 31, 2013 1:22 pm
Room: Moderated Discussions
Ricardo B (ricardo.b.delete@this.xxxxx.xx) on May 31, 2013 12:22 pm wrote:
> Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on May 31, 2013 6:59 am wrote:
>
> > The array example definitely helped. Though one more about AGU's; say a AGU is given an
> > instruction to calculate the virtual address of (array+4), and it does so successfully.
> > Where would the result of the virtual address be stored, and how would it be used?
>
> Usually, the virtual address calculated by the AGU is forwarded directly
> to the DTLB and L1 D$ to be used as an address for a store/load.
>
> It's also possible to store the address calculated by the AGU into a register for other uses.
>
> > - Sorry, I still don't quite understand how multi-threading works. If there are two programs,
> > one of each using one thread, how does a single execution unit perform as two?
>
> For example, while an Ivy Bridge CPU can in theory sustain 6 µOPs per clock, in practice most
> software runs at ~1 µOP per clock, due to instruction dependencies and other things.
>
> There is thus, lots of free time to execute instructions for a second thread.
>
>
Thanks again for the reply!
- Oh, so the AGU is sort of like a "decoder" for load and store operations? Hopefully my understanding is correct now: the scheduler gives the AGU a location to decode, which can be either simple (EDX1) or complex (EDX2+2/5^65%4). Once the AGU has worked out the virtual address of that location, it sends the request to the DTLB, which looks up the virtual address to find where the data is. The data is then requested from wherever it resides and brought through the caches into the data cache for the execution units to use? Hopefully I've gotten it right by now...
- Ah, so it simply lets a second thread use the otherwise unused execution units? If so, can you please explain why some applications that ARE indeed multi-threaded still don't benefit from multi-threading? It's odd that some game benchmarks show very little gain, none at all, or even a performance HIT with hyperthreading enabled, when they definitely use multiple threads...
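
To check my understanding of the first point, here is a rough Python sketch of the flow as I picture it: the AGU adds up the address components, and the DTLB then maps the virtual page to a physical one. This is only a toy model under my own assumptions, not how the hardware is actually built. The base + index*scale + displacement form, the 4 KiB page size, and every name and number in it (`agu_effective_address`, `translate`, `toy_tlb`, the array at 0x1000) are made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def agu_effective_address(base, index=0, scale=1, displacement=0):
    """Toy AGU: base + index*scale + displacement, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFFFFFFFFFF


def translate(virtual_address, tlb):
    """Toy DTLB: map the virtual page number to a physical frame, keep the offset."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = tlb.get(page)
    if frame is None:
        return None  # TLB miss; real hardware would walk the page tables here
    return frame * PAGE_SIZE + offset


# Hypothetical values: the array lives at virtual address 0x1000,
# and virtual page 0x1 is mapped to physical frame 0x80.
array_base = 0x1000
toy_tlb = {0x1: 0x80}

va = agu_effective_address(array_base, displacement=4)  # the (array+4) case
pa = translate(va, toy_tlb)
print(hex(va), hex(pa))
```

If I have it right, the physical address that comes out is what the L1 data cache is checked against, and the AGU result could just as well be written to a register instead of being used for a load or store.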