By: , July 31, 2013 9:12 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on July 31, 2013 4:11 pm wrote:
> Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on July 31, 2013 2:15 pm wrote:
> [snip]
>
> > - The TLB lies in the store components of a core. (I know you said one TLB per CPU... but then
> > you said that the TLB lies in the store components, so you must've meant core, correct?)
>
> To get an idea of where TLBs are physically located, you can take a look at CPU floorplans
> (doing a web image search for "CPU core floorplan" should provide some interesting viewing).
> Hans de Vries has a number of such images (and some articles with analysis). The following
> is probably a helpful example: http://www.chip-architect.com/news/K8L_floorplan.jpg
>
> > - There are not multiple TLBs (like I thought) unless they are tertiary TLBs
> > to the primary TLB, or a microTLB to help sort out the primary TLB.
>
> TLBs are not associated with levels of the (ordinary) cache, but like cache can be divided into multiple levels
> and be shared or private between instruction and data accesses (and even between different cores).
>
> > - The TLB is just a cache of the last accesses to information; their translations and permissions.
>
> Yep! The "authoritative" source for such information is the page table(s) and TLBs are caches of page table
> information. (Intel calls the structure which stores internal node entries in its multi-level page table Paging-Structure
> Caches, so Intel might view a TLB as only holding Page Table Entries and not any Page Directory Entries
> and other parts of the hierarchical page table. Since Intel has not implemented a structure which caches both
> types of page table components, it is not obvious what Intel would call such a hardware structure. I seem to
> recall Robert Wessel [on the comp.arch newsgroup, I think] mentioning some time ago that some implementation
> of IBM mainframes did have TLBs that cached more than just PTEs--and called them TLBs.)
>
> > - NO caches have a TLB solely for them; though all caches have a page table or multiple in them
>
> The L1 TLBs are somewhat tightly bound to the L1 caches in typical processors
> (that have TLBs) to allow minimal latency in L1 cache accesses. However, there
> is typically not a special TLB for the L2 cache nor another for the L3 cache.
>
> Page tables are conceptually stored in main memory, but like other parts of main memory can be found
> in caches. (Theoretically, pages of a page table can even be paged out to swap like other memory.)
>
> In a typical system, each software process has a separate virtual address space requiring a separate
> page table. (Threads within a process share address space/page table.) The OS typically takes a
> portion of each process' total address space as a virtual memory region shared across the system.
> Mechanisms exist to avoid duplicating all of the information in this region (e.g., the ARM ISA
> defines a global address space with a mask determining what accesses are within the per-process
> address space and which are in the global address space). (Eep! Excess digression!)
>
> > - The TLB uses these page tables to track what is inside a cache/RAM and
> > uses the table to translate virtual addresses to physical addresses.
>
> The TLBs are (typically) not concerned with what is present in the caches and only provide information to translate
> virtual addresses to physical addresses and some other metadata like permissions, accessed and dirty bits, cache
> write-through/write-back behavior, etc. Each virtual address page (typically an aligned chunk of memory about
> 4KiB in size) is associated with a Page Table Entry (typically 64 bits in size for 64-bit address spaces). The
> page table entries are stored in the page table and the information is cached in the TLBs.
>
> A PTE can have its validity bit cleared to indicate that the page associated with the virtual
> address is not in memory, but it is the operating system software that handles such. (Some
> specifics of how the TLB handles invalid PTEs depend on the choices made by the developers
> of the architecture, but obviously any non-speculative access must generate an exception
> so that the OS can load the appropriate page and set the valid bit of the PTE.)
>
> > Did I... Get it right? I really hope so, this is going in
> > a completely different direction than I thought it would.
>
> It looks like you have got it mostly right.
>
> As with any area of knowledge, the more one learns, the more one discovers
> how much one does not know (and how interconnected knowledge is).
>
> (In my reading about computer architecture, I have concentrated somewhat on caches and TLBs because
> these areas seem to be somewhat more accessible--lack of circuit design or programming knowledge is
> not a major issue for reading most of the academic papers--, the way they impact performance is relatively
> straightforward, and they have a significant impact on performance and power efficiency.)
Ohhh so this is starting to make a bit more sense now; thanks for your explanations!
- So it really does look like the TLBs are physically located extremely close to the back-end of the CPU architecture. Interesting!
- So if the L1 TLB is tightly bound to the L1 Cache, does that mean that there is a separate TLB that does the translation/permission caching for the memory structures L2 cache through main memory? Or does the L1 Cache do double (quadruple?) duty and store info for ALL memory structures? The core floorplan seems to have two TLBs; or at least the core floorplan that you provided previously does (thanks again for that).
- Page tables are conceptually stored in main memory? Does that mean that the L2 and L3 cache do not have their own page table and must be physically walked through EVERY time something wants to be accessed in them? That seems needlessly inefficient; I gotta be missing something here.
Thanks again for all the help. I definitely know what you mean by "the more one learns, the more one discovers how much he does not know"; I'm having a major case of that now. I'm trying to learn as much of this stuff as early as I can as I one day want to be working directly with hardware in some way; so RWT has been an absolutely incredible resource to me! Thanks to you all!
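
To make the translation flow discussed above a bit more concrete, here is a minimal toy sketch in Python. It is only an illustration under stated assumptions, not how any real CPU implements it: the class name ToyTLB, the flat dictionary standing in for the page table, the (frame number, writable) entry layout, and the LRU replacement policy are all made up for the example. The only detail taken from the thread is the 4 KiB page size. The point it tries to show is that the page table is only consulted on a TLB miss; repeated accesses to the same page reuse the cached translation.

```python
from collections import OrderedDict

PAGE_SIZE = 4096      # 4 KiB pages, as mentioned in the quoted reply
OFFSET_BITS = 12      # log2(PAGE_SIZE)


class ToyTLB:
    """Hypothetical fully associative TLB with LRU replacement."""

    def __init__(self, page_table, capacity=4):
        # Assumed layout: virtual page number -> (physical frame number, writable)
        self.page_table = page_table
        self.capacity = capacity
        self.entries = OrderedDict()   # cached subset of the page table
        self.hits = 0
        self.walks = 0                 # how often the page table was consulted

    def translate(self, vaddr):
        vpn = vaddr >> OFFSET_BITS           # virtual page number
        offset = vaddr & (PAGE_SIZE - 1)     # byte offset within the page
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)    # mark as most recently used
        else:
            self.walks += 1                  # TLB miss: go to the page table
            if vpn not in self.page_table:
                # A real system would raise a page fault for the OS to handle.
                raise LookupError(f"page fault at {vaddr:#x}")
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # evict least recently used
        frame, _writable = self.entries[vpn]
        return (frame << OFFSET_BITS) | offset


# Made-up mappings purely for demonstration.
page_table = {0x10: (0x2A, True), 0x11: (0x07, False)}
tlb = ToyTLB(page_table)
first = tlb.translate(0x10123)    # miss: page table consulted, entry cached
second = tlb.translate(0x10456)   # hit: same page, no page table access
print(hex(first), hex(second), tlb.hits, tlb.walks)   # 0x2a123 0x2a456 1 1
```

Note that nothing in this sketch knows or cares whether the data at the resulting physical address currently sits in L1, L2, L3, or main memory, which matches the point in the quoted reply that TLBs are (typically) not concerned with what is present in the caches.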