By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), July 31, 2013 3:11 pm
Room: Moderated Discussions
Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on July 31, 2013 2:15 pm wrote:
[snip]
> - The TLB lies in the store components of a core. (I know you said one TLB per CPU... but then
> you said that the TLB lies in the store components, so you must've meant core, correct?)
To get an idea of where TLBs are physically located, you can take a look at CPU floorplans (doing a web image search for "CPU core floorplan" should provide some interesting viewing). Hans de Vries has a number of such images (and some articles with analysis). The following is probably a helpful example: http://www.chip-architect.com/news/K8L_floorplan.jpg
> - There is not multiple TLBs (like I thought) unless they are tertiary TLBs
> to the primary TLB, or a microTLB to help sort out the primary TLB.
TLBs are not associated with levels of the (ordinary) cache hierarchy, but, like caches, TLBs can be divided into multiple levels and can be shared or private between instruction and data accesses (and even between different cores).
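As a rough illustration of that organization, here is a minimal sketch of private L1 instruction and data TLBs backed by a shared second-level TLB. The class name `TwoLevelTLB`, the dict-based page table, and the unbounded capacities are all assumptions made for the example; real hardware has small fixed capacities and replacement policies.

```python
class TwoLevelTLB:
    """Hypothetical model: split L1 TLBs (instruction/data) over a shared L2 TLB."""

    def __init__(self, page_table):
        self.page_table = page_table  # authoritative mapping: VPN -> PFN
        self.l1_instr = {}            # private to instruction fetches
        self.l1_data = {}             # private to loads and stores
        self.l2_shared = {}           # shared by both kinds of access
        self.page_walks = 0

    def lookup(self, vpn, is_instruction):
        l1 = self.l1_instr if is_instruction else self.l1_data
        if vpn in l1:
            return l1[vpn]            # L1 TLB hit
        if vpn not in self.l2_shared:
            # Miss in both levels: consult the page table (a "page walk").
            self.page_walks += 1
            self.l2_shared[vpn] = self.page_table[vpn]
        l1[vpn] = self.l2_shared[vpn]  # fill the L1 TLB from the L2 TLB
        return l1[vpn]
```

In this sketch, a data access to a page that was first touched by an instruction fetch misses in the L1 data TLB but hits in the shared L2 TLB, so no second page walk is needed.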
> - The TLB is just a cache of the last acceses to information; their translations and permissions.
Yep! The "authoritative" source for such information is the page table(s) and TLBs are caches of page table information. (Intel calls the structure which stores internal node entries in its multi-level page table Paging-Structure Caches, so Intel might view a TLB as only holding Page Table Entries and not any Page Directory Entries and other parts of the hierarchical page table. Since Intel has not implemented a structure which caches both types of page table components, it is not obvious what Intel would call such a hardware structure. I seem to recall Robert Wessel [on the comp.arch newsgroup, I think] mentioning some time ago that some implementation of IBM mainframes did have TLBs that cached more than just PTEs--and called them TLBs.)
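To make the "cache of page table information" idea concrete, the following is a minimal sketch, assuming 4 KiB pages, a flat dict standing in for the page table, and a tiny fully associative TLB with LRU replacement. The names `TinyTLB` and `PAGE_SHIFT` are hypothetical, and real TLBs differ in associativity, replacement, and what they cache.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KiB pages


class TinyTLB:
    """Hypothetical fully associative TLB with LRU replacement."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # authoritative source: VPN -> PFN
        self.capacity = capacity
        self.entries = OrderedDict()  # cached copies of translations
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            # Miss: fetch the translation from the page table and cache it.
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return (self.entries[vpn] << PAGE_SHIFT) | offset
```

Repeated accesses to the same page hit in the TLB; only the first access (or an access after eviction) has to go back to the page table.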
> - NO caches have a TLB solely for them; though all caches have a page table or multiple in them
The L1 TLBs are somewhat tightly bound to the L1 caches in typical processors (that have TLBs) to allow minimal latency in L1 cache accesses. However, there is typically not a special TLB for the L2 cache nor another for the L3 cache.
Page tables are conceptually stored in main memory but, like other parts of main memory, can be found in caches. (Theoretically, pages of a page table can even be paged out to swap like other memory.)
In a typical system, each software process has a separate virtual address space requiring a separate page table. (Threads within a process share an address space and page table.) The OS typically takes a portion of each process' total address space as a virtual memory region shared across the system. Mechanisms exist to avoid duplicating all of the information in this region (e.g., the ARM ISA defines a global address space with a mask determining which accesses are within the per-process address space and which are in the global address space). (Eep! Excess digression!)
> - The TLB uses these page tables to track what is inside a cache/RAM and
> uses the table to translate virtual addresses to physical addresses.
The TLBs are (typically) not concerned with what is present in the caches and only provide the information needed to translate virtual addresses to physical addresses, plus some other metadata like permissions, accessed and dirty bits, cache write-through/write-back behavior, etc. Each virtual address page (typically an aligned chunk of memory 4KiB in size) is associated with a Page Table Entry (typically 64 bits in size for 64-bit address spaces). The page table entries are stored in the page table and the information is cached in the TLBs.
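The arithmetic involved is simple enough to sketch. The example below assumes 4KiB pages and a made-up PTE layout (frame number in the upper bits, flags in the low 12 bits); the flag positions are illustrative assumptions, not the encoding of any particular architecture.

```python
PAGE_SHIFT = 12              # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT

# Hypothetical flag bit positions, chosen only for illustration.
PTE_VALID = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_ACCESSED = 1 << 5
PTE_DIRTY = 1 << 6


def make_pte(pfn, flags):
    """Pack a physical frame number and flag bits into one integer."""
    return (pfn << PAGE_SHIFT) | flags


def split_vaddr(vaddr):
    """Split a virtual address into (virtual page number, page offset)."""
    return vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)


def translate(pte, vaddr):
    """Combine the frame number from the PTE with the offset from vaddr."""
    _, offset = split_vaddr(vaddr)
    pfn = pte >> PAGE_SHIFT
    return (pfn << PAGE_SHIFT) | offset
```

For example, with a PTE built by `make_pte(0x42, PTE_VALID | PTE_WRITABLE)`, the virtual address `0x7f1234` splits into page number `0x7f1` and offset `0x234`, and `translate` keeps the offset while swapping in frame `0x42`.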
A PTE can have its validity bit cleared to indicate that the page associated with the virtual address is not in memory, but it is the operating system software that handles such cases. (Some specifics of how the TLB handles invalid PTEs depend on the choices made by the developers of the architecture, but obviously any non-speculative access must generate an exception so that the OS can load the appropriate page and set the valid bit of the PTE.)
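The division of labor between hardware and OS can be sketched roughly as follows. Everything here is a simplified assumption: the `PageFault` exception stands in for the hardware exception, `os_handle_fault` stands in for the OS handler, and "loading the page" is reduced to taking a frame number from a free list.

```python
class PageFault(Exception):
    """Stands in for the exception raised on access through an invalid PTE."""

    def __init__(self, vpn):
        super().__init__(f"page fault on VPN {vpn:#x}")
        self.vpn = vpn


class PTE:
    def __init__(self, pfn=None, valid=False):
        self.pfn = pfn
        self.valid = valid


def hw_translate(page_table, vpn):
    """Hardware side: translate, or raise if the valid bit is clear."""
    pte = page_table[vpn]
    if not pte.valid:
        raise PageFault(vpn)
    return pte.pfn


def os_handle_fault(page_table, fault, free_frames):
    """OS side: bring the page into a frame and set the valid bit."""
    pte = page_table[fault.vpn]
    pte.pfn = free_frames.pop()  # pretend the page contents were loaded here
    pte.valid = True


def access(page_table, vpn, free_frames):
    """Attempt the access; on a fault, let the OS fix the PTE and retry."""
    try:
        return hw_translate(page_table, vpn)
    except PageFault as fault:
        os_handle_fault(page_table, fault, free_frames)
        return hw_translate(page_table, vpn)
```

The retry after the handler returns mirrors how the faulting instruction is re-executed once the OS has made the page present.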
> Did I... Get it right? I really hope so, this is going in
> a completely different direction than I thought it would.
It looks like you have got it mostly right.
As with any area of knowledge, the more one learns, the more one discovers how much one does not know (and how interconnected knowledge is).
(In my reading about computer architecture, I have concentrated somewhat on caches and TLBs for three reasons: these areas seem to be somewhat more accessible [lack of circuit design or programming knowledge is not a major issue for reading most of the academic papers], the way they impact performance is relatively straightforward, and they have a significant impact on performance and power efficiency.)