By: , July 30, 2013 5:27 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on July 30, 2013 3:59 pm wrote:
> Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on July 30, 2013 1:18 pm wrote:
> [snip]
> > So page tables are stored in memory and there are TLBs to manage those tables currently being used it seems.
>
> The term "Translation Lookaside Buffer" is used in a more technical sense as only referring to the
> cache of translation and permission information with the term "Memory Management Unit" including
> the TLB and (in some cases) hardware to load TLB entries from the in-memory page table (or a software
> TLB)--this is called a hardware page table walker--and to update "accessed" and "dirty" bits in
> the Page Table Entry in memory (again not all architectures handle such in hardware).
>
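The multi-level walk performed by a hardware page table walker can be sketched in a few lines. This is a schematic Python model, not any real architecture's format: it assumes an x86-64-like four-level radix tree with a 9-bit index per level and 4 KiB pages, with nested dicts standing in for the in-memory tables.

```python
# Schematic sketch of a hardware page table walk: an x86-64-like
# four-level radix tree, 9-bit index per level, 4 KiB pages.
# Nested dicts stand in for the in-memory page tables.

PAGE_SHIFT = 12          # 4 KiB pages
INDEX_BITS = 9           # 512 entries per table
LEVELS = 4

def walk(root, vaddr):
    """Translate vaddr; return the physical address or None (page fault)."""
    table = root
    for level in reversed(range(LEVELS)):            # level 3 down to 0
        shift = PAGE_SHIFT + level * INDEX_BITS
        index = (vaddr >> shift) & ((1 << INDEX_BITS) - 1)
        entry = table.get(index)
        if entry is None:
            return None                              # not mapped
        table = entry                                # next level (or leaf frame)
    # After the last level, 'table' holds the page frame base address.
    return table | (vaddr & ((1 << PAGE_SHIFT) - 1))

# Map virtual page 0x1 (addresses 0x1000-0x1FFF) to physical frame 0x40000.
root = {}
t = root
for level in reversed(range(1, LEVELS)):
    t = t.setdefault(0, {})                          # upper-level indices are 0 here
t[1] = 0x40000

print(hex(walk(root, 0x1234)))                       # 0x40234
print(walk(root, 0x5234))                            # None: would fault
```

Note that the walker must make one dependent memory access per level, which is exactly why caching its accesses (discussed below) matters.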
> In modern high performance processors, memory accesses by the hardware page table walker are cached,
> so such accesses do not necessarily have to read or write the actual main memory directly. This data
> may be cached in specialized structures (which Intel calls Paging-Structure Caches) and/or in the caches
> used by processor memory accesses. (Avoiding the use of L1 caches by the hardware page table walker
> reduces contention with the processing core for its more timing critical L1 cache resources.)
>
> Software (the operating system or hypervisor) fills the page table with data. Typically
> software will also clear "accessed" bits occasionally to provide a measure of how recently
> (and frequently) a page of memory is used. (This information can be used to choose a good
> victim if a page needs to be swapped out.) Software may also clear "dirty" bits, for
> example after a modified page's contents have been written back to backing store.
>
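The accessed-bit scan described above is the basis of the classic "clock" (second-chance) replacement policy. A minimal sketch, with dicts standing in for page frames (the function name and layout are illustrative, not any particular kernel's code):

```python
# Sketch of the "clock" page-replacement policy, which uses the
# accessed (reference) bits that software periodically clears.

def choose_victim(pages, hand=0):
    """pages: list of dicts holding an 'accessed' bit.
    Recently used pages get a second chance (their bit is cleared);
    the first page found with accessed == 0 becomes the victim."""
    n = len(pages)
    while True:
        page = pages[hand]
        if page["accessed"]:
            page["accessed"] = 0       # clear: second chance
            hand = (hand + 1) % n
        else:
            return hand                # victim index

frames = [{"accessed": 1}, {"accessed": 0}, {"accessed": 1}]
print(choose_victim(frames))           # 1: first frame not recently used
```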
> > Just to clarify, this explanation is only for the TLB for the main memory, right?
>
> I am not certain what you mean by "the TLB for the main memory". The TLB is a cache of the page table(s).
> A page table provides translations (and permissions) for a virtual address space; caching this information
> reduces the cost of look-ups which would ordinarily occur for every memory access.
>
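The point that "the TLB is a cache of the page table(s)" can be made concrete with a toy fully associative, LRU-replaced TLB. All names here are illustrative; real TLBs are set-associative hardware structures, not Python objects.

```python
from collections import OrderedDict

# Toy fully associative TLB with LRU replacement (illustrative only).
class TLB:
    def __init__(self, entries=4):
        self.entries = entries
        self.map = OrderedDict()        # virtual page number -> (frame, perms)

    def lookup(self, vpn):
        if vpn in self.map:
            self.map.move_to_end(vpn)   # mark most recently used
            return self.map[vpn]        # hit: no page table walk needed
        return None                     # miss: the walker must fill the entry

    def fill(self, vpn, frame, perms):
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)  # evict the least recently used entry
        self.map[vpn] = (frame, perms)

    def invalidate(self, vpn):
        self.map.pop(vpn, None)         # what a software shootdown targets
```

A quick exercise: fill two entries into a two-entry TLB, touch the first, and a third fill evicts the second (the least recently used one).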
> > The L1DTLB or L3 TLB has no knowledge of what's going on in main memory, correct?
>
> TLBs are traditionally not coherent (or, more accurately, coherence is managed by software). This means
> that if a page table entry is changed by software (or by a memory management unit updating its "accessed"
> or "dirty" bit), the system's TLBs can contain stale data until the old entry is either invalidated by software
> or is naturally evicted from the TLB by other PTEs being loaded into the TLB. (In the case of "accessed" and
> "dirty" bit updating, this is not a major problem because the hardware only updates in one direction--so there
> is no way to inconsistently order the actions of different MMUs--and other than updating these bits the hardware
> does not act on this information--so at worst a few extraneous updates might occur.)
>
> (This non-coherence is becoming more of a concern with the increasing commonness of multiprocessor systems.
> Forcing every processor in the system to take an interrupt to run a software routine to invalidate a (possibly
> not present) TLB entry introduces more overhead as the number of processors increases.)
>
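The shootdown overhead is visible even in a toy model: the initiating processor must notify every other processor, whether or not that processor actually holds the entry. The dicts and function below are illustrative only; real kernels deliver inter-processor interrupts and wait for acknowledgements rather than looping.

```python
# Toy model of a software TLB shootdown: invalidate one mapping on
# every remote CPU and count the notifications ("IPIs") required.

def shootdown(cpus, initiator, vpn):
    """cpus: list of dicts, each with a 'tlb' dict. Returns IPIs sent."""
    ipis = 0
    for i, cpu in enumerate(cpus):
        if i == initiator:
            continue                    # the initiator handles its own TLB
        cpu["tlb"].pop(vpn, None)       # entry may not even be present
        ipis += 1                       # cost grows linearly with CPU count
    return ipis

cpus = [{"tlb": {5: 0x40}}, {"tlb": {5: 0x41}}, {"tlb": {}}]
print(shootdown(cpus, 0, 5))            # 2: one notification per remote CPU
```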
> > Also, if the TLB is on-die; where is it? Is it integrated into the IMC or another location?
>
> In a typical processor, a TLB access is needed for every memory access to provide translation and
> permission information (caches are typically tagged based on physical memory addresses rather than
> virtual addresses, so translation information would be needed before a cache hit or miss could be
> determined). This means that the L1 TLB tends to be tightly coupled to the L1 caches. (In some cases,
> a very small TLB--usually called a microTLB--for instruction pages is provided with a single L1 TLB
> for both instruction and data accesses. Instruction accesses have very high locality of reference,
> so even a two-entry microTLB can greatly reduce access contention for a shared L1 TLB.)
>
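Why the L1 TLB must sit right next to a physically tagged L1 cache can be sketched as follows: the cache's tag check cannot be resolved until the TLB has produced a physical address. Toy dicts again, with hypothetical names.

```python
# Sketch of why a TLB sits beside a physically tagged L1 cache:
# the tag check cannot complete until the TLB supplies a physical
# address. The tlb and cache dicts are illustrative stand-ins.

PAGE_SHIFT = 12

def load(vaddr, tlb, cache):
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
    frame = tlb.get(vpn)
    if frame is None:
        raise LookupError("TLB miss: page table walk needed first")
    paddr = (frame << PAGE_SHIFT) | offset
    # Only now, with the physical address in hand, can the (physically
    # tagged) cache decide hit or miss.
    return ("hit", cache[paddr]) if paddr in cache else ("miss", paddr)

tlb = {1: 0x40}                         # virtual page 1 -> frame 0x40
cache = {(0x40 << PAGE_SHIFT) | 0x10: 99}
print(load(0x1010, tlb, cache))         # ('hit', 99)
```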
> L2 TLBs are often less tightly connected to the processing core (Itanium implementations being something of
> an exception; these access the L2 TLBs for every store and for all floating-point memory operations). A more
> typical L2 TLB is only accessed on a miss in the L1 TLB, so its connection to the core is more indirect. (Note
> that a processor could be designed to use L2 TLBs to provide translations for prefetch engines.)
>
> There are many variations on how (primarily non-L1) TLB resources can be shared. Some implementations
> provide separate instruction and data L2 TLBs while others use a unified L2 TLB. TLB sharing across
> multiple cores has been proposed. (This is very similar to the sharing considerations for ordinary
> caches. Sharing reduces the commonness of underutilized resources but increases contention for those
> resources and reduces optimization opportunities from tighter binding or specialized use.)
>
> (Unlike ordinary caches, TLBs also have issues of different page sizes. This introduces another
> area where separate vs. shared trade-offs must be considered. For multi-level page tables
> and linear page tables, the caching of page table node entries is another concern that ordinary
> caches do not need to consider with respect to sharing vs. specializing.)
>
> This subject can get very complex, but the basic principles are fairly accessible (if
> the presenter does not confuse the reader with digressions and extraneous detail!).
Thanks again for your answer! I find it funny how this is basic to you; definitely impressive as I'm trying very hard to wrap my head around it, and the more I uncover, it seems like I'm taking one step forward and two steps back. There's just so much to learn. Thanks again.
So accessing, reading, and writing to the TLB is done by the table walker, correct? The store/load units of the core do not interact directly with the TLB (meaning, in this context, the table of translations and permissions), correct?
By "TLB for the main memory", I am referring to the TLB for the RAM. I refer to RAM by the common term "memory" and to the caches as "caches." Sorry for the confusion. What I meant by the question was: do multiple page tables only apply to the main memory (RAM) due to its sheer size, or do all caches (L1, L2, L3, etc.) have multiple page tables? I would think that either the L1 or L2 cache would be too small to make use of multiple page tables.
By the "The L1DTLB or L3 TLB has no knowledge of what's going on in main memory, correct?" question, I meant: the L1DTLB or the L1DTLB page walker will never be in a situation where either of those two units itself accesses the main memory (RAM), correct?
I don't believe I understood the answer to my last question. By "location", I meant: if the L1DTLB is physically close to the L1 Dcache, and the L2 TLB is physically close to the L2 cache, and the TLB for the main memory (RAM) CANNOT be physically close to the RAM since it has to be on-die, then where is it located?
Thank you again for your informative answers!