By: rwessel (robertwessel.delete@this.yahoo.com), August 20, 2013 12:08 am
Room: Moderated Discussions
Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on August 19, 2013 3:28 pm wrote:
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on August 17, 2013 11:18 am wrote:
> > Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on August 16, 2013 8:43 pm wrote:
> > [snip]
> > > Thank you very much for your reply; sorry again for replying so late,
> > > this week has certainly been... Eventful for me to say the least.
> >
> > I cannot justly complain about late replies when I very often reply with considerable
> > delay (or abandon replying after having delayed "too long").
> >
> > > Nonetheless, your reply is... Not to be offensive; but it goes way over my
> > > head. I just don't know where to "dig in" to start making sense of it. Maybe
> > > it's because I dont understand the makeup of all the bits in an address.
> >
> > No offense taken. I skipped over a significant amount of explanation for the sake of brevity.
> >
> > > However, I did see the mention of page tables; but then there seems to be a mention of
> > > a table of values for the page table which is a table of values itself? It seems a bit
> > > redundant to me and I guess that's where my understanding starts dropping off.
> >
> > Each process (protection domain) has a separate virtual address space. The page table is an indexable
> > data structure in memory that provides translations of each (valid) page of the virtual address space
> > to a page in the physical address space (as well as permission information and for some systems information
> > about cache behavior and other metadata). You seem to understand that far.
> >
> > An obvious structure to use for such would be an array. In order to find a translation one would
> > only need to know the page number (the virtual address divided by the page size [with the fractional
> > part ignored]), the physical address of the base of the array, and the size of each entry in
> > the array. The address of the base of the array would be placed in a special purpose register
> > so that the lookup can proceed without having to get this information from the OS.
> >
> > The special purpose register is just a piece of storage associated
> > with the processor that is only used for holding the base address.
> >
> > In a system with 32-bit physical and virtual addresses and
> > 4KiB pages, a 32-bit array entry (Page Table Entry)
> > would be sufficiently large to hold the translation and the addition information. The translation would use
> > 20 of the 32 bits (32-bit physical address minus 12 bits that are the offset within the page [and so the
> > virtual and physical address bits match]). The remaining 12 bits would typically include a valid bit (V),
> > permission bits for read and write accesses (SR, SW) under supervisor privilege (in which the OS typically
> > runs) and for read and write accesses (UR, UW) under user
> > privilege, and bits to track that the page has been
> > accessed (A) (to facilitate page replacement choices of the OS) and modified (D for dirty) with remaining
> > bits for other information (e.g., execute permissions,
> > writeback caching), reserved for future hardware use,
> > or reserved for OS use (in which case hardware is guaranteed to ignore those bits). E.g.:
> >
> >
> >
> > Using an array for such a page table would make a look up just taking the most
> > significant 20 bits of the virtual address (the page number) multiplying by four
> > (the size in bytes of each PTE) and adding the base address of the array.
> >
> > However, such page tables would be very large (4 MiB per process) and most of the memory would be taken
> > by PTEs with the valid bit clear (not a valid/used page).
> > One common method used to make the memory requirements
> > for a page table smaller is to use a tree-like data structure. With a 32-bit virtual address space and
> > 4KiB pages, it is somewhat obvious to have each tree node be 4KiB in size; with 4 byte PTEs, each node
> > will have 1024 entries and use 10 bits to index. This allows the use of a two-level tree.
> >
> > This image from Wikipedia (for the 32-bit x86 page table lookup) may help illustrate how translation occurs
> > in such a design (CR3 is the special purpose register holding the address of the base of the tree.).
> >
> >
> >
> > The entries in the first tree node (sometimes called a page directory) are typically similar to
> > PTEs in format, holding the physical address of the lower level node associated with each entry, a
> > valid bit, and other bits. An unset valid bit in one of these entries (sometimes called Page Directory
> > Entries [PDEs]) allows the entire node to which it would be pointing to be unused. So by adding a
> > level of indirection in the lookup large unused chunks (4 MiB in size) of virtual address space can
> > require only 4 bytes in the page table instead of 4 KiB. (Only one such chunk of the virtual address
> > space needs to exist for the memory use overhead [4 KiB for the first node] to be covered.)
> >
> > > What is this "special register" you refer to? I was told previously that the page tables are
> > > kept in the main memory. Is this special register sortof like a cache for the page table?
> >
> > The special purpose register is a register in the processor which is set by the OS to point to the base
> > of the page table. Each time the process/address space being used by the processor is changed, the OS puts
> > a new value in this register. (In a sense, this could be viewed as a one-entry cache managed by the OS
> > with the "tag" being an address space number which always matches the address space number of the currently
> > running process. However, that way of looking at it is probably not helpful at this point.)
> >
> > > I apologize; I'm sure you put time into your reply but I just don't understand it...
> > > Thanks again for your help as always however! Any and all help is appreciated!
> >
> > I currently have the free time, so no big deal. Explaining
> > such matters is in some sense easier than developing
> > more original content, but it probably also helps teach
> > me how to better present such matters in an accessible
> > manner. (I suspect the above still leaves much that might not be clearly explained.)
>
> Thanks, this reply definitely clears up quite a few loose ends!
>
> - So to my understanding, this "special register" simply lists the physical address of the base
> of the array of the process currently being worked on. Correct? (I assume the physical address of
> the base would be needed so that things can be "scaled" from the beginning of the array up?)
Typically the special register (CR3 on x86, for example) simply contains the physical address of the root of the page table structure. On x86 that's the first entry of the page directory. In the illustration above (for x86 in 32-bit, non-PAE mode), the page directory contains 1024, four byte, entries. Each of those entries can be marked invalid, in which case the 4MB region they would map is not valid, or is marked valid, in which case the PDE points to a page table, which also has 1024, four byte, entries, each one valid or invalid. Valid ones contain a translation to a physical page address, the invalid ones don't.
> - You definitely gave quite an informative chunk about page tables; but I don't believe I figured
> out the answer to one of my questions: If the LSU compares the TLB entries to the first line of the
> L1 cache and does NOT find a match; where does it go from there? Does it go to main memory and use
> the page tables to locate the needed information, or does it scale from L1 to L2 to L3, etc.?
The LSU does not compare TLB entries to anything. After the TLB is used to translate a virtual address to a physical address, the physical address is used to look up the requested cache line in the L1 cache. If it's not there, the (same) physical address is used to check the L2, then the L3, L4 (if present), etc., and ultimately main memory. After the translation happens (once), it doesn't happen again.
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on August 17, 2013 11:18 am wrote:
> > Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on August 16, 2013 8:43 pm wrote:
> > [snip]
> > > Thank you very much for your reply; sorry again for replying so late,
> > > this week has certainly been... Eventful for me to say the least.
> >
> > I cannot justly complain about late replies when I very often reply with considerable
> > delay (or abandon replying after having delayed "too long").
> >
> > > Nonetheless, your reply is... Not to be offensive; but it goes way over my
> > > head. I just don't know where to "dig in" to start making sense of it. Maybe
> > > it's because I dont understand the makeup of all the bits in an address.
> >
> > No offense taken. I skipped over a significant amount of explanation for the sake of brevity.
> >
> > > However, I did see the mention of page tables; but then there seems to be a mention of
> > > a table of values for the page table which is a table of values itself? It seems a bit
> > > redundant to me and I guess that's where my understanding starts dropping off.
> >
> > Each process (protection domain) has a separate virtual address space. The page table is an indexable
> > data structure in memory that provides translations of each (valid) page of the virtual address space
> > to a page in the physical address space (as well as permission information and for some systems information
> > about cache behavior and other metadata). You seem to understand that far.
> >
> > An obvious structure to use for such would be an array. In order to find a translation one would
> > only need to know the page number (the virtual address divided by the page size [with the fractional
> > part ignored]), the physical address of the base of the array, and the size of each entry in
> > the array. The address of the base of the array would be placed in a special purpose register
> > so that the lookup can proceed without having to get this information from the OS.
> >
> > The special purpose register is just a piece of storage associated
> > with the processor that is only used for holding the base address.
> >
> > In a system with 32-bit physical and virtual addresses and
> > 4KiB pages, a 32-bit array entry (Page Table Entry)
> > would be sufficiently large to hold the translation and the addition information. The translation would use
> > 20 of the 32 bits (32-bit physical address minus 12 bits that are the offset within the page [and so the
> > virtual and physical address bits match]). The remaining 12 bits would typically include a valid bit (V),
> > permission bits for read and write accesses (SR, SW) under supervisor privilege (in which the OS typically
> > runs) and for read and write accesses (UR, UW) under user
> > privilege, and bits to track that the page has been
> > accessed (A) (to facilitate page replacement choices of the OS) and modified (D for dirty) with remaining
> > bits for other information (e.g., execute permissions,
> > writeback caching), reserved for future hardware use,
> > or reserved for OS use (in which case hardware is guaranteed to ignore those bits). E.g.:
> >
> >
> > | 31 | 30:11 | 10 | 9 | 8 | 7 | 6 | 5 | 4:0 |
> > | V | Physical address of page | SR | SW | UR | UW | A | D | reserved |
> >
> >
> > Using an array for such a page table would make a look up just taking the most
> > significant 20 bits of the virtual address (the page number) multiplying by four
> > (the size in bytes of each PTE) and adding the base address of the array.
> >
> > However, such page tables would be very large (4 MiB per process) and most of the memory would be taken
> > by PTEs with the valid bit clear (not a valid/used page).
> > One common method used to make the memory requirements
> > for a page table smaller is to use a tree-like data structure. With a 32-bit virtual address space and
> > 4KiB pages, it is somewhat obvious to have each tree node be 4KiB in size; with 4 byte PTEs, each node
> > will have 1024 entries and use 10 bits to index. This allows the use of a two-level tree.
> >
> > This image from Wikipedia (for the 32-bit x86 page table lookup) may help illustrate how translation occurs
> > in such a design (CR3 is the special purpose register holding the address of the base of the tree.).
> >
> >

> >
> > The entries in the first tree node (sometimes called a page directory) are typically similar to
> > PTEs in format, holding the physical address of the lower level node associated with each entry, a
> > valid bit, and other bits. An unset valid bit in one of these entries (sometimes called Page Directory
> > Entries [PDEs]) allows the entire node to which it would be pointing to be unused. So by adding a
> > level of indirection in the lookup large unused chunks (4 MiB in size) of virtual address space can
> > require only 4 bytes in the page table instead of 4 KiB. (Only one such chunk of the virtual address
> > space needs to exist for the memory use overhead [4 KiB for the first node] to be covered.)
> >
> > > What is this "special register" you refer to? I was told previously that the page tables are
> > > kept in the main memory. Is this special register sortof like a cache for the page table?
> >
> > The special purpose register is a register in the processor which is set by the OS to point to the base
> > of the page table. Each time the process/address space being used by the processor is changed, the OS puts
> > a new value in this register. (In a sense, this could be viewed as a one-entry cache managed by the OS
> > with the "tag" being an address space number which always matches the address space number of the currently
> > running process. However, that way of looking at it is probably not helpful at this point.)
> >
> > > I apologize; I'm sure you put time into your reply but I just don't understand it...
> > > Thanks again for your help as always however! Any and all help is appreciated!
> >
> > I currently have the free time, so no big deal. Explaining
> > such matters is in some sense easier than developing
> > more original content, but it probably also helps teach
> > me how to better present such matters in an accessible
> > manner. (I suspect the above still leaves much that might not be clearly explained.)
>
> Thanks, this reply definitely clears up quite a few loose ends!
>
> - So to my understanding, this "special register" simply lists the physical address of the base
> of the array of the process currently being worked on. Correct? (I assume the physical address of
> the base would be needed so that things can be "scaled" from the beginning of the array up?)
Typically the special register (CR3 on x86, for example) simply contains the physical address of the root of the page table structure. On x86 that's the first entry of the page directory. In the illustration above (for x86 in 32-bit, non-PAE mode), the page directory contains 1024, four byte, entries. Each of those entries can be marked invalid, in which case the 4MB region they would map is not valid, or is marked valid, in which case the PDE points to a page table, which also has 1024, four byte, entries, each one valid or invalid. Valid ones contain a translation to a physical page address, the invalid ones don't.
> - You definitely gave quite an informative chunk about page tables; but I don't believe I figured
> out the answer to one of my questions: If the LSU compares the TLB entries to the first line of the
> L1 cache and does NOT find a match; where does it go from there? Does it go to main memory and use
> the page tables to locate the needed information, or does it scale from L1 to L2 to L3, etc.?
The LSU does not compare TLB entries to anything. After the TLB is used to translate a virtual address to a physical address, the physical address is used to look up the requested cache line in the L1 cache. If it's not there, the (same) physical address is used to check the L2, then the L3, L4 (if present), etc., and ultimately main memory. After the translation happens (once), it doesn't happen again.