By: matthew (willy.delete@this.infradead.org), August 7, 2021 9:21 am
Room: Moderated Discussions
Intel seem to have changed how their TLBs work recently. My i7-1165G7 reports:
TLB info
L1 Instruction TLB: 4KB pages, 8-way associative, 128 entries
L1 Instruction TLB: 4MB/2MB pages, 8-way associative, 16 entries
L1 Store Only TLB: 1GB/4MB/2MB/4KB pages, fully associative, 16 entries
L1 Load Only TLB: 4KB pages, 4-way associative, 64 entries
L1 Load Only TLB: 4MB/2MB pages, 4-way associative, 32 entries
L1 Load Only TLB: 1GB pages, fully associative, 8 entries
L2 Unified TLB: 4MB/2MB/4KB pages, 8-way associative, 1024 entries
From the Intel SDM:
00001b: Data TLB.
00010b: Instruction TLB.
00011b: Unified TLB*.
00100b: Load Only TLB. Hit on loads; fills on both loads and stores.
00101b: Store Only TLB. Hit on stores; fill on stores.
* Some unified TLBs will allow a single TLB entry to satisfy data read/write and instruction fetches. Others will require separate entries (e.g., one loaded on data read/write and another loaded on an instruction fetch).
Why split the Data TLB into Load-only and Store-only? Is it a way to increase the number of TLB entries without paying the full cost of increasing the number of entries? Is there a speed advantage to having a TLB that is only hit on stores? Or is there something else going on?
(I'm a software person; I have no experience in hardware design)
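For anyone who wants to see how a listing like the one above falls out of the raw registers, here is a rough sketch of decoding one subleaf of CPUID leaf 18H. The type encodings are the ones quoted from the SDM above. The rest of the field layout (page-size bits in EBX[3:0], ways in EBX[31:16], sets in ECX, level in EDX[7:5], fully-associative flag in EDX[8]) is my reading of the SDM and should be checked against it. The function names are made up for illustration, and the register values in the usage lines are constructed by hand to match two lines of the dump, not read from real hardware.

```python
# Type encodings quoted in the post (CPUID leaf 18H, EDX bits 4:0).
TLB_TYPES = {
    0b00001: "Data TLB",
    0b00010: "Instruction TLB",
    0b00011: "Unified TLB",
    0b00100: "Load Only TLB",
    0b00101: "Store Only TLB",
}

# Assumed EBX page-size bits: bit 0 = 4KB, bit 1 = 2MB, bit 2 = 4MB, bit 3 = 1GB.
PAGE_SIZES = ["4KB", "2MB", "4MB", "1GB"]


def decode_tlb_subleaf(ebx, ecx, edx):
    """Decode one leaf-18H subleaf into a dict, or None if the type is unknown.

    Hypothetical helper: the bit positions are assumptions (see lead-in).
    """
    tlb_type = edx & 0x1F
    if tlb_type not in TLB_TYPES:
        return None  # type 0 (assumed: invalid subleaf) or an unlisted encoding
    ways = (ebx >> 16) & 0xFFFF
    sets = ecx
    return {
        "level": (edx >> 5) & 0x7,
        "type": TLB_TYPES[tlb_type],
        "page_sizes": [s for bit, s in enumerate(PAGE_SIZES) if ebx & (1 << bit)],
        "ways": ways,
        "fully_associative": bool((edx >> 8) & 1),
        "entries": ways * sets,  # total entries = ways x sets
    }


def format_tlb(info):
    """Render a decoded subleaf in the style of the dump in the post."""
    sizes = "/".join(reversed(info["page_sizes"]))  # largest page size first
    if info["fully_associative"]:
        assoc = "fully associative"
    else:
        assoc = f"{info['ways']}-way associative"
    return (f"L{info['level']} {info['type']}: {sizes} pages, "
            f"{assoc}, {info['entries']} entries")


# Hand-built register values (not real CPUID output) for two lines of the dump.
load_4k = decode_tlb_subleaf(ebx=(4 << 16) | 0b0001, ecx=16, edx=(1 << 5) | 0b00100)
store = decode_tlb_subleaf(ebx=(16 << 16) | 0b1111, ecx=1,
                           edx=(1 << 8) | (1 << 5) | 0b00101)
print(format_tlb(load_4k))
print(format_tlb(store))
```

Under those assumptions, the 64-entry load-only TLB is 4 ways by 16 sets, and the store-only TLB is a single set of 16 ways, which is what "fully associative" amounts to.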
Topic | Posted By | Date |
---|---|---|
Split TLBs | matthew | 2021/08/07 09:21 AM |
Split TLBs | anon | 2021/08/07 10:12 AM |
Split TLBs | Mark Roulo | 2021/08/07 03:16 PM |
Split TLBs | anon2 | 2021/08/07 08:13 PM |
Performance and power!! | David Kanter | 2021/08/09 09:35 AM |
Performance and power!! | anon | 2021/08/10 12:31 AM |
Performance and power!! | rwessel | 2021/08/10 04:43 AM |
Performance and power!! | Michael S | 2021/08/10 06:40 AM |
Performance and power!! | Mark Roulo | 2021/08/10 06:46 AM |
Performance and power!! | Michael S | 2021/08/11 12:33 AM |
Performance and power!! | rwessel | 2021/08/11 03:44 AM |
Performance and power!! | Michael S | 2021/08/11 04:22 AM |
Performance and power!! | rwessel | 2021/08/11 05:17 AM |
FPGA SRAM blocks | Ungo | 2021/08/12 10:50 AM |
Performance and power!! | David Kanter | 2021/08/10 09:31 AM |