Article: ARM Goes 64-bit
By: name99 (name99.delete@this.redheron.com), November 17, 2012 2:31 pm
Room: Moderated Discussions
anonymou5 (no.delete@this.spam.com) on August 14, 2012 11:03 am wrote:
> > Who uses x86 4MB pages?
>
> The use of PAE=0 is limited to scenarios in which PAE is not supported but PSE is (think
> historic x86 processors), scenarios in which PA space is less critical than the footprint
> of the page tables (the 32-bit paging entries limit 4 KiB pages to 32-bit PAs, and 4 MiB
> pages to 41-bit PAs, and they don't leave room for NX bits -- however, the tables consume
> less space), and/or scenarios in which page table walk performance matters (PDPTRs + two
> levels for PSE versus four levels for today's PAE).
I always assumed x86 large pages were used in essentially the same way BATs are used on POWER. They're not meant to be a general solution; they're a way to prevent the TLB being polluted in one particular case where this is easily avoided, namely where one has a
- large chunk of
- contiguous address space which is
- all used in the same way and which
- doesn't need to be swapped (or, unlikely, can be swapped in these large chunks)
SO two obvious use case: one might use large pages/BATs for
- covering video address space (including not just the frame buffer)
- the instruction and data footprints of the wired part of the OS.
An additional use case that sometimes arises, for dedicated machines is covering (some?) code and (some?) data that one essentially wants wired down and in constant use.
This could be the web server on some machines; the database server on some machines, perhaps even in special cases a shmem block of memory used for communication.
The point is, in ALL these cases, one knows ahead of the time what one is trying to do and why. For these uses cases, large pages work, and don't pollute the TLB. But it is fighting their design principle to assume that they are just large 4kB pages and can be handled the same way, in particular that they can usefully be AUTOMATICALLY managed by the OS --- automatically allocated, automatically swapped, automatically aggregated and de-aggregated from smaller pages.
As for 64kB pages --- bring it on. High time, and well done, ARM. Really the only thing I'd like to see in the new ISA that I don't see is
- support for multiple condition codes, like POWER
- a dedicated count register for speeding up inner loops, again like POWER.
I'm not a compiler guy, but for low level assembly hacking, multiple condition registers could be used with value. It always struck me that the compiler guys were not even trying to use them, with at least two obvious possibilities not explored:
- storing bools in conditions registers rather than standard "integer" registers
- testing conditions (when it makes sense) across basic block boundaries, rather than the paradigm of only looking within a single basic block for scheduling opportunities.
As for the dedicated count register, yes, you can get the same effect mostly with branch prediction, but this allows you to free up some of those limited branch prediction slots for more useful purposes, and gives you a register that lives close to the instruction side of the chip, not the data side, and thus a little faster when used for instruction purposes --- witness POWER's use of that register to call proc ptrs. I don't know --- maybe that sort of lack of orthogonality is just too much hassle? But count register never struck me as a bad idea, unlike some of what's in the POWER ISA.
> > Who uses x86 4MB pages?
>
> The use of PAE=0 is limited to scenarios in which PAE is not supported but PSE is (think
> historic x86 processors), scenarios in which PA space is less critical than the footprint
> of the page tables (the 32-bit paging entries limit 4 KiB pages to 32-bit PAs, and 4 MiB
> pages to 41-bit PAs, and they don't leave room for NX bits -- however, the tables consume
> less space), and/or scenarios in which page table walk performance matters (PDPTRs + two
> levels for PSE versus four levels for today's PAE).
I always assumed x86 large pages were used in essentially the same way BATs are used on POWER. They're not meant to be a general solution; they're a way to prevent the TLB being polluted in one particular case where this is easily avoided, namely where one has a
- large chunk of
- contiguous address space which is
- all used in the same way and which
- doesn't need to be swapped (or, unlikely, can be swapped in these large chunks)
SO two obvious use case: one might use large pages/BATs for
- covering video address space (including not just the frame buffer)
- the instruction and data footprints of the wired part of the OS.
An additional use case that sometimes arises, for dedicated machines is covering (some?) code and (some?) data that one essentially wants wired down and in constant use.
This could be the web server on some machines; the database server on some machines, perhaps even in special cases a shmem block of memory used for communication.
The point is, in ALL these cases, one knows ahead of the time what one is trying to do and why. For these uses cases, large pages work, and don't pollute the TLB. But it is fighting their design principle to assume that they are just large 4kB pages and can be handled the same way, in particular that they can usefully be AUTOMATICALLY managed by the OS --- automatically allocated, automatically swapped, automatically aggregated and de-aggregated from smaller pages.
As for 64kB pages --- bring it on. High time, and well done, ARM. Really the only thing I'd like to see in the new ISA that I don't see is
- support for multiple condition codes, like POWER
- a dedicated count register for speeding up inner loops, again like POWER.
I'm not a compiler guy, but for low level assembly hacking, multiple condition registers could be used with value. It always struck me that the compiler guys were not even trying to use them, with at least two obvious possibilities not explored:
- storing bools in conditions registers rather than standard "integer" registers
- testing conditions (when it makes sense) across basic block boundaries, rather than the paradigm of only looking within a single basic block for scheduling opportunities.
As for the dedicated count register, yes, you can get the same effect mostly with branch prediction, but this allows you to free up some of those limited branch prediction slots for more useful purposes, and gives you a register that lives close to the instruction side of the chip, not the data side, and thus a little faster when used for instruction purposes --- witness POWER's use of that register to call proc ptrs. I don't know --- maybe that sort of lack of orthogonality is just too much hassle? But count register never struck me as a bad idea, unlike some of what's in the POWER ISA.
Topic | Posted By | Date |
---|---|---|
New Article: ARM Goes 64-bit | David Kanter | 2012/08/13 11:04 PM |
New Article: ARM Goes 64-bit | none | 2012/08/13 11:44 PM |
New Article: ARM Goes 64-bit | David Kanter | 2012/08/14 12:04 AM |
MIPS MT-ASE | Paul A. Clayton | 2012/08/14 08:01 AM |
MONITOR/MWAIT | EduardoS | 2012/08/14 09:08 AM |
MWAIT not specifically MT | Paul A. Clayton | 2012/08/14 09:36 AM |
MWAIT not specifically MT | EduardoS | 2012/08/15 02:16 PM |
MONITOR/MWAIT | anonymou5 | 2012/08/14 10:07 AM |
MONITOR/MWAIT | EduardoS | 2012/08/15 02:20 PM |
MIPS MT-ASE | rwessel | 2012/08/14 09:14 AM |
New Article: ARM Goes 64-bit | SHK | 2012/08/14 01:01 AM |
New Article: ARM Goes 64-bit | anon | 2012/08/14 01:37 AM |
New Article: ARM Goes 64-bit | Richard Cownie | 2012/08/14 02:57 AM |
New Article: ARM Goes 64-bit | anon | 2012/08/14 03:29 AM |
New Article: ARM Goes 64-bit | none | 2012/08/14 03:44 AM |
New Article: ARM Goes 64-bit | anon | 2012/08/14 04:28 AM |
New Article: ARM Goes 64-bit | anon | 2012/08/14 04:32 AM |
New Article: ARM Goes 64-bit | EduardoS | 2012/08/14 05:06 AM |
New Article: ARM Goes 64-bit | none | 2012/08/14 04:40 AM |
AArch64 select better than cmov | Paul A. Clayton | 2012/08/14 05:08 AM |
New Article: ARM Goes 64-bit | anon | 2012/08/14 05:12 AM |
New Article: ARM Goes 64-bit | none | 2012/08/14 05:25 AM |
Predicated ld/store are useful | Paul A. Clayton | 2012/08/14 05:48 AM |
Predicated ld/store are useful | none | 2012/08/14 05:56 AM |
Predicated ld/store are useful | anon | 2012/08/14 06:07 AM |
Predicated stores might not be that bad | Paul A. Clayton | 2012/08/14 06:27 AM |
Predicated stores might not be that bad | David Kanter | 2012/08/15 12:14 AM |
Predicated stores might not be that bad | Michael S | 2012/08/15 10:41 AM |
Predicated stores might not be that bad | R Byron | 2012/08/17 03:09 AM |
New Article: ARM Goes 64-bit | anon | 2012/08/14 05:54 AM |
New Article: ARM Goes 64-bit | none | 2012/08/14 06:04 AM |
New Article: ARM Goes 64-bit | anon | 2012/08/14 06:43 AM |
New Article: ARM Goes 64-bit | EduardoS | 2012/08/14 05:07 AM |
New Article: ARM Goes 64-bit | anon | 2012/08/14 05:20 AM |
New Article: ARM Goes 64-bit | none | 2012/08/14 05:29 AM |
New Article: ARM Goes 64-bit | anon | 2012/08/14 06:00 AM |
New Article: ARM Goes 64-bit | Michael S | 2012/08/14 02:43 PM |
New Article: ARM Goes 64-bit | Richard Cownie | 2012/08/14 05:53 AM |
OT: Conrad's "Youth" | Richard Cownie | 2012/08/14 06:20 AM |
New Article: ARM Goes 64-bit | EduardoS | 2012/08/14 05:04 AM |
New Article: ARM Goes 64-bit | mpx | 2012/08/14 07:59 AM |
New Article: ARM Goes 64-bit | Antti-Ville Tuunainen | 2012/08/14 08:16 AM |
New Article: ARM Goes 64-bit | anonymou5 | 2012/08/14 10:03 AM |
New Article: ARM Goes 64-bit | name99 | 2012/11/17 02:31 PM |
Microarchitecting a counter register | Paul A. Clayton | 2012/11/17 06:37 PM |
New Article: ARM Goes 64-bit | bakaneko | 2012/08/14 03:21 AM |
New Article: ARM Goes 64-bit | name99 | 2012/11/17 02:40 PM |
New Article: ARM Goes 64-bit | EduardoS | 2012/11/17 03:52 PM |
New Article: ARM Goes 64-bit | Doug S | 2012/11/17 04:48 PM |
New Article: ARM Goes 64-bit | bakaneko | 2012/11/18 04:40 PM |
New Article: ARM Goes 64-bit | Wilco | 2012/11/19 06:59 AM |
New Article: ARM Goes 64-bit | EduardoS | 2012/11/19 07:23 AM |
New Article: ARM Goes 64-bit | Wilco | 2012/11/19 08:31 AM |
Downloading µarch-specific binaries? | Paul A. Clayton | 2012/11/19 10:21 AM |
New Article: ARM Goes 64-bit | EduardoS | 2012/11/19 10:41 AM |
New Article: ARM Goes 64-bit | Wilco | 2012/11/21 06:44 AM |
JIT vs. static compilation (Was: New Article: ARM Goes 64-bit) | VMguy | 2012/11/22 02:21 AM |
JIT vs. static compilation (Was: New Article: ARM Goes 64-bit) | David Kanter | 2012/11/22 11:12 AM |
JIT vs. static compilation (Was: New Article: ARM Goes 64-bit) | Gabriele Svelto | 2012/11/23 02:50 AM |
New Article: ARM Goes 64-bit | EduardoS | 2012/11/23 09:09 AM |
New Article: ARM Goes 64-bit | EBFE | 2012/11/26 12:24 AM |
New Article: ARM Goes 64-bit | Gabriele Svelto | 2012/11/26 02:33 AM |
New Article: ARM Goes 64-bit | EBFE | 2012/11/27 10:17 PM |
New Article: ARM Goes 64-bit | Gabriele Svelto | 2012/11/28 01:32 AM |
New Article: ARM Goes 64-bit | EduardoS | 2012/11/26 11:16 AM |
New Article: ARM Goes 64-bit | EBFE | 2012/11/27 11:33 PM |
New Article: ARM Goes 64-bit | EduardoS | 2012/11/28 04:53 AM |
New Article: ARM Goes 64-bit | Michael S | 2012/11/28 05:15 AM |
New Article: ARM Goes 64-bit | EduardoS | 2012/11/28 06:33 AM |
New Article: ARM Goes 64-bit | Michael S | 2012/11/28 08:16 AM |
New Article: ARM Goes 64-bit | EduardoS | 2012/11/28 08:53 AM |
New Article: ARM Goes 64-bit | Eugene Nalimov | 2012/11/28 04:58 PM |
Amazing! | EduardoS | 2012/11/28 06:25 PM |
Amazing! (non-italic response) | EduardoS | 2012/11/28 06:25 PM |
Amazing! | EBFE | 2012/11/28 07:20 PM |
Undefined behaviour doubles down | EduardoS | 2012/11/28 08:10 PM |
New Article: ARM Goes 64-bit | EBFE | 2012/11/28 06:54 PM |
New Article: ARM Goes 64-bit | EduardoS | 2012/11/28 08:21 PM |
Have you heard of Transmeta? | David Kanter | 2012/11/19 02:47 PM |
New Article: ARM Goes 64-bit | bakaneko | 2012/11/19 08:08 AM |
New Article: ARM Goes 64-bit | David Kanter | 2012/11/19 02:40 PM |
Semantic Dictionary Encoding | Ray | 2012/11/19 09:37 PM |
New Article: ARM Goes 64-bit | Rohit | 2012/11/20 03:48 PM |
New Article: ARM Goes 64-bit | David Kanter | 2012/11/20 10:07 PM |
New Article: ARM Goes 64-bit | Wilco | 2012/11/21 05:41 AM |
New Article: ARM Goes 64-bit | David Kanter | 2012/11/21 09:12 AM |
A JIT example | Mark Roulo | 2012/11/21 09:30 AM |
A JIT example | Wilco | 2012/11/21 06:04 PM |
A JIT example | rwessel | 2012/11/21 08:05 PM |
A JIT example | Gabriele Svelto | 2012/11/23 02:53 AM |
A JIT example | EduardoS | 2012/11/23 09:13 AM |
A JIT example | Wilco | 2012/11/23 12:41 PM |
A JIT example | EduardoS | 2012/11/23 01:06 PM |
A JIT example | Gabriele Svelto | 2012/11/23 03:09 PM |
A JIT example | Symmetry | 2012/11/26 04:58 AM |
New Article: ARM Goes 64-bit | Ray | 2012/11/19 09:27 PM |
New Article: ARM Goes 64-bit | David Kanter | 2012/08/14 08:11 AM |
v7-M is Thumb-only | Paul A. Clayton | 2012/08/14 05:58 AM |
Minor suggested correction | Paul A. Clayton | 2012/08/14 07:33 AM |
Minor suggested correction | anon | 2012/08/14 07:57 AM |
New Article: ARM Goes 64-bit | Exophase | 2012/08/14 07:33 AM |
New Article: ARM Goes 64-bit | David Kanter | 2012/08/14 08:16 AM |
New Article: ARM Goes 64-bit | jigal | 2012/08/15 12:49 PM |
Correction re ARM and BBC Micro | Paul | 2012/08/14 07:59 PM |
Correction re ARM and BBC Micro | Per Hesselgren | 2012/08/15 02:27 AM |
Memory BW so low | Per Hesselgren | 2012/08/15 02:14 AM |
Memory BW so low | none | 2012/08/15 10:16 AM |
New Article: ARM Goes 64-bit | dado | 2012/08/15 09:25 AM |
Number of GPRs | Kenneth Jonsson | 2012/08/16 01:35 PM |
Number of GPRs | Exophase | 2012/08/16 01:52 PM |
Number of GPRs | Kenneth Jonsson | 2012/08/17 01:41 AM |
Ooops, missing link... | Kenneth Jonsson | 2012/08/17 01:44 AM |
64-bit pointers eat some performance | Paul A. Clayton | 2012/08/17 05:19 AM |
64-bit pointers eat some performance | bakaneko | 2012/08/17 07:37 AM |
Brute force seems to work | Paul A. Clayton | 2012/08/17 09:08 AM |
Brute force seems to work | bakaneko | 2012/08/17 10:15 AM |
64-bit pointers eat some performance | Richard Cownie | 2012/08/17 07:46 AM |
Pointer compression is atypical | Paul A. Clayton | 2012/08/17 09:43 AM |
Pointer compression is atypical | Richard Cownie | 2012/08/17 11:57 AM |
Pointer compression is atypical | Howard Chu | 2012/08/22 09:17 PM |
Pointer compression is atypical | Richard Cownie | 2012/08/23 03:48 AM |
Pointer compression is atypical | Howard Chu | 2012/08/23 05:51 AM |
Pointer compression is atypical | Wilco | 2012/08/17 01:41 PM |
Pointer compression is atypical | Richard Cownie | 2012/08/17 03:13 PM |
Pointer compression is atypical | Ricardo B | 2012/08/19 09:44 AM |
Pointer compression is atypical | Howard Chu | 2012/08/22 09:08 PM |
Unified libraries? | Paul A. Clayton | 2012/08/23 06:49 AM |
Pointer compression is atypical | Richard Cownie | 2012/08/23 07:44 AM |
Pointer compression is atypical | Howard Chu | 2012/08/23 04:17 PM |
Pointer compression is atypical | anon | 2012/08/23 07:15 PM |
Pointer compression is atypical | Howard Chu | 2012/08/23 08:33 PM |
64-bit pointers eat some performance | Foo_ | 2012/08/18 11:09 AM |
64-bit pointers eat some performance | Richard Cownie | 2012/08/18 04:25 PM |
64-bit pointers eat some performance | Richard Cownie | 2012/08/18 04:32 PM |
Page-related benefit of small pointers | Paul A. Clayton | 2012/08/23 07:36 AM |
Number of GPRs | Wilco | 2012/08/17 05:31 AM |
Number of GPRs | Kenneth Jonsson | 2012/08/17 10:54 AM |
Number of GPRs | Exophase | 2012/08/17 11:44 AM |
Number of GPRs | Kenneth Jonsson | 2012/08/17 12:22 PM |
Number of GPRs | Wilco | 2012/08/17 01:53 PM |
What about dynamic utilization? | Exophase | 2012/08/17 08:30 AM |
Compiler vs. assembly aliasing knowledge? | Paul A. Clayton | 2012/08/17 09:20 AM |
Compiler vs. assembly aliasing knowledge? | Exophase | 2012/08/17 10:09 AM |
Compiler vs. assembly aliasing knowledge? | anon | 2012/08/18 01:23 AM |
Compiler vs. assembly aliasing knowledge? | Ricardo B | 2012/08/19 10:02 AM |
Compiler vs. assembly aliasing knowledge? | anon | 2012/08/19 05:07 PM |
Compiler vs. assembly aliasing knowledge? | Ricardo B | 2012/08/19 06:26 PM |
Compiler vs. assembly aliasing knowledge? | anon | 2012/08/19 09:03 PM |
Compiler vs. assembly aliasing knowledge? | anon | 2012/08/20 12:59 AM |
Number of GPRs | David Kanter | 2012/08/17 11:46 AM |
RAT issues as part of reason 1 | Paul A. Clayton | 2012/08/17 01:18 PM |
Number of GPRs | name99 | 2012/11/17 05:37 PM |
Large ARFs increase renaming cost | Paul A. Clayton | 2012/11/17 08:23 PM |
Number of GPRs | David Kanter | 2012/08/16 02:31 PM |
Number of GPRs | Richard Cownie | 2012/08/16 04:17 PM |
32 GPRs ~2-3% | Paul A. Clayton | 2012/08/16 05:27 PM |
Oops, Message-ID: aaed6e38-c7bd-467e-ba41-f40cf1020e5e@googlegroups.com (NT) | Paul A. Clayton | 2012/08/16 05:29 PM |
32 GPRs ~2-3% | Exophase | 2012/08/16 09:06 PM |
R31 as SP/zero is kind of neat (NT) | Paul A. Clayton | 2012/08/17 05:23 AM |
32 GPRs ~2-3% | rwessel | 2012/08/17 07:24 AM |
32 GPRs ~2-3% | Exophase | 2012/08/17 08:16 AM |
32 GPRs ~2-3% | Max | 2012/08/17 03:19 PM |
32 GPRs ~2-3% | name99 | 2012/11/17 06:43 PM |
Number of GPRs | mpx | 2012/08/17 12:11 AM |
Latency and power | Paul A. Clayton | 2012/08/17 05:54 AM |
Number of GPRs | bakaneko | 2012/08/17 02:09 AM |
New Article: ARM Goes 64-bit | Steve | 2012/08/17 01:12 PM |
New Article: ARM Goes 64-bit | David Kanter | 2012/08/19 11:42 AM |
New Article: ARM Goes 64-bit | Doug S | 2012/08/19 01:02 PM |
New Article: ARM Goes 64-bit | Anon | 2012/08/19 06:16 PM |
New Article: ARM Goes 64-bit | Steve | 2012/08/30 06:51 AM |
Scalar vs Vector registers | Robert David Graham | 2012/08/19 04:19 PM |
Scalar vs Vector registers | David Kanter | 2012/08/19 04:29 PM |
New Article: ARM Goes 64-bit | Baserock ARM servers | 2012/08/21 03:13 PM |
Baserock ARM servers | Sysanon | 2012/08/21 03:14 PM |
A-15 virtualization and LPAE? | Paul A. Clayton | 2012/08/21 05:13 PM |
A-15 virtualization and LPAE? | Anon | 2012/08/21 06:13 PM |
Half-depth advantages? | Paul A. Clayton | 2012/08/21 07:42 PM |
Half-depth advantages? | Anon | 2012/08/22 02:33 PM |
Thanks for the information (NT) | Paul A. Clayton | 2012/08/22 03:04 PM |
A-15 virtualization and LPAE? | C. Ladisch | 2012/08/23 10:12 AM |
A-15 virtualization and LPAE? | Paul | 2012/08/23 02:17 PM |
Excessive pessimism | Paul A. Clayton | 2012/08/23 03:08 PM |
Excessive pessimism | David Kanter | 2012/08/23 04:05 PM |
New Article: ARM Goes 64-bit | Michael S | 2012/08/22 06:12 AM |
BTW, Baserock==product, Codethink==company (NT) | Paul A. Clayton | 2012/08/22 07:56 AM |
New Article: ARM Goes 64-bit | Reinoud Zandijk | 2012/08/21 10:27 PM |