Virtual Address Space
The most prominent and visible aspect of ARMv8 and A64 is the extended virtual addressing, but there are many other improvements to the memory architecture. Currently, AArch64 features two 48-bit virtual address spaces, one for the kernel and one for applications. Application addressing starts at 0 and grows upwards, while kernel space grows down from 2^64; any reference to an unmapped address in between will trigger a fault. Pointers are sign extended to 64 bits, and can optionally be configured to use the upper 8 bits for tagging pointers with additional information.
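The split into two 48-bit ranges can be illustrated with a short sketch. This is not how hardware or any particular OS implements the check; the function name `classify` and the constant `VA_BITS` are made up for illustration, and pointer tagging is deliberately left out.

```python
VA_BITS = 48  # virtual address width described in the text


def classify(addr: int, va_bits: int = VA_BITS) -> str:
    """Classify a 64-bit virtual address as 'user', 'kernel', or 'fault'."""
    if not 0 <= addr < (1 << 64):
        raise ValueError("addr must fit in 64 bits")
    upper = addr >> va_bits                  # bits 63..va_bits
    all_ones = (1 << (64 - va_bits)) - 1     # 0xFFFF when va_bits is 48
    if upper == 0:
        return "user"    # lower range, grows up from 0
    if upper == all_ones:
        return "kernel"  # upper range, grows down from 2^64
    return "fault"       # the unmapped hole between the two ranges


print(classify(0x0000_7FFF_FFFF_F000))  # user
print(classify(0xFFFF_8000_0000_0000))  # kernel
print(classify(0x0001_0000_0000_0000))  # fault
```

A valid untagged pointer is therefore one whose upper 16 bits are all copies of bit 47, which is what sign extension to 64 bits amounts to.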
The translation tables for each virtual address space are mapped using either traditional 4KB pages or a new larger 64KB page. The minimum page size determines which page table format will be used. The 64KB pages can improve performance, at the cost of substantially more internal fragmentation and higher memory utilization.
For virtual address spaces with 4KB pages, a 4-level table is used with 9 bits translated per lookup. In this case, 64KB pages will help index more data, but not reduce the number of lookups. When using 64KB pages, each lookup provides 13 address bits and only 3 levels are necessary. In fact, if virtual addresses of 42 bits or fewer are used, the 64KB page tables only need two lookups for translation.
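The level counts above follow from simple arithmetic, sketched below. The sketch assumes each table fills exactly one page with 8-byte entries, which is consistent with the 9 and 13 bits per lookup quoted in the text; the helper names `levels_needed` and `split_address` are hypothetical.

```python
def levels_needed(va_bits: int, page_bits: int) -> int:
    """Number of table lookups needed to translate a va_bits-wide address."""
    bits_per_level = page_bits - 3        # one page of 8-byte entries
    to_translate = va_bits - page_bits    # bits above the page offset
    return -(-to_translate // bits_per_level)  # ceiling division


def split_address(addr: int, va_bits: int, page_bits: int):
    """Return ([top-level index, ..., last-level index], page offset)."""
    bits_per_level = page_bits - 3
    offset = addr & ((1 << page_bits) - 1)
    rest = (addr & ((1 << va_bits) - 1)) >> page_bits
    indices = []
    for _ in range(levels_needed(va_bits, page_bits)):
        indices.append(rest & ((1 << bits_per_level) - 1))
        rest >>= bits_per_level
    indices.reverse()  # walk order: top level first
    return indices, offset


print(levels_needed(48, 12))  # 4KB pages, 48-bit addresses
print(levels_needed(48, 16))  # 64KB pages, 48-bit addresses
print(levels_needed(42, 16))  # 64KB pages, 42-bit addresses
```

With 64KB pages and 42-bit addresses the sum works out exactly: 16 offset bits plus two 13-bit indices.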
Addressing and Memory Instructions
Like all RISCs, ARM is a strict load/store instruction set that separates memory accesses from arithmetic. ARMv7 and A32 have a single relatively nice indexed addressing mode with optional pre- and post-incrementing. A base register is added to a scaled (i.e. shifted) offset (either another register or an immediate). Optionally, the offset pre- or post-updates the base register, which is useful for handling loops. Since the PC is an ordinary register in AArch32, the indexed addressing mode can be used for PC-relative addressing as well. Since ARMv6, unaligned memory accesses have been supported for single loads or stores.
A64 addressing is generally similar in terms of capabilities, but has been adapted to simplify address calculation. There are two separate addressing modes, the familiar indexed mode and a new PC-relative mode, since the PC cannot be accessed like a regular register. As with ARMv7, unaligned accesses are allowed, but have a performance penalty.
The indexed addressing is still robust, but the incrementing is limited to simplify the critical path in address generation. The 64-bit base register is added to a scaled offset. The offset can be an immediate, a 64-bit register or a sign-extended 32-bit register. However, pre-incrementing is only available with unscaled immediate offsets. Any load or store can post-increment with an unscaled immediate, but only SIMD loads and stores can use post-increment with a register offset. The immediates are generally limited to 9 bits, signed. However, for plain base plus offset addressing, a scaled 12-bit unsigned immediate is available.
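To make the two immediate forms concrete, here is a rough sketch of the byte offsets each one can reach. It assumes the scaled immediate is multiplied by the access size in bytes; the function names are invented for illustration and do not correspond to any assembler API.

```python
def fits_unscaled_simm9(offset: int) -> bool:
    """9-bit signed byte offset: -256 through +255."""
    return -(1 << 8) <= offset < (1 << 8)


def fits_scaled_uimm12(offset: int, access_bytes: int) -> bool:
    """12-bit unsigned offset, assumed to be scaled by the access size."""
    if offset < 0 or offset % access_bytes != 0:
        return False
    return offset // access_bytes < (1 << 12)


# An 8-byte load can reach 4095 * 8 = 32760 bytes past the base
# with the scaled form, but only 255 bytes with the unscaled form.
print(fits_scaled_uimm12(32760, 8))   # True
print(fits_unscaled_simm9(32760))     # False
print(fits_unscaled_simm9(-8))        # True
print(fits_scaled_uimm12(-8, 8))      # False
```

The trade-off is visible in the ranges: the unscaled form handles negative and misaligned offsets over a small window, while the scaled form covers a much larger, forward-only, aligned range.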
A new literal addressing mode is used to calculate PC-relative addresses when accessing at least 32 bits of data. Literal addressing replaces the base register with the PC and adds a 19-bit signed offset, scaled by 4 bytes, giving a relatively limited range of +/-1MB. While this preserves some PC-relative capabilities, it is significantly less flexible than addressing in AArch32.
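The +/-1MB figure comes from a 19-bit signed field counted in 4-byte words: 2^18 words times 4 bytes is 1MB. A minimal sketch of the reachability check, with a hypothetical function name:

```python
def literal_reachable(pc: int, target: int) -> bool:
    """True if target is within reach of a 19-bit signed word offset from pc."""
    delta = target - pc
    if delta % 4 != 0:          # offset is counted in 4-byte words
        return False
    words = delta // 4
    return -(1 << 18) <= words < (1 << 18)


print(literal_reachable(0x100000, 0x0))   # exactly -1MB: reachable
print(literal_reachable(0x0, 0xFFFFC))    # +1MB minus 4 bytes: reachable
print(literal_reachable(0x0, 0x100000))   # exactly +1MB: out of range
```

As with any two's complement field, the range is asymmetric by one step: -1MB is reachable, while the largest forward offset is 1MB minus 4 bytes.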
ARMv7 included several instructions that could access multiple memory locations. In particular, load multiple and pop can read all of the registers from memory, while store multiple and push can write all the registers to memory. These instructions must be micro-coded to handle mis-speculation, exceptions and interrupts and are a potential source of complexity. AArch64 eliminates all multiple memory access instructions to simplify the microarchitecture at the cost of instruction density. To accelerate multiple accesses, two new instructions, load pair and store pair, have been added.
Load and store pair access a pair of independent registers from adjacent memory locations with unaligned support. However, the addressing modes are somewhat more limited than normal accesses. Specifically, the pair access instructions can only use the base register plus a scaled 7-bit signed immediate, with optional pre- and post-increment. The pair instructions are clever, since only a single address calculation is needed, saving a little power. The pair instructions are also available as non-temporal accesses, although with only base plus immediate addressing.
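For a sense of how far the scaled 7-bit signed immediate reaches, the sketch below computes the byte range, assuming the immediate is scaled by the size of one register in the pair. The helper name is hypothetical.

```python
def pair_offset_range(reg_bytes: int) -> tuple:
    """(min, max) byte offset for a 7-bit signed immediate scaled by reg_bytes."""
    lo = -(1 << 6) * reg_bytes        # -64 * register size
    hi = ((1 << 6) - 1) * reg_bytes   # +63 * register size
    return lo, hi


print(pair_offset_range(8))  # pair of 64-bit registers
print(pair_offset_range(4))  # pair of 32-bit registers
```

For 64-bit registers this gives -512 through +504 bytes, which is enough to cover a typical stack frame when saving and restoring registers two at a time.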