By: Seni (seniike.delete@this.hotmail.com), April 9, 2017 6:19 am
Room: Moderated Discussions
RichardC (tich.delete@this.pobox.com) on April 9, 2017 5:50 am wrote:
> Seni (seniike.delete@this.hotmail.com) on April 9, 2017 4:06 am wrote:
>
> > The reason this did not apply for RISC is that separate instruction caches make it not
> > really a Von-Neumann architecture. There are two memory buses in parallel, one for fetches
> > and one for loads & stores, so single-cycle is possible even for load and store.
>
> I agree with your whole analysis, except for the point that the early ARM and RISC/SPARC
> didn't have cache at all, so they had to cope with data accesses and I-fetch sharing a
> single memory bus. SPARC's approach was many registers plus register windows, so that
> accesses to temporary values, critical local variables, and procedure args were mostly
> going to registers rather than main memory; ARM's approach was a moderately large register
> file, plus a carefully designed memory interface and ISA which could optimize sequential
> vs non-sequential accesses and interleave (mostly-sequential) I-fetch and (non-sequential,
> except for LDM/STM) data accesses in a way which maintained very high memory bandwidth
> to the (relatively new) nibble-mode DRAM memory system.
>
> I believe MIPS needed caches and separate I- and D-buses (Harvard architecture) from the
> start. Which positioned it for somewhat higher-performance but higher-cost systems
> than ARM and SPARC.
>
> [The Sun-4/110 was interesting in that it still didn't have a cache, but it exploited
> the fast-access-within-an-open-page behavior of page-mode DRAM to give cache-like
> performance - in particular fast I-fetch for code running a tight loop without
> branching out of the open DRAM-pages or doing random data accesses].
I'm not an expert on early ARM, so I googled it, and found this:
Reverse engineering the ARM1 processor's microinstructions
This article looks at how the ARM1 processor executes instructions. Unexpectedly, the ARM1 uses microcode, executing multiple microinstructions for each instruction. This microcode is stored in the instruction decode PLA, shown below. RISC processors generally don't use microcode, so I was surprised to find microcode at the heart of the ARM1. Unlike most microcoded processors, the microcode in the ARM1 is only a small part of the control circuitry.
and this:
Microcode in RISC?
Everyone "knows" that RISC processors don't use microcode.[3] So does the ARM1 have "real microcode"?
One of the ARM1 architects explains microcode: "A microcode address is formed from some or all of the contents of the instruction register, together with some state values which are internal to the micro-control unit. This address is decoded to drive a unique row of a matrix, the columns of which are the control signals for the datapath."[4] This description is a perfect fit for how the ARM1's control works, so it seems reasonable to consider the ARM1 to have microcode.
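To make that description concrete, here is a minimal toy sketch of the idea: an address built from instruction bits plus internal state selects one row of a matrix, and the columns of that row are the control signals. Everything in it (the signal names, the 2-bit instruction class, the row contents) is invented for illustration and is not the ARM1's actual PLA encoding.

```python
# Toy model of a microcode lookup. All names and table contents are
# hypothetical; this is NOT the ARM1's real PLA.

# Columns of the matrix: one control signal per column.
CONTROL_SIGNALS = ("alu_add", "reg_write", "mem_read", "mem_write")

STATE_BITS = 2  # assumed width of the micro-control unit's state counter


def micro_address(instr_class: int, state: int) -> int:
    """Form the address from instruction bits plus internal state."""
    return (instr_class << STATE_BITS) | state


# Rows of the matrix, keyed by microcode address.
MICROCODE = {
    micro_address(0b00, 0): (1, 1, 0, 0),  # data op: ALU and write back
    micro_address(0b01, 0): (1, 0, 0, 0),  # load, step 0: compute address
    micro_address(0b01, 1): (0, 0, 1, 0),  # load, step 1: read memory
    micro_address(0b01, 2): (0, 1, 0, 0),  # load, step 2: write register
}


def control_for(instr_class: int, state: int) -> set:
    """Decode the address to one row and return the asserted signals."""
    row = MICROCODE[micro_address(instr_class, state)]
    return {name for name, bit in zip(CONTROL_SIGNALS, row) if bit}


def run(instr_class: int) -> list:
    """Step through the microinstructions for one instruction class."""
    steps = []
    state = 0
    while micro_address(instr_class, state) in MICROCODE:
        steps.append(control_for(instr_class, state))
        state += 1
    return steps
```

In this toy, the hypothetical "load" class takes three microinstructions while the "data op" class takes one, which is the sense in which a multi-cycle instruction executes multiple microinstructions.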
I think it's easiest to understand the ARM1's control logic by viewing it as microcode. However, there are a couple of reasons to consider it not "real microcode". One reason is that the ARM1 microcode is only a small part of the chip's control, as you can see in the die photo and floorplan earlier. The control signals are heavily modified by the instruction skip component and conditionals are handled by the conditional unit. This goes beyond vertical microcode, where logic expands the microcode's control signals; in the ARM1, this other circuitry can entirely override the control signals. In addition, the ARM1 uses separate circuitry (the priority encoder) to control the block data transfer instructions; the microcode just sits in a loop. (The ARM2 is similar with multiplication: a separate circuit controls multiplication.)
The ARM1's microcode is an order of magnitude smaller than that of other microcoded processors. The ARM1's microcode is a 42×36 array, for 1512 bits in total. The 8086 used a 504×21 microcode (over 10,000 bits) while the 68000 has a 544×17 microcode and 366×68 nanocode (over 34,000 bits).
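As a quick check of those sizes, the bit counts follow directly from the array dimensions quoted above. The dictionary labels and the helper name `total_bits` are just for this sketch.

```python
# Array dimensions (rows, columns) as quoted in the article.
roms = {
    "ARM1": [(42, 36)],
    "8086": [(504, 21)],
    "68000": [(544, 17), (366, 68)],  # microcode plus nanocode
}


def total_bits(arrays):
    """Sum rows * columns over every array belonging to one chip."""
    return sum(rows * cols for rows, cols in arrays)


sizes = {chip: total_bits(arrays) for chip, arrays in roms.items()}

for chip, bits in sizes.items():
    ratio = bits / sizes["ARM1"]
    print(f"{chip}: {bits} bits ({ratio:.1f}x the ARM1)")
```

This gives 1512 bits for the ARM1, 10584 for the 8086 and 34136 for the 68000, so the 8086's store is 7 times the ARM1's and the 68000's is a bit over 22 times.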
Probably the biggest objection to calling the ARM1 microcoded is that the designers of the ARM chip didn't consider it that way.[4] Furber mentions that some commercial RISC processors use microcode, but doesn't apply that term to the ARM1. He describes the ARM1's instruction decode as a two-level structure. In the first level, the instruction decoder PLA differentiates instructions into classes with similar characteristics. The secondary decoding uses the information from the first level along with hardware to cope with all the possible operations. The first level is described as providing "broad hints" about which functions to choose, and the second level fills in the details with bits from the instruction.