By: Brett (ggtgp.delete@this.yahoo.com), November 28, 2014 10:42 am
Room: Moderated Discussions
Peter Lund (peterfirefly.delete@this.gmail.com) on November 27, 2014 1:10 pm wrote:
> rwessel (robertwessel.delete@this.yahoo.com) on November 24, 2014 11:48 pm wrote:
> > Ronald Maas (rmaas.delete@this.wiwo.nl) on November 24, 2014 7:13 pm wrote:
> > > Michael S (already5chosen.delete@this.yahoo.com) on November 23, 2014 11:24 am wrote:
> > > > Apart of disadvantages, both 68K and VAX shared one advantage over x86 - 2-byte granularity of
> > > > instructions. P6-style brute-force approach to parsing and early decoding would take relatively
> > > > less hardware resources. I don't believe that it could have helped VAX, but it could make 3-way
> > > > 68K feasible even in transistor budget that does not allow decent decoded instruction cache.
> > > >
> > > >
> > >
> > > A huge benefit of the x86 instruction encoding scheme is
> > > that it allows determination of the instruction length
> > > by inspecting only the first 1, 2 or 3 bytes of the instruction
> > > (not counting any prefixes). The only exception
> > > is when some length-changing prefixes are used such as
> > > address size prefix and operand size prefix. When the
> > > processor encounters these prefixes in the instruction streams, it is not able to decode these instructions
> > > in a single cycle anymore. Search for LCP in the Intel Optimization Reference Manual http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
> > >
> > > With 68K and VAX often the whole instruction must be parsed in order to determine its length. Which would
> > > significantly increase the technical complexity required for building a superscalar instruction decoder.
> >
> >
> > It is astonishing, at least in hindsight, just how badly Motorola screwed up the 68K ISA with
> > the '020. While the groundwork for that had been laid in the original 68K (the unused bits used
> > in the '020 for the additional addressing modes were in the extension words), the 68000 and '010
> > had all instructions lengths determined in the first (16-bit) word of the instruction.
>
> Even that was pretty bad. I did the experiment of trying to find a small PLA that would give me the
> length + a valid bit for each of the 2^16 values for the first word. The size was not small...
>
> Of course I could have broken the problem down into several smaller PLAs +
> a big, final one. Maybe it would have been reduced to reasonable size.
>
> The '020 was just insane. Why add more stupid addressing modes that nobody uses!? I thought
> the 68K was supposed to be based on real data from Len Shustak's PhD thesis on the PDP-11?
The '020 indirect addressing was insane for the high end, but for embedded work it gave you better code density (a few percent) and nice, easy-to-use assembly macros for accessing various tables. There was a lot of assembly code back then, and before pipelining was implemented an indirect load was going to be just as slow in two instructions as in one. (And even after, for short windows?)
The split register file also saved instruction bits; the 68k line always had the best instruction density. (Benchmarks written in C do not always show this: C compilers do not like split register files and may not have used the indirect instructions well.)
This is not to say I did not scream and call Motorola idiots when I read the '020 manual for the first time, but I do understand the motivation behind this stupidity.
Motorola cared about the volume part of the market; the high-end workstation and desktop were afterthoughts that were never part of the plan, and the plan was clearly never updated to match the high-end sales Motorola never expected.
My biggest gripe is that 32-bit offsets were never implemented, even though the instruction bits were clearly reserved for them. 16-bit offsets are not enough and cause C compilers to generate poor code, hurting code density. I heard this was a compiler issue: given 32-bit offsets, that is all the compiler will use, which hurts code density even more than just using 16 bits. ColdFire dropped to three supported instruction sizes, so I don't think ColdFire fixed this weakness either.
Motorola understood hardware, but not software. Hardware skill let Motorola dominate mobile phones for a while; then, when software mattered, Motorola crashed and burned.
Motorola had been shutting down and spinning off divisions for a decade, which I put down to management incompetence, but it may have been that software was taking over in all of those markets.
> I don't think everybody understood how good branch predictors were/could become. If they are good enough,
> you can afford longer pipelines, which in turn means that you can afford more decode stages.
> rwessel (robertwessel.delete@this.yahoo.com) on November 24, 2014 11:48 pm wrote:
> > Ronald Maas (rmaas.delete@this.wiwo.nl) on November 24, 2014 7:13 pm wrote:
> > > Michael S (already5chosen.delete@this.yahoo.com) on November 23, 2014 11:24 am wrote:
> > > > Apart of disadvantages, both 68K and VAX shared one advantage over x86 - 2-byte granularity of
> > > > instructions. P6-style brute-force approach to parsing and early decoding would take relatively
> > > > less hardware resources. I don't believe that it could have helped VAX, but it could make 3-way
> > > > 68K feasible even in transistor budget that does not allow decent decoded instruction cache.
> > > >
> > > >
> > >
> > > A huge benefit of the x86 instruction encoding scheme is
> > > that it allows determination of the instruction length
> > > by inspecting only the first 1, 2 or 3 bytes of the instruction
> > > (not counting any prefixes). The only exception
> > > is when some length-changing prefixes are used such as
> > > address size prefix and operand size prefix. When the
> > > processor encounters these prefixes in the instruction streams, it is not able to decode these instructions
> > > in a single cycle anymore. Search for LCP in the Intel Optimization Reference Manual http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
> > >
> > > With 68K and VAX often the whole instruction must be parsed in order to determine its length. Which would
> > > significantly increase the technical complexity required for building a superscalar instruction decoder.
> >
> >
> > It is astonishing, at least in hindsight, just how badly Motorola screwed up the 68K ISA with
> > the '020. While the groundwork for that had been laid in the original 68K (the unused bits used
> > in the '020 for the additional addressing modes were in the extension words), the 68000 and '010
> > had all instructions lengths determined in the first (16-bit) word of the instruction.
>
> Even that was pretty bad. I did the experiment of trying to find a small PLA that would give me the
> length + a valid bit for each of the 2^16 values for the first word. The size was not small...
>
> Of course I could have broken the problem down into several smaller PLAs +
> a big, final one. Maybe it would have been reduced to reasonable size.
>
> The '020 was just insane. Why add more stupid addressing modes that nobody uses!? I thought
> the 68K was supposed to be based on real data from Len Shustak's PhD thesis on the PDP-11?
The 020 indirect addressing was insane for high end, but for embedded this gave you better code density (a few percent) and nice easy to use assembly macros to access various tables. Lots of assemble code back then, and an indirect load was going to be just as slow in two instructions as in one, before pipelining was implemented. (And even after for short windows?)
The split register file also saved instruction bits, the 68k line always had the best instruction density. (Benchmarks written in C do not always show this, C compilers do not like split register files and may not have used the indirect instructions well.)
This is not to say I did not scream and call Motorola idiots when I read the 020 manual the first time, but I do understand the motivation behind this stupidity.
Motorola cared about the volume part of the market, the high end workstation and desktop were afterthoughts that were never part of the plan, and the plan was clearly never updated to match those high end sales Motorola never expected.
My biggest gripe is that 32 bit offsets were never implemented, the instruction bits were clearly saved for this. 16 bit offsets are not enough and cause C compilers to generate poor code, hurting code density. I heard this was a complier issue, if given 32 bit offsets that is all it will use, which hurts code density even more than just using 16 bits. ColdFire dropped to three supported instruction sizes, so I don't think ColdFire fixed this weakness.
Motorola understood hardware, but not software. Hardware skill enabled Motorola to dominate mobile phones for a while, then when software mattered Motorola crashed and burned.
Motorola had been shutting down and spinning off divisions for a decade, which I put down to management incompetence, but it may have been software was taking over in all of those markets.
> I don't think everybody understood how good branch predictors were/could become. If they are good enough,
> you can afford longer pipelines, which in turn means that you can afford more decode stages.