By: rwessel (robertwessel.delete@this.yahoo.com), October 15, 2007 9:44 pm
Room: Moderated Discussions
Interesting stuff:
http://www2.hursley.ibm.com/decimal/IBM-z6-mainframe-microprocessor-Webb.pdf
Highlights:
- “z6” CPU is not a Power6, but shares significant technology and elements, including parts of caches, interconnects and functional units and overall pipeline design – “Siblings, not identical twins”
- Executes 668 of 894 architected zSeries instructions in hardware; the rest trap to millicode, as on existing zSeries CPUs. Includes 50 new zSeries instructions. New support for 1MB page frames (weird – 2MB would fit the current page tables much better).
- Quad core, “mostly in-order design,” shades of P6. z990 and z9 were fairly traditional OoO dual cores.
- Instructions are issued in groups of unspecified size.
- 4GHz and 15 FO4 per cycle, vs. 1.7GHz and 28 FO4 for z9
- 14 stage pipeline (similar to P6), contrasting with z9 at approximately 6. The fast in-order design obviously impacts some things, but many operations seem to come out well ahead: the fixed-point forwarding loop is 1 cycle in both z9 and z6, but that cycle is 15 FO4 in the new z6 vs. 28 FO4 in z9. Load-load latency increases from 3 cycles to 4, but that works out to 60 FO4 for z6 vs. 84 for z9. Mispredicted branches take a big hit, as expected for an in-order design – 13+ cycles vs. 6+ for z9 (195+ FO4 vs. 168+).
- 434mm**2 die, 991M transistors, 8765 pin “package”
- 64KB L1I, 128KB L1D and 3MB private L2 per core, off chip L3 (24MB on hub chip – 48MB on two – apparent build option). Directory based coherence.
- “Aggressive branch prediction”
- Multi-level TLB as on z9
- I/Os: 4x48GB/s interprocessor, 4x13GB/s memory, 2x17GP. The interprocessor and memory interface look to be the same as P6’s, and the I/O may be as well.
- Traditional decimal math now routed through decimal FP unit. DFP is now fully supported in hardware (z9 was mostly millicode), and the DFP unit is largely identical to the P6’s.
- Compression and encryption accelerators – shared by pairs of cores.
- Traditional zSeries RAS features in addition to the P6 stuff, including a checkpoint buffer that periodically records the full architected CPU state to allow retry or state migration in the event of failure. “Over 20,000 error checkers in chip”
- New focus on energy efficiency
All in all, this looks like the most radically different zSeries core in decades.
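
For anyone who wants to check the FO4 arithmetic in the pipeline bullet, here is a minimal sketch. It assumes only the cycle counts, FO4-per-cycle figures and clock rates quoted above; the names (DESIGNS, LATENCY_CYCLES, fo4_latency, ns_latency) are made up for illustration, and the branch numbers are the quoted lower bounds ("13+" and "6+"), so treat the results as rough.

```python
# Figures as quoted in the post: FO4 delays per cycle and clock rate in GHz.
DESIGNS = {
    "z6": {"fo4_per_cycle": 15, "ghz": 4.0},
    "z9": {"fo4_per_cycle": 28, "ghz": 1.7},
}

# Cycle counts as quoted in the post; branch figures are minimums.
LATENCY_CYCLES = {
    "fixed-point forwarding": {"z6": 1, "z9": 1},
    "load-load": {"z6": 4, "z9": 3},
    "branch mispredict (min)": {"z6": 13, "z9": 6},
}


def fo4_latency(design, cycles):
    """Latency in FO4 delays: cycles times FO4 depth of one cycle."""
    return cycles * DESIGNS[design]["fo4_per_cycle"]


def ns_latency(design, cycles):
    """Latency in nanoseconds at the quoted clock rate."""
    return cycles / DESIGNS[design]["ghz"]


def compare():
    """Return {operation: {design: (fo4, ns)}} for every quoted latency."""
    return {
        op: {d: (fo4_latency(d, c), ns_latency(d, c)) for d, c in per_design.items()}
        for op, per_design in LATENCY_CYCLES.items()
    }


if __name__ == "__main__":
    for op, per_design in compare().items():
        for design, (fo4, ns) in per_design.items():
            print(f"{op:26s} {design}: {fo4:4d} FO4, {ns:.2f} ns")
```

FO4 counts normalize away the process technology, which is why the mispredict penalty looks worse on z6 (195+ vs. 168+) even though, at the quoted clock rates and minimum cycle counts, the same penalty is slightly shorter in nanoseconds.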