ENVOI I/O System and Offload Engines
The ENVOI I/O system and offload engines are the most remarkable and unique features of the PWRficient family. Most systems provide I/O in a fixed arrangement: a system might support 4 PCIe lanes and 2 Serial ATA connections, but cannot be rearranged to support 6 PCIe or 2 PCIe and 2 SATA. The PWRficient family takes advantage of the increased die space at 65nm to implement an extremely configurable I/O system, shown below in Figure 5.
Figure 5 – ENVOI I/O System (modified from a P.A. Semi presentation)
The I/O system features 24 SERDES lanes to communicate with the outside world; each one runs at up to 2.5Gbps, for a maximum aggregate of 104Gbps or 13GB/s. The SERDES lanes are configured at boot time through the BIOS or boot ROM, and may be used for PCI Express, 10 Gigabit Ethernet or Gigabit Ethernet. A single PCIe lane maps to exactly one SERDES lane, a 10 Gigabit Ethernet port requires 4 SERDES lanes, and a Gigabit Ethernet port requires a single SERDES lane but leaves most of that lane's bandwidth on the table. The I/O system supports 8 PCIe connections of up to 16 lanes each, and supports partitioning. There are up to two 10 Gigabit Ethernet MACs and four Gigabit Ethernet MACs, all of which support IPv4, IPv6 and packet filtering. Other protocols are under consideration for future inclusion, such as RapidIO, SATA, Fibre Channel and Serial Attached SCSI, just to name a few.
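The lane budgeting rules above can be sketched as a small validity check. This is only an illustration, not P.A. Semi's boot-time configuration interface: the `LaneConfig` class and its field names are hypothetical, and the payload rates are assumptions (2.0Gbps per PCIe lane after 8b/10b encoding, and the nominal 10Gbps and 1Gbps for the Ethernet ports). Under those assumptions, one reading that happens to reproduce the 104Gbps figure is two 10 Gigabit Ethernet ports plus 16 PCIe lanes, counted in both directions; the source does not state how the number was derived.

```python
from dataclasses import dataclass

# Limits taken from the article text.
TOTAL_LANES = 24
MAX_10GBE_MACS = 2
MAX_GBE_MACS = 4

# SERDES lanes consumed per port type (from the article text).
LANES_PER_PORT = {"pcie_lane": 1, "10gbe": 4, "gbe": 1}

# ASSUMED payload rate per port, per direction, in Gbps.
# PCIe: 2.5Gbps line rate with 8b/10b encoding -> 2.0Gbps usable.
PAYLOAD_GBPS = {"pcie_lane": 2.0, "10gbe": 10.0, "gbe": 1.0}


@dataclass(frozen=True)
class LaneConfig:
    """Hypothetical description of how the 24 SERDES lanes are assigned."""
    pcie_lanes: int = 0
    ten_gbe_ports: int = 0
    gbe_ports: int = 0

    def lanes_used(self) -> int:
        return (self.pcie_lanes * LANES_PER_PORT["pcie_lane"]
                + self.ten_gbe_ports * LANES_PER_PORT["10gbe"]
                + self.gbe_ports * LANES_PER_PORT["gbe"])

    def validate(self) -> None:
        """Raise ValueError if the assignment exceeds any stated limit."""
        if min(self.pcie_lanes, self.ten_gbe_ports, self.gbe_ports) < 0:
            raise ValueError("port counts must be non-negative")
        if self.ten_gbe_ports > MAX_10GBE_MACS:
            raise ValueError("at most two 10 Gigabit Ethernet MACs")
        if self.gbe_ports > MAX_GBE_MACS:
            raise ValueError("at most four Gigabit Ethernet MACs")
        if self.lanes_used() > TOTAL_LANES:
            raise ValueError("more than 24 SERDES lanes requested")

    def payload_gbps_per_direction(self) -> float:
        return (self.pcie_lanes * PAYLOAD_GBPS["pcie_lane"]
                + self.ten_gbe_ports * PAYLOAD_GBPS["10gbe"]
                + self.gbe_ports * PAYLOAD_GBPS["gbe"])


if __name__ == "__main__":
    cfg = LaneConfig(pcie_lanes=16, ten_gbe_ports=2)
    cfg.validate()
    one_way = cfg.payload_gbps_per_direction()
    print(cfg.lanes_used(), "lanes,", one_way, "Gbps per direction,",
          2 * one_way / 8, "GB/s both directions")
```

A configuration such as `LaneConfig(pcie_lanes=24, ten_gbe_ports=1)` fails validation, since it would need 28 lanes.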
While including the hardware for three different I/O standards does consume extra die space if they are not all used, it gives the product much more flexibility. This sort of trade-off is most likely a good one, given that the transistor budget doubles with each generation of devices. Reconfigurability certainly reduces the total number of different chips needed (no more mixing and matching of SATA, PCIe, etc. controllers), and it will be interesting to see whether other vendors follow suit.
The ENVOI I/O system also features a coherent, fully associative 8KB cache for internal use. The I/O cache can prefetch, store descriptors and combine writes. Additionally, the L2 cache can be used for I/O caching. This might seem a little peculiar, but these features (combined with the ability to access the crossbar) lead to higher performance and lower power consumption. In general, reading data from a nearby cache is the quickest and most efficient access method; according to Mark Hayter, a cache access uses 40x less power than going to DRAM.
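To get a feel for what the 40x figure means in practice, here is a back-of-the-envelope sketch. It rests on simplifying assumptions that are not from P.A. Semi: the quoted ratio is treated as energy per access, a hit costs 1/40 of a DRAM access, and a miss costs exactly one DRAM access with the cache lookup ignored. The function name is made up for illustration.

```python
def relative_access_energy(hit_rate: float,
                           cache_to_dram_ratio: float = 1 / 40) -> float:
    """Average energy per I/O access, relative to always going to DRAM.

    ASSUMPTIONS: a cache hit costs `cache_to_dram_ratio` of a DRAM access
    (1/40, from the figure quoted in the text) and a miss costs one full
    DRAM access. Real hardware will deviate from this simple model.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_to_dram_ratio + (1.0 - hit_rate)


if __name__ == "__main__":
    for rate in (0.0, 0.5, 0.9, 1.0):
        print(f"hit rate {rate:.0%}: {relative_access_energy(rate):.4f}x DRAM-only energy")
```

Under this model the misses dominate quickly: even at a 90% hit rate, most of the remaining energy is spent on the 10% of accesses that still go to DRAM.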
The last elements of the ENVOI I/O system are the DMA and offload engines. The DMA engine supports 64 receive channels and 20 transmit channels, and has a 24KB buffer. It is capable of scatter-gather operations and of transactions between any combination of memory and I/O (e.g. memory to memory, memory to I/O, etc.). Altogether, it can sustain 32GB/s of bandwidth, which is a nice match for the interconnect capacity. The offload engine is a bit more interesting. It was designed to support a variety of functions, including encryption and decryption, CRC, checksums and the XOR function for RAID 5. The encryption support is fairly comprehensive, including AES, 3DES and several other block ciphers, as well as cryptographic hash functions such as MD5, SHA-1, SHA-256 and a few more. Perhaps most importantly, it also supports SSL and IPSec acceleration. The CRC engine is a generalized one that should be usable for almost any CRC algorithm, be it the CRC-32 used by Ethernet or the protocol of the week. The CRC engine is another example of how, when given the choice between a simple but less general piece of logic and a more powerful and general one, the P.A. Semi designers tended to opt for the latter, even if it is more complex.
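Two of the offload functions mentioned above are easy to illustrate in software. The sketch below is not P.A. Semi's hardware design or programming interface; it only shows the ideas. The `crc` function follows the commonly used parameterized CRC model (width, polynomial, initial value, input/output reflection, final XOR), which is one way a "generalized" CRC engine can cover many protocols with the same logic. The `xor_parity` function shows the RAID 5 parity calculation. All function names are chosen for illustration.

```python
from typing import Sequence


def reflect(value: int, width: int) -> int:
    """Reverse the low `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc(data: bytes, width: int, poly: int, init: int,
        refin: bool, refout: bool, xorout: int) -> int:
    """Bit-at-a-time parameterized CRC (requires width >= 8)."""
    mask = (1 << width) - 1
    top_bit = 1 << (width - 1)
    reg = init
    for byte in data:
        if refin:
            byte = reflect(byte, 8)
        reg ^= byte << (width - 8)          # feed the next byte in at the top
        for _ in range(8):
            if reg & top_bit:
                reg = ((reg << 1) ^ poly) & mask
            else:
                reg = (reg << 1) & mask
    if refout:
        reg = reflect(reg, width)
    return reg ^ xorout


def xor_parity(blocks: Sequence[bytes]) -> bytes:
    """RAID 5 style parity: byte-wise XOR of equally sized blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(parity):
            raise ValueError("all blocks must have the same length")
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)


if __name__ == "__main__":
    msg = b"123456789"
    # Same routine, different parameters: Ethernet CRC-32, CRC-32C, CRC-16/CCITT-FALSE.
    print(hex(crc(msg, 32, 0x04C11DB7, 0xFFFFFFFF, True, True, 0xFFFFFFFF)))
    print(hex(crc(msg, 32, 0x1EDC6F41, 0xFFFFFFFF, True, True, 0xFFFFFFFF)))
    print(hex(crc(msg, 16, 0x1021, 0xFFFF, False, False, 0x0000)))

    stripes = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xAA\xBB\xCC"]
    parity = xor_parity(stripes)
    # A lost block is rebuilt by XORing the parity with the surviving blocks.
    rebuilt = xor_parity([parity, stripes[0], stripes[2]])
    print(rebuilt == stripes[1])
```

Because only the parameters change between CRC variants, a single datapath can serve many protocols, which is the trade-off the text describes: more general logic at the cost of some added complexity.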
System Management Interfaces
The PA6T-1682M also has the regular set of management ports and features. The power and system timer are used for dynamic clocking and voltage regulation, among other management and timing functions. During start-up, the system uses the boot bus to access the BIOS, which is stored in either ROM or one of various types of flash memory. External I/O is also provided through UART (i.e. serial) ports and the ubiquitous SMBus interface. Lastly, there are two built-in debugging facilities. The two trace memory blocks shown in the PA6T-1682M are used to capture traces of transactions across CONEXIUM (the TTM) or traces of I/O operations (the PTM). These can be used to diagnose failures and to examine performance characteristics.