A Network of Coprocessors
Cavium’s system on a chip features five major coprocessors, all of which go by semi-cryptic acronyms. Figure 4 below shows the coprocessors picked out in darker blue and orange.
Figure 4 – Cavium Coprocessors
OCTEON hosts two programmable packet processors, shown in light blue in Figure 4. These packet processors are mainly intended to handle IPv4 and IPv6 traffic. They include L2-L4 parsing, exception checking and several DMA engines. Once the packet header has been stripped and stored, the payload can be forwarded through the system for processing; there is also a bypass mechanism for packets that do not require special attention from any part of the system. The peak performance of the packet processors is 30M packets/sec.
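To make the L2-L4 parsing and exception checking concrete, the following is a minimal software sketch of the same kind of work, not a description of Cavium's hardware. The function names (`ipv4_checksum`, `parse_frame`) and the returned dictionary layout are assumptions made for illustration; only untagged Ethernet frames carrying IPv4 are handled.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
L4_WITH_PORTS = {6: "tcp", 17: "udp"}  # IP protocol numbers that carry ports


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over an IPv4 header (length is always even)."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_frame(frame: bytes) -> dict:
    """Parse L2-L4 headers; raise ValueError on the exceptions we check for."""
    if len(frame) < 14:
        raise ValueError("truncated Ethernet header")
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")

    ip = frame[14:]
    if len(ip) < 20:
        raise ValueError("truncated IPv4 header")
    version, header_len = ip[0] >> 4, (ip[0] & 0x0F) * 4
    if version != 4 or header_len < 20 or len(ip) < header_len:
        raise ValueError("malformed IPv4 header")
    # A header that includes a correct checksum field sums to zero.
    if ipv4_checksum(ip[:header_len]) != 0:
        raise ValueError("bad IPv4 header checksum")

    protocol = ip[9]
    l4 = ip[header_len:]
    ports = None
    if protocol in L4_WITH_PORTS and len(l4) >= 4:
        ports = struct.unpack("!HH", l4[:4])  # (source port, destination port)

    return {
        "src_ip": ".".join(str(b) for b in ip[12:16]),
        "dst_ip": ".".join(str(b) for b in ip[16:20]),
        "protocol": protocol,
        "ports": ports,
        "l4_offset": 14 + header_len,  # where the L4 header starts in the frame
    }
```

A caller would typically treat a `ValueError` as the exception path and hand everything else on for further processing, loosely mirroring the split between checked and bypassed packets described above.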
A ZIP coprocessor boosts performance when handling compressed data. This is a very natural sort of hardware acceleration; the sheer ubiquity of compressed web content, emails, attachments and files means that software routines would slow the simple cnMIPS cores to a crawl. The compression/decompression processor implements the GZIP and PKZIP algorithms, and can process data at 4Gb/s. It can use either dynamic or static Huffman encoding and also processes Adler32 checksums for the data.
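For a sense of the work being offloaded, here is a small software sketch using Python's standard `zlib` module, which implements the same DEFLATE compression used by GZIP and PKZIP. It is only an illustration of the software path, not of the coprocessor's interface; the helper name `deflate` is an assumption. The `Z_FIXED` strategy forces static Huffman codes, while the default strategy lets the compressor choose dynamic ones, and the zlib container ends with an Adler-32 checksum of the uncompressed data.

```python
import zlib


def deflate(data: bytes, strategy: int = zlib.Z_DEFAULT_STRATEGY) -> bytes:
    """Compress data into a zlib stream using the given Huffman strategy."""
    compressor = zlib.compressobj(
        level=6,
        method=zlib.DEFLATED,
        wbits=zlib.MAX_WBITS,  # zlib container: 2-byte header, Adler-32 trailer
        memLevel=8,
        strategy=strategy,
    )
    return compressor.compress(data) + compressor.flush()


payload = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 200

dynamic_stream = deflate(payload)               # dynamic Huffman codes allowed
static_stream = deflate(payload, zlib.Z_FIXED)  # static (fixed) Huffman codes only

# Both streams decompress to the original bytes.
assert zlib.decompress(dynamic_stream) == payload
assert zlib.decompress(static_stream) == payload

# The last four bytes of a zlib stream are the big-endian Adler-32 of the input.
trailer = int.from_bytes(dynamic_stream[-4:], "big")
assert trailer == zlib.adler32(payload)
```

Running routines like this on every compressed flow is exactly the per-byte work that a dedicated engine takes off the general-purpose cores.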
Since security is the hallmark of Cavium Networks, it should come as no surprise that the OCTEON also contains hardware for that purpose. As described previously, each cnMIPS core has its own cryptographic accelerator, giving OCTEON up to 16 crypto accelerators that together are capable of 16K RSA operations per second. The orange “Secure Vault” block serves two major functions. It contains a true random number generator with 340Mb/s of throughput. It also provides secure storage for encryption and decryption keys. Together, these functions address most, if not all, commonly used security techniques.
The deterministic finite automata (DFA) processor, which is labeled as the regular expression engine in Figure 4, contains 16 engines that perform pattern matching. Together, the DFA engines can operate at up to 4Gb/s. They are fairly general purpose; uses include checking for known virus signatures, security intrusions, or key phrases associated with spam (e.g. Nigerian bank accounts). They could also be used to check for digital watermarks or other authentication data. Source filtering would be another popular application, so that packets that appear to be part of a denial of service attack could be rejected; similarly, email or files could be denied or approved based on the trustworthiness of the source. The DFAs can also be used for Quality of Service (QoS), giving priority to interactive data, such as video or sound packets, over email and asynchronous data. The DFAs require their own storage to hold the patterns they are searching for, hence the additional DRAM controller discussed in the prior section.
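The appeal of a DFA for this job is that it makes exactly one table lookup per input byte, no matter how many patterns are loaded. The sketch below illustrates the idea in software for literal byte patterns, using the standard Aho-Corasick construction expanded into a full transition table. It is an assumption-laden illustration rather than Cavium's design: the names `build_dfa` and `scan` are hypothetical, and the hardware's pattern format is not described in this article.

```python
from collections import deque


def build_dfa(patterns):
    """Build a byte-level DFA that recognizes every pattern in `patterns`."""
    # Step 1: a trie of the patterns; state 0 is the start state.
    children, matches = [{}], [set()]
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if byte not in children[state]:
                children.append({})
                matches.append(set())
                children[state][byte] = len(children) - 1
            state = children[state][byte]
        matches[state].add(pattern)

    # Step 2: fill in a transition for every (state, byte) pair, breadth first,
    # so that a mismatch falls back to the longest suffix that is still viable.
    delta = [[0] * 256 for _ in children]
    fail = [0] * len(children)
    for byte, child in children[0].items():
        delta[0][byte] = child
    queue = deque(children[0].values())
    while queue:
        state = queue.popleft()
        matches[state] |= matches[fail[state]]  # inherit shorter suffix matches
        for byte in range(256):
            child = children[state].get(byte)
            if child is None:
                delta[state][byte] = delta[fail[state]][byte]
            else:
                delta[state][byte] = child
                fail[child] = delta[fail[state]][byte]
                queue.append(child)
    return delta, matches


def scan(delta, matches, data):
    """Return (start_offset, pattern) for every match, one lookup per byte."""
    state, hits = 0, []
    for offset, byte in enumerate(data):
        state = delta[state][byte]
        for pattern in sorted(matches[state]):
            hits.append((offset - len(pattern) + 1, pattern))
    return hits


delta, matches = build_dfa([b"virus", b"us", b"wire transfer"])
hits = scan(delta, matches, b"a virus wire transfer")
```

The transition table here costs 256 entries per state, which hints at why a pattern set of realistic size needs the dedicated memory mentioned above.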
The remaining blocks are the memory allocation unit (Malloc), the timers, and the scheduling and synchronization unit. The memory allocator is used by both hardware and software, and can create free lists of unlimited size. There are 16 timers, which work in conjunction with the scheduling unit, the central point of control for the system. The scheduling unit creates and manages offload queues, and partitions work between the different cores. The scheduling block is also responsible for synchronizing work and ordering the results. By using dedicated hardware for scheduling functions, Cavium is able to increase utilization of its multicore architecture. A system without such schedulers would likely require one or more cores to act as a load balancer, which would reduce overall throughput.
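The combination of partitioning work across cores and ordering the results can be sketched in a few lines of software. This is a toy model under stated assumptions, not the behavior of Cavium's scheduling unit: the class `OrderedScheduler` and its methods are hypothetical names, and a single global sequence is used where real hardware would likely order work per flow.

```python
import heapq
from collections import deque


class OrderedScheduler:
    """Hand work to any free core, but release results in submission order."""

    def __init__(self):
        self._pending = deque()  # (sequence number, item) waiting for a core
        self._finished = []      # min-heap of (sequence number, result)
        self._next_seq = 0       # sequence number for the next submitted item
        self._next_release = 0   # sequence number that must be released next

    def submit(self, item):
        """Queue a work item and tag it with its arrival order."""
        self._pending.append((self._next_seq, item))
        self._next_seq += 1

    def get_work(self):
        """Called by an idle core; returns (seq, item) or None if idle."""
        return self._pending.popleft() if self._pending else None

    def complete(self, seq, result):
        """Record a finished item and return every result now safe to release."""
        heapq.heappush(self._finished, (seq, result))
        released = []
        while self._finished and self._finished[0][0] == self._next_release:
            released.append(heapq.heappop(self._finished)[1])
            self._next_release += 1
        return released


scheduler = OrderedScheduler()
for packet in ("a", "b", "c"):
    scheduler.submit(packet)

# Three cores each pick up one item, then finish out of order.
work = [scheduler.get_work() for _ in range(3)]
first = scheduler.complete(work[2][0], work[2][1].upper())   # held back
second = scheduler.complete(work[0][0], work[0][1].upper())  # releases "A"
third = scheduler.complete(work[1][0], work[1][1].upper())   # releases "B", "C"
```

In software this bookkeeping has to run somewhere, which is the load-balancer cost the paragraph above describes; moving it into dedicated hardware leaves every core free for packet work.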